Educational Assessment: A Brief History
Educational assessment in the Western world has a long but very irregular history. Two distinct threads are woven together: the first is the variety of settings in which testing itself came to have practical use, and the second is the incorporation of increasingly rigorous methods for making sense of the results of that testing. This chapter sets out some of the key developments in each of these two areas, from their origins until the dawn of contemporary psychometrics. For extended periods, even the simplest improvements in either testing or statistics had to fight long and hard against tradition and inertia. It took many generations for the two threads to merge into a full-fledged science of educational measurement.
Author information
Authors and Affiliations
- David L. McArthur PhD, Center for Student Testing, Evaluation and Standards, Graduate School of Education, University of California Los Angeles, Los Angeles, CA, 90024, USA