Educational Assessment: A Brief History

Educational assessment in the Western world has a long but uneven history. Two distinct threads run through it. The first is the variety of settings in which testing came to have practical use. The second is the adoption of increasingly rigorous methods for making sense of test results. This chapter sets out key developments in each of these two areas, from their origins to the dawn of contemporary psychometrics. For long periods, even the simplest improvements in testing or statistics had to contend with tradition and inertia. It took many generations for the two threads to merge into a full-fledged science of educational measurement.

Author information

David L. McArthur PhD
Center for Student Testing, Evaluation and Standards, Graduate School of Education, University of California Los Angeles, Los Angeles, CA 90024, USA