Abstract
The ability to objectively quantify the complexity of a text can be a useful indicator of how likely learners at a given level are to comprehend it. Before more complex models of text difficulty can be built, it should be recognized that words are the basic building blocks of a text and, inherently, that a text's overall difficulty is strongly influenced by the complexity of its words. One approach is to measure a word's Age of Acquisition (AoA), an estimate of the average age at which a speaker of a language understands the semantics of a specific word. Age of Exposure (AoE) statistically models the process of word learning and, in turn, provides an estimate of a given word's AoA. In this paper, we expand on the AoE model by training regression models that learn and generalize AoA word lists across multiple languages, including English, German, French, and Spanish. Our approach allows AoA scores to be estimated for words that are not found in the original lists, covering up to the majority of the target language's vocabulary. Our method can be applied uniformly across multiple languages through the use of parallel corpora, and it helps bridge the gap in the size of AoA word lists available for non-English languages. This is particularly important for extending AI to languages with fewer resources and benchmarked corpora.
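The abstract does not say which word features or regression models are used, so the following is only a minimal sketch of the general idea of "generalizing an AoA word list": fit a regressor on words that have AoA ratings, then predict AoA for words that have features but no rating. It assumes each word already has a numeric feature vector, uses plain linear ridge regression as a stand-in model, and relies on invented toy values (a hypothetical log-frequency and the word length). The function names `solve`, `fit_ridge`, and `extend_aoa` are hypothetical; this is not the authors' pipeline.

```python
from typing import Dict, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ridge(rows: List[List[float]], targets: List[float], alpha: float = 1e-3) -> List[float]:
    """Fit linear ridge-regression weights (intercept first) via the normal equations."""
    design = [[1.0] + row for row in rows]  # prepend an intercept column
    d = len(design[0])
    # Normal equations: (X^T X + alpha * I) w = X^T y.
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(d)] for i in range(d)]
    for i in range(1, d):  # the intercept (index 0) is left unregularized
        xtx[i][i] += alpha
    xty = [sum(r[i] * t for r, t in zip(design, targets)) for i in range(d)]
    return solve(xtx, xty)


def extend_aoa(
    features: Dict[str, List[float]], rated: Dict[str, float], alpha: float = 1e-3
) -> Dict[str, float]:
    """Train on rated words, then estimate AoA for every word with features but no rating."""
    train = [w for w in features if w in rated]
    weights = fit_ridge([features[w] for w in train], [rated[w] for w in train], alpha)
    return {
        w: weights[0] + sum(wi * xi for wi, xi in zip(weights[1:], vec))
        for w, vec in features.items()
        if w not in rated
    }


# Invented toy values for illustration only (NOT real AoA norms):
# features = [hypothetical log-frequency, word length in letters].
TOY_FEATURES = {
    "dog": [5.0, 3.0],
    "house": [4.5, 5.0],
    "ball": [4.0, 4.0],
    "river": [3.5, 5.0],
    "photosynthesis": [1.0, 14.0],
    "garden": [3.0, 6.0],  # unrated: its AoA will be estimated
}
TOY_RATINGS = {"dog": 6.5, "house": 8.0, "ball": 8.0, "river": 9.0, "photosynthesis": 16.0}

estimates = extend_aoa(TOY_FEATURES, TOY_RATINGS)
print(estimates)
```

The closed-form solver keeps the sketch dependency-free; a real system would more likely use an established library and much richer word representations. Leaving the intercept unregularized means the small ridge penalty only shrinks the feature weights, not the overall AoA level.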
Acknowledgements
This research was supported by a grant of the Romanian National Authority for Scientific Research and Innovation, CNCS – UEFISCDI, project number TE 70 PN-III-P1-1.1-TE-2019-2209, ATES – "Automated Text Evaluation and Simplification", the Institute of Education Sciences (R305A180144 and R305A180261), and the Office of Naval Research (N00014-17-1-2300; N00014-20-1-2623). The opinions expressed are those of the authors and do not represent the views of the IES or ONR.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Botarleanu, RM., Dascalu, M., Watanabe, M., McNamara, D.S., Crossley, S.A. (2021). Multilingual Age of Exposure. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_7
DOI: https://doi.org/10.1007/978-3-030-78292-4_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78291-7
Online ISBN: 978-3-030-78292-4
eBook Packages: Computer Science, Computer Science (R0)