Abstract
Language models (LMs) trained on large text corpora have demonstrated superior performance in a variety of language-related tasks in recent years. These models implicitly incorporate factual knowledge that can be used to complement existing Knowledge Graphs (KGs), which in most cases are built from human-curated databases. Here we report an experiment that attempts to gain insight into the extent to which LMs can generate factual information comparable to that present in KGs. Concretely, we tested this process using the English Wikipedia subset of YAGO and the GPT-J model for attributes related to individuals. Results show that the generation of correct factual information depends on the generation parameters of the model and is unevenly balanced across individuals. Further, the LM can be used to populate additional factual information, but this requires an intermediate parsing step to correctly map generated text to KG attributes.
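The following is a minimal sketch, not the authors' exact pipeline, of how GPT-J can be prompted for a person-related attribute and how the completion might be parsed before comparison with a YAGO fact. The model checkpoint, prompt template, generation parameters, and the `query_attribute` helper are illustrative assumptions.

```python
# Sketch: querying GPT-J for a person attribute and parsing the completion.
# Prompt template and sampling parameters are illustrative, not the paper's setup.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL = "EleutherAI/gpt-j-6B"  # GPT-J checkpoint on the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def query_attribute(person: str, attribute_prompt: str,
                    temperature: float = 0.7, top_p: float = 0.9) -> str:
    """Generate a completion for a cloze-style prompt about one individual."""
    prompt = attribute_prompt.format(person=person)  # e.g. "Marie Curie was born in"
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=10,
        do_sample=True,            # sampled output varies with these generation parameters
        temperature=temperature,
        top_p=top_p,
        pad_token_id=tokenizer.eos_token_id,
    )
    completion = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # Intermediate parsing step: keep only the text after the prompt and trim it
    # to the first clause so it can be normalised and matched to a KG attribute value.
    answer = completion[len(prompt):].strip()
    return answer.split(".")[0].split(",")[0].strip()

# Hypothetical usage:
# print(query_attribute("Marie Curie", "{person} was born in"))
```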
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Martin-Moncunill, D., Sicilia, M.A., González, L., Rodríguez, D. (2022). On Contrasting YAGO with GPT-J: An Experiment for Person-Related Attributes. In: Villazón-Terrazas, B., Ortiz-Rodriguez, F., Tiwari, S., Sicilia, M.A., Martín-Moncunill, D. (eds) Knowledge Graphs and Semantic Web. KGSWC 2022. Communications in Computer and Information Science, vol 1686. Springer, Cham. https://doi.org/10.1007/978-3-031-21422-6_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21421-9
Online ISBN: 978-3-031-21422-6
eBook Packages: Computer Science, Computer Science (R0)