Abstract
Knowledge graphs are used in an increasing number of applications. Although considerable human effort has been invested in making knowledge graphs available in multiple languages, most knowledge graphs are in English. Additionally, regional facts are often available only in the language of the corresponding region. This lack of multilingual knowledge limits the porting of machine learning models to other languages. In this paper, we aim to alleviate this drawback by proposing THOTH, an approach for translating and enriching knowledge graphs. THOTH extracts bilingual alignments between a source and a target knowledge graph and learns to translate from one to the other by relying on two different recurrent neural network models along with knowledge graph embeddings. We evaluated THOTH extrinsically by comparing the German DBpedia with the German translation of the English DBpedia on two tasks: fact checking and entity linking. In addition, we ran a manual intrinsic evaluation of the translation. Our results show that THOTH is a promising approach, achieving a translation accuracy of 88.56%. Moreover, its enrichment significantly improves the quality of the German DBpedia: we report +18.4% accuracy for fact validation and +19% F\(_1\) for entity linking.
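The abstract describes THOTH as extracting bilingual alignments between a source and a target knowledge graph and learning to translate from one to the other. As a rough illustration only, the Python sketch below shows one way such alignments could be turned into parallel training pairs. It assumes that entities are linked across the two graphs by a `same_as` mapping (for example, interlanguage links), that predicates are shared by both graphs, and that each graph provides labels; the function `build_parallel_corpus`, the `de:` prefix, and the toy data are hypothetical and are not taken from THOTH's implementation.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def build_parallel_corpus(
    source: List[Triple],
    target: List[Triple],
    same_as: Dict[str, str],
    src_labels: Dict[str, str],
    tgt_labels: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Hypothetical helper: pair aligned facts as (source, target) label sequences."""
    target_facts: Set[Triple] = set(target)
    corpus: List[Tuple[str, str]] = []
    for s, p, o in source:
        # Map subject and object into the target graph via the assumed links;
        # predicates (e.g. dbo: ontology properties) are assumed to be shared.
        aligned = (same_as.get(s, s), p, same_as.get(o, o))
        if aligned not in target_facts:
            # No counterpart in the target graph: not usable as a training pair.
            continue
        # Verbalize both facts with their language-specific labels.
        src_line = " ".join(src_labels.get(t, t) for t in (s, p, o))
        tgt_line = " ".join(tgt_labels.get(t, t) for t in aligned)
        corpus.append((src_line, tgt_line))
    return corpus


if __name__ == "__main__":
    # Toy data; the "de:" prefix stands in for German DBpedia resource IRIs.
    english = [
        ("dbr:Berlin", "dbo:country", "dbr:Germany"),
        ("dbr:Bonn", "dbo:country", "dbr:Germany"),  # no German counterpart here
    ]
    german = [("de:Berlin", "dbo:country", "de:Deutschland")]
    links = {"dbr:Berlin": "de:Berlin", "dbr:Germany": "de:Deutschland"}
    en_labels = {"dbr:Berlin": "Berlin", "dbr:Bonn": "Bonn",
                 "dbo:country": "country", "dbr:Germany": "Germany"}
    de_labels = {"de:Berlin": "Berlin", "dbo:country": "Land",
                 "de:Deutschland": "Deutschland"}
    print(build_parallel_corpus(english, german, links, en_labels, de_labels))
    # [('Berlin country Germany', 'Berlin Land Deutschland')]
```

Source facts without a counterpart in the target graph are skipped here; in this sketch they are the facts a trained model would be asked to translate when enriching the target graph.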
Notes
- 2. A string or a value with a unit.
- 10. We could not use RDF2Vec in our work as its code was incomplete.
- 12. The black squares represent how the model splits frequent tokens in a sequence to improve translation.
- 13. More than one surface form can be assigned to an entity.
- 15. We selected the mapping-based objects and labels subsets to evaluate the quality of our approach because they are the subsets most widely used for training Linked Data NLP approaches.
- 17. Due to the limits of the evaluation platform, we reduced our test set to the first subset of the provided abstracts.
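Note 12 describes the model splitting tokens into smaller units for translation. The note does not name the technique; assuming it is byte-pair encoding (BPE) in the style of Sennrich et al. (listed in the references), the following Python sketch learns merge operations from a word-frequency table and replays them to segment a word. The function names, the `</w>` end-of-word marker, and the toy vocabulary are illustrative choices, not THOTH's code.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def _pair_counts(vocab: Dict[str, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Counter = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def _merge(pair: Pair, vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace every standalone occurrence of `pair` with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), w): f for w, f in vocab.items()}


def learn_bpe(word_freqs: Dict[str, int], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge operations from a word-frequency table."""
    # Each word starts as space-separated characters plus an end-of-word marker.
    vocab = {" ".join(w) + " </w>": f for w, f in word_freqs.items()}
    merges: List[Pair] = []
    for _ in range(num_merges):
        counts = _pair_counts(vocab)
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent pair (first on ties)
        vocab = _merge(best, vocab)
        merges.append(best)
    return merges


def segment(word: str, merges: List[Pair]) -> List[str]:
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]
            else:
                i += 1
    return symbols


if __name__ == "__main__":
    merges = learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5)
    print(merges)  # [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o'), ('lo', 'w')]
    print(segment("lowest", merges))  # ['low', 'est</w>']
```

Because `lowest` never occurs in the toy vocabulary, it is still segmented into known units (`low` + `est</w>`); this is the usual motivation for subword units when translating rare or unseen words.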
References
Palmero Aprosio, A., Giuliano, C., Lavelli, A.: Towards an automatic creation of localized versions of DBpedia. In: Alani, H., et al. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 494–509. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-41335-3_31
Arcan, M., Buitelaar, P.: Ontology label translation. In: HLT-NAACL, pp. 40–46 (2013)
Arcan, M., Buitelaar, P.: Translating domain-specific expressions in knowledge bases with neural machine translation. arXiv preprint arXiv:1709.02184 (2017)
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC/ISWC -2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-76298-0_52
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473 (2014)
Böhning, D.: Multinomial logistic regression algorithm. Ann. Inst. Stat. Math. 44(1), 197–200 (1992)
Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pp. 1247–1250. ACM (2008)
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. In: Advances in Neural Information Processing Systems, pp. 2787–2795 (2013)
Brümmer, M., Dojchinovski, M., Hellmann, S.: DBpedia abstracts: a large-scale, open, multilingual NLP training corpus. In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). European Language Resources Association (ELRA), Paris, May 2016
Cao, Z., Wang, L., de Melo, G.: Link prediction via subgraph embedding-based convex matrix completion. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence (AAAI 2018). AAAI Press (2018)
Chen, M., Tian, Y., Yang, M., Zaniolo, C.: Multilingual knowledge graph embeddings for cross-lingual knowledge alignment. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence, pp. 1511–1517. AAAI Press (2017)
Cochez, M., Ristoski, P., Ponzetto, S.P., Paulheim, H.: Biased graph walks for RDF graph embeddings. In: Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, p. 21. ACM (2017)
Cochez, M., Ristoski, P., Ponzetto, S.P., Paulheim, H.: Global RDF vector space embeddings. In: d’Amato, C., et al. (eds.) ISWC 2017. LNCS, vol. 10587, pp. 190–207. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68288-4_12
Edunov, S., Ott, M., Auli, M., Grangier, D.: Understanding back-translation at scale. arXiv preprint arXiv:1808.09381 (2018)
Feng, X., Tang, D., Qin, B., Liu, T.: English-Chinese knowledge base translation with neural network. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2935–2944 (2016)
Gerber, D., et al.: Defacto—temporal and multilingual deep fact validation. Web Semant. Sci. Serv. Agents World Wide Web 35, 85–101 (2015)
Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of tricks for efficient text classification. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, vol. 2, pp. 427–431 (2017)
Joulin, A., Grave, E., Bojanowski, P., Nickel, M., Mikolov, T.: Fast linear model for knowledge graph embeddings. arXiv preprint arXiv:1710.10881 (2017)
K M, A., Basu Roy Chowdhury, S., Dukkipati, A.: Learning beyond datasets: knowledge graph augmented neural networks for natural language processing. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 313–322. Association for Computational Linguistics (2018). http://aclweb.org/anthology/N18-1029
Kaffee, L.-A., et al.: Mind the (language) gap: generation of multilingual Wikipedia summaries from Wikidata for ArticlePlaceholders. In: Gangemi, A., et al. (eds.) ESWC 2018. LNCS, vol. 10843, pp. 319–334. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-93417-4_21
Kalchbrenner, N., Blunsom, P.: Recurrent continuous translation models. In: EMNLP, vol. 3, p. 413 (2013)
Klein, G., Kim, Y., Deng, Y., Senellart, J., Rush, A.M.: OpenNMT: open-source toolkit for neural machine translation. arXiv e-prints (2017)
Klein, G., Kim, Y., Deng, Y., Senellart, J., Rush, A.: OpenNMT: open-source toolkit for neural machine translation. In: Proceedings of ACL 2017, System Demonstrations, pp. 67–72 (2017)
Lakshen, G.A., Janev, V., Vraneš, S.: Challenges in quality assessment of Arabic DBpedia. In: Proceedings of the 8th International Conference on Web Intelligence, Mining and Semantics, p. 15. ACM (2018)
Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1412–1421. Association for Computational Linguistics (2015). https://doi.org/10.18653/v1/D15-1166
McCrae, J.P., Arcan, M., Asooja, K., Gracia, J., Buitelaar, P., Cimiano, P.: Domain adaptation for ontology localization. Web Semant. Sci. Serv. Agents World Wide Web 36, 23–31 (2016)
Moussallem, D., Arčan, M., Ngomo, A.C.N., Buitelaar, P.: Augmenting neural machine translation with knowledge graphs. arXiv preprint arXiv:1902.08816 (2019)
Moussallem, D., Usbeck, R., Röder, M., Ngomo, A.C.N.: MAG: a multilingual, knowledge-base agnostic and deterministic entity linking approach. In: Proceedings of the Knowledge Capture Conference, p. 9. ACM (2017)
Moussallem, D., Wauer, M., Ngomo, A.C.N.: Machine translation using semantic web technologies: a survey. J. Web Semant. 51, 1–19 (2018)
Nickel, M., Rosasco, L., Poggio, T.A., et al.: Holographic embeddings of knowledge graphs. In: AAAI, pp. 1955–1961 (2016)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318. Association for Computational Linguistics (2002)
Ristoski, P., Paulheim, H.: RDF2Vec: RDF graph embeddings for data mining. In: Groth, P., et al. (eds.) ISWC 2016. LNCS, vol. 9981, pp. 498–514. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46523-4_30
Sennrich, R., Haddow, B., Birch, A.: Neural machine translation of rare words with subword units. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725. Association for Computational Linguistics (2016)
Sorokin, D., Gurevych, I.: Modeling semantics with gated graph neural networks for knowledge base question answering. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 3306–3317. Association for Computational Linguistics (2018). http://aclweb.org/anthology/C18-1280
Tang, G., Müller, M., Rios, A., Sennrich, R.: Why self-attention? A targeted evaluation of neural machine translation architectures. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 4263–4272 (2018)
Usbeck, R., et al.: GERBIL: general entity annotator benchmarking framework. In: Proceedings of the 24th International Conference on World Wide Web, WWW 2015, Florence, Italy, 18–22 May 2015, pp. 1133–1143 (2015)
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: a survey. Semant. Web 7(1), 63–93 (2016)
Acknowledgments
This work has been supported by the German Federal Ministry of Transport and Digital Infrastructure (BMVI) in the projects LIMBO (no. 19F2029I) and OPAL (no. 19F2028A) as well as by the Brazilian National Council for Scientific and Technological Development (CNPq) (no. 206971/2014-1).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Moussallem, D., Soru, T., Ngonga Ngomo, AC. (2019). THOTH: Neural Translation and Enrichment of Knowledge Graphs. In: Ghidini, C., et al. The Semantic Web – ISWC 2019. ISWC 2019. Lecture Notes in Computer Science, vol 11778. Springer, Cham. https://doi.org/10.1007/978-3-030-30793-6_29
DOI: https://doi.org/10.1007/978-3-030-30793-6_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30792-9
Online ISBN: 978-3-030-30793-6
eBook Packages: Computer Science, Computer Science (R0)