Abstract
This paper proposes an ontology enrichment approach for automatically labeling documents that describe entities with very specific concepts reflecting particular users' needs. The peculiarity of this approach is that it addresses a triple challenge: (1) the concepts used for labeling are not directly expressed by terms in the documents, (2) their formal definitions are not initially known, and (3) the information needed to label a document is not necessarily mentioned in it. To address these problems, we propose to take an existing ontology of the domain of concern and enrich it with the definitions of the labeling concepts. To construct these definitions, we use a set of manually labeled documents as examples. The ontology is populated with information extracted from these documents and with information retrieved from external resources (Linked Open Data). The target definitions are then learned from this populated ontology and the set of labeled documents, and added to the ontology (ontology enrichment). Hence, whenever new documents from the same domain have to be labeled, the ontology can be populated in the same way and the learned definitions applied, allowing the new documents to be labeled. This approach, named Saupodoc, is a novel approach to ontology population and enrichment that builds on the foundations of the Semantic Web by combining text analysis, linked open data extraction, machine learning and reasoning tools. An evaluation on two application domains yields good-quality results and demonstrates the value of the approach.
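The abstract describes a populate, learn, and apply loop without naming the learning algorithm at this level. As a toy illustration only, the Python sketch below represents the populated ontology as a set of (property, value) facts per document entity, merges facts extracted from text with facts retrieved from Linked Open Data, and learns a definition as the most specific conjunction of facts shared by all positive examples. This simple learner is a stand-in chosen for brevity, not the concept learner used by Saupodoc; the functions `populate`, `learn_definition` and `label`, as well as all entity, property and value names, are hypothetical.

```python
"""Toy populate / learn / apply loop (hypothetical sketch, not Saupodoc's learner)."""
from typing import Dict, FrozenSet, Optional, Set, Tuple

# A fact about an entity, written as a (property, value) pair, e.g.
# ("hasActivity", "Hiking"). Property and value names are hypothetical.
Atom = Tuple[str, str]
ABox = Dict[str, Set[Atom]]


def populate(text_facts: ABox, lod_facts: ABox) -> ABox:
    """Merge facts extracted from the documents with facts retrieved from LOD."""
    entities = text_facts.keys() | lod_facts.keys()
    return {e: text_facts.get(e, set()) | lod_facts.get(e, set()) for e in entities}


def learn_definition(abox: ABox, positives: Set[str],
                     negatives: Set[str]) -> Optional[FrozenSet[Atom]]:
    """Return the most specific conjunction of facts shared by every positive
    example, or None if that conjunction also covers a negative example."""
    if not positives:
        raise ValueError("at least one positive example is required")
    shared = set.intersection(*(abox[e] for e in positives))
    # An entity satisfies the definition when it has every fact in it.
    if any(shared <= abox[e] for e in negatives):
        return None
    return frozenset(shared)


def label(abox: ABox, definition: FrozenSet[Atom]) -> Set[str]:
    """Label every entity whose facts satisfy the learned definition."""
    return {e for e, atoms in abox.items() if definition <= atoms}


# Hypothetical training documents: d1 and d2 are manually labeled with the
# target concept, d3 is a negative example.
train = populate(
    text_facts={
        "d1": {("hasActivity", "Hiking"), ("hasClimate", "Mild")},
        "d2": {("hasActivity", "Hiking")},
        "d3": {("hasActivity", "Nightlife")},
    },
    lod_facts={
        "d1": {("locatedIn", "Alps")},
        "d2": {("locatedIn", "Alps"), ("hasClimate", "Mild")},
        "d3": {("locatedIn", "Alps"), ("hasClimate", "Mild")},
    },
)
definition = learn_definition(train, positives={"d1", "d2"}, negatives={"d3"})
assert definition is not None
print(sorted(definition))
# [('hasActivity', 'Hiking'), ('hasClimate', 'Mild'), ('locatedIn', 'Alps')]

# New, unlabeled documents are populated the same way, then labeled.
new = populate(
    text_facts={"d4": {("hasActivity", "Hiking")}, "d5": {("hasActivity", "Hiking")}},
    lod_facts={"d4": {("locatedIn", "Alps"), ("hasClimate", "Mild")},
               "d5": {("locatedIn", "Alps")}},
)
print(label(new, definition))  # {'d4'}
```

In this toy data, d2 satisfies `("hasClimate", "Mild")` only through the LOD facts, which mirrors challenge (3) in the abstract: information needed for labeling need not appear in the document itself. A real concept learner would search a much richer hypothesis space and tolerate noisy examples; the most-specific-conjunction rule is used here only because it is short and easy to check by hand.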
Acknowledgements
We acknowledge the Wepingo startup, which funded this work as part of the Poraso project.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Alec, C., Reynaud-Delaître, C., Safar, B. (2018). A Combined Approach for Ontology Enrichment from Textual and Open Data. In: Pinaud, B., Guillet, F., Crémilleux, B., de Runz, C. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 732. Springer, Cham. https://doi.org/10.1007/978-3-319-65406-5_1
DOI: https://doi.org/10.1007/978-3-319-65406-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65405-8
Online ISBN: 978-3-319-65406-5
eBook Packages: Engineering, Engineering (R0)