Abstract
This paper describes explorations in word sense disambiguation using Wikipedia as a source of sense annotations. Through experiments on four different languages, we show that the Wikipedia-based sense annotations are reliable and can be used to construct accurate sense classifiers.
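One common way for Wikipedia to supply sense annotations is through piped links, where an ambiguous anchor word points to a specific article, as in `[[bar (music)|bar]]`. The sketch below illustrates that idea only. This excerpt does not show the chapter's own extraction procedure, so the function name `collect_sense_annotations`, the regular expression, and the sample text are all assumptions made for illustration.

```python
import re

# Matches piped wiki links such as [[bar (music)|bar]]:
# group 1 is the target article title, group 2 is the visible anchor text.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def collect_sense_annotations(wikitext, target_word):
    """Return the article titles linked from occurrences of target_word.

    Each title acts as a sense label for one occurrence of the word.
    Hypothetical helper, not the chapter's implementation.
    """
    labels = []
    for match in PIPED_LINK.finditer(wikitext):
        article = match.group(1).strip()
        anchor = match.group(2).strip()
        if anchor.lower() == target_word.lower():
            labels.append(article)
    return labels


# Illustrative text, invented for this example.
sample = (
    "In [[bar (music)|bar]] 12 the theme returns. "
    "They met at a [[bar (establishment)|bar]] downtown. "
    "See also [[jazz]]."
)
labels = collect_sense_annotations(sample, "bar")
```

Here each occurrence of the anchor "bar" receives the title of the article it links to as its sense label, while the unpiped link to "jazz" is ignored.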
Notes
- 1.
- 2. The average length of a paragraph is 80 words.
- 3.
- 4. Note that this baseline assumes the availability of a sense tagged corpus in order to determine the most frequent sense of a word. The baseline is therefore “informed,” as compared to a random, “uninformed” sense selection.
- 5.
- 6.
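The distinction drawn in note 4 between an "informed" most-frequent-sense baseline and an "uninformed" random baseline can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the toy sense-tagged examples are hypothetical and are not taken from the chapter.

```python
from collections import Counter


def most_frequent_sense(tagged_examples):
    """Pick the sense seen most often in a sense-tagged sample (informed baseline)."""
    counts = Counter(sense for _, sense in tagged_examples)
    return counts.most_common(1)[0][0]


def baseline_accuracy(predicted_sense, test_examples):
    """Accuracy obtained by assigning the same sense to every test example."""
    hits = sum(1 for _, sense in test_examples if sense == predicted_sense)
    return hits / len(test_examples)


def random_baseline_expected_accuracy(tagged_examples):
    """Expected accuracy of a uniform random choice among the known senses."""
    senses = {sense for _, sense in tagged_examples}
    return 1 / len(senses)


# Hypothetical sense-tagged data for the word "bar".
train = [
    ("he ordered a drink at the bar", "bar (establishment)"),
    ("the bar closes at midnight", "bar (establishment)"),
    ("we met in a crowded bar", "bar (establishment)"),
    ("the melody repeats every bar", "bar (music)"),
    ("she was admitted to the bar", "bar (law)"),
]
test = [
    ("a bar near the station", "bar (establishment)"),
    ("the last bar of the song", "bar (music)"),
    ("the bar was full", "bar (establishment)"),
    ("he passed the bar exam", "bar (law)"),
]

mfs = most_frequent_sense(train)
mfs_accuracy = baseline_accuracy(mfs, test)
random_accuracy = random_baseline_expected_accuracy(train)
```

The most-frequent-sense baseline needs the labels in `train` to count senses, which is why it is "informed"; the random baseline needs only the sense inventory.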
Acknowledgements
This material is based in part upon work supported by the National Science Foundation IIS awards #1018613 and #1018590 and CAREER award #0747340. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dandala, B., Mihalcea, R., Bunescu, R. (2013). Word Sense Disambiguation Using Wikipedia. In: Gurevych, I., Kim, J. (eds) The People’s Web Meets NLP. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35085-6_9
DOI: https://doi.org/10.1007/978-3-642-35085-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35084-9
Online ISBN: 978-3-642-35085-6
eBook Packages: Computer Science, Computer Science (R0)