Abstract
We present a supertagger for Spanish that disambiguates the HPSG lexical frames of the verbs in a sentence. The supertagger uses a CRF model and achieves an accuracy of 83.58% on the verb classes in the test set. The tagset contains 92 verb classes, extracted from an HPSG-compatible annotated Spanish corpus that was created by automatically transforming the Ancora Spanish corpus. The verb tags encode the argument structure and the syntactic categories of the arguments, so they can be easily translated into HPSG lexical entries.
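To illustrate how a tag that encodes argument structure and syntactic categories could be unpacked into a lexical frame, here is a minimal Python sketch. The tag format (`v-arg0_sn-arg1_sn`), the labels it uses, and the names `Argument`, `VerbFrame`, and `parse_verb_tag` are all hypothetical and chosen only for illustration; the chapter's actual tagset of 92 classes may be encoded quite differently.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Argument:
    role: str      # hypothetical role label, e.g. "arg0"
    category: str  # hypothetical syntactic category, e.g. "sn"


@dataclass(frozen=True)
class VerbFrame:
    arguments: Tuple[Argument, ...]


def parse_verb_tag(tag: str) -> VerbFrame:
    """Decode a hypothetical verb tag such as 'v-arg0_sn-arg1_sn'.

    Assumed format (not from the chapter): the prefix 'v', followed by
    zero or more '-role_category' segments, one per argument.
    """
    head, *parts = tag.split("-")
    if head != "v":
        raise ValueError(f"not a verb tag: {tag!r}")
    arguments = []
    for part in parts:
        role, sep, category = part.partition("_")
        if not sep or not role or not category:
            raise ValueError(f"malformed argument segment: {part!r}")
        arguments.append(Argument(role, category))
    return VerbFrame(tuple(arguments))


if __name__ == "__main__":
    frame = parse_verb_tag("v-arg0_sn-arg1_sn")
    for arg in frame.arguments:
        print(arg.role, arg.category)
```

Under these assumptions, each predicted tag maps deterministically to a frame, which is the property that would make the translation into lexical entries straightforward.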
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chiruzzo, L., Wonsever, D. (2015). Supertagging for a Statistical HPSG Parser for Spanish. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_3
DOI: https://doi.org/10.1007/978-3-319-25789-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)