Abstract
We present a method for evaluating multilingual multi-document summarisation that saves precious annotation time and makes evaluation results directly comparable across languages. The approach is based on manually selecting the most important sentences in a cluster of documents from a sentence-aligned parallel corpus and projecting that sentence selection to the various target languages. We also present two ways of exploiting inter-annotator agreement levels, apply both to a baseline sentence-extraction summariser in seven languages, and discuss the differences in results between the two evaluation versions, together with a preliminary comparison across languages. The same method can in principle be used to evaluate single-document summarisers or information extraction tools.
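The projection idea can be illustrated with a minimal Python sketch. It assumes that sentence alignment is represented by shared sentence IDs across languages, so a selection made once in one language carries over to any other language by ID lookup. The toy corpus, the function names, and in particular the vote-based score are hypothetical illustrations; the abstract does not specify the two agreement-based scoring schemes, and this score should not be read as either of them.

```python
from collections import Counter

# Toy sentence-aligned corpus (hypothetical data): the same ID denotes
# the same sentence in every language.
corpus = {
    "en": {1: "Floods hit the region.",
           2: "Thousands were evacuated.",
           3: "Rain is expected to continue."},
    "de": {1: "Überschwemmungen treffen die Region.",
           2: "Tausende wurden evakuiert.",
           3: "Es wird weiterer Regen erwartet."},
}


def project_selection(selected_ids, target_sentences):
    """Return the target-language text of the selected sentence IDs."""
    return [target_sentences[i] for i in sorted(selected_ids)
            if i in target_sentences]


def vote_counts(annotations):
    """Count how many annotators selected each sentence ID."""
    return Counter(i for ids in annotations for i in set(ids))


def agreement_score(system_ids, annotations):
    """Assumed scoring scheme, not taken from the paper: votes collected
    by the system's sentences, divided by the best achievable vote total
    for a selection of the same size."""
    votes = vote_counts(annotations)
    system_ids = set(system_ids)
    gained = sum(votes[i] for i in system_ids)
    best = sum(v for _, v in votes.most_common(len(system_ids)))
    return gained / best if best else 0.0


# Three hypothetical annotators select sentences once, by ID.
annotations = [{1, 2}, {1, 3}, {1, 2}]

# The same annotations evaluate a summariser in any language,
# because the system output is compared by sentence ID.
german_gold = project_selection({1, 2}, corpus["de"])
score = agreement_score({1, 3}, annotations)
```

Because the comparison is made on sentence IDs rather than on text, the same annotations can score a summariser's output in every language of the corpus, which is what makes the results comparable across languages.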
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Turchi, M., Steinberger, J., Kabadjov, M., Steinberger, R. (2010). Using Parallel Corpora for Multilingual (Multi-document) Summarisation Evaluation. In: Agosti, M., Ferro, N., Peters, C., de Rijke, M., Smeaton, A. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2010. Lecture Notes in Computer Science, vol 6360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15998-5_7
DOI: https://doi.org/10.1007/978-3-642-15998-5_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15997-8
Online ISBN: 978-3-642-15998-5
eBook Packages: Computer Science (R0)