Abstract
The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
Cite this article
Haque, R., Naskar, S.K., van den Bosch, A. et al. Integrating source-language context into phrase-based statistical machine translation. Machine Translation 25, 239–285 (2011). https://doi.org/10.1007/s10590-011-9100-2
DOI: https://doi.org/10.1007/s10590-011-9100-2