Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Related Structures

Aires, José; Lopes, Gabriel Pereira; Gomes, Luis

doi:10.1007/978-3-642-04686-5_48

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5816)

Included in the following conference series:

Portuguese Conference on Artificial Intelligence

Abstract

In this paper, we address term translation extraction from indexed aligned parallel corpora, using a couple of association measures combined by a voting scheme to scale down translation pairs according to their degree of internal cohesiveness, and we evaluate the results obtained. The precision obtained is clearly much better than the results reported in related work for the very low range of occurrences we have dealt with, and is comparable to the best results obtained in word translation.

Research supported by FCT/MCTES, through Ph.D. scholarship, ref. SFRH/BD/48839/2008, and project VIP-Access, ref. PTDC/PLP/72142/2006.

Author information

Authors and Affiliations

CITI, Departamento de Informática, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
José Aires, Gabriel Pereira Lopes & Luis Gomes

Editor information

Editors and Affiliations

IEETA/Department of Electronics & Telecommunication, University of Aveiro, Campus Santiago, P.O. Box, 3810-153, Aveiro, Portugal
Luís Seabra Lopes
LSE-IEETA/DETI, Universidade de Aveiro, Portugal
Nuno Lau
Universidade de Aveiro, Aveiro, Portugal
Pedro Mariano
School of Informatics, Indiana University, Bloomington, IN, USA, and Computational Biology Collaboratorium, Instituto Gulbenkian de Ciência, Portugal
Luís M. Rocha

Cite this paper

Aires, J., Lopes, G.P., Gomes, L. (2009). Phrase Translation Extraction from Aligned Parallel Corpora Using Suffix Arrays and Related Structures. In: Lopes, L.S., Lau, N., Mariano, P., Rocha, L.M. (eds) Progress in Artificial Intelligence. EPIA 2009. Lecture Notes in Computer Science, vol 5816. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04686-5_48

DOI: https://doi.org/10.1007/978-3-642-04686-5_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04685-8
Online ISBN: 978-3-642-04686-5
eBook Packages: Computer Science, Computer Science (R0)
