Statistical Machine Translation of Broadcast News from Spanish to Portuguese

Sánchez Martínez, Raquel; da Silva Neto, João Paulo; Caseiro, Diamantino António

doi:10.1007/978-3-540-85980-2_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5190)

Included in the following conference series:

International Conference on Computational Processing of the Portuguese Language

Abstract

This paper describes the development of an automatic system for translating broadcast news from Spanish to Portuguese. The work involved two challenging areas of speech and language processing: Automatic Speech Recognition (ASR) of Spanish news and Statistical Machine Translation (SMT) of the recognition output into Portuguese. ASR of broadcast news is based on the AUDIMUS.MEDIA system, a hybrid ANN/HMM system with multiple-stream decoding. The system achieved a Word Error Rate (WER) of 22.08% on a Spanish Broadcast News task, which is comparable to other international state-of-the-art systems. Parallel normalized texts from the European Parliament database were used to train the Spanish-to-Portuguese SMT system. A preliminary, non-exhaustive human evaluation gave a fluency score of 3.74 and a sufficiency score of 4.23.
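For readers unfamiliar with the WER figure quoted above, the following is a minimal sketch of how Word Error Rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions and are not taken from the paper; the authors' actual scoring tool and text normalization are not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch only: tokenization is plain whitespace splitting,
    with no case folding or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("habló" -> "hablo") and one
# deletion ("en") against a six-word reference.
example_wer = word_error_rate(
    "el presidente habló ayer en madrid",
    "el presidente hablo ayer madrid",
)
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100% when the hypothesis is much longer than the reference.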

Author information

Authors and Affiliations

L2F - Spoken Language Systems Laboratory, INESC ID Lisboa, R. Alves Redol, 9, 1000-029, Lisboa, Portugal
Raquel Sánchez Martínez, João Paulo da Silva Neto & Diamantino António Caseiro

Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma

Cite this paper

Sánchez Martínez, R., da Silva Neto, J.P., Caseiro, D.A. (2008). Statistical Machine Translation of Broadcast News from Spanish to Portuguese. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_12

DOI: https://doi.org/10.1007/978-3-540-85980-2_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85979-6
Online ISBN: 978-3-540-85980-2
eBook Packages: Computer Science, Computer Science (R0)
