Link to original content: https://doi.org/10.1145/3542924
Forward-backward Transliteration of Punjabi Gurmukhi Script Using N-gram Language Model | ACM Transactions on Asian and Low-Resource Language Information Processing
research-article

Forward-backward Transliteration of Punjabi Gurmukhi Script Using N-gram Language Model

Published: 27 December 2022

Abstract

Transliterating the text of a language into a foreign script is called forward transliteration, and transliterating the text back into the original script is called backward transliteration. In this work, we perform both forward and backward transliteration on Punjabi. We transliterate Punjabi person names from Gurmukhi script to English Roman script, and from English Roman script back to Gurmukhi script, using an n-gram language model. We used more than one million parallel entities of person names in Gurmukhi and Roman script as the training corpus. From the corpus we generated English-to-Punjabi and Punjabi-to-English n-gram databases. To get better results, we tried to create n-grams that are as long as possible, ranging from bi-grams to 30-grams. Our n-gram database contains more than 10 million n-grams, and each n-gram has multiple mappings in the other script. The most challenging part of creating the n-gram databases is finding the mapping for a given n-gram from the parallel name entity. Under the orthography rules, the same combination of letters may be pronounced differently depending on its location in the word. Therefore, we categorized n-grams into starting, middle, and ending n-grams and used them accordingly in the transliteration process. The transliteration process works like merge sort: we start by searching the database for the longest possible n-gram and split the string recursively until a match is found. The transliterated strings are then merged back to form the final output. In English-to-Punjabi transliteration, we achieved 96% accuracy against the gold standard and 99.14% accuracy using minimum edit distance. In Punjabi-to-English transliteration, the results showed 96.85% and 99.35% accuracy for the gold standard and minimum edit distance, respectively.
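
The merge-sort-like lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three tiny tables, the helper names `category` and `transliterate`, the rule of splitting an unmatched string at its midpoint, and the fallback of passing an unmatched single letter through unchanged are all assumptions made for illustration. The abstract states only that the longest n-gram is searched first, that the string is split recursively until a match is found, and that n-grams are categorized as starting, middle, or ending.

```python
# Toy n-gram tables keyed by position in the word. These entries are
# illustrative assumptions, not data from the paper's n-gram database.
NGRAMS = {
    "start": {"aman": "ਅਮਨ"},
    "middle": {},
    "end": {"deep": "ਦੀਪ"},
}


def category(at_start: bool, at_end: bool) -> str:
    """Choose the table for a substring (assumption: a whole word uses 'start')."""
    if at_start:
        return "start"
    return "end" if at_end else "middle"


def transliterate(text: str, at_start: bool = True, at_end: bool = True) -> str:
    """Look up the whole string; on a miss, split it and merge the results."""
    table = NGRAMS[category(at_start, at_end)]
    if text in table:
        return table[text]  # longest n-gram available for this span
    if len(text) <= 1:
        return text  # assumed fallback: leave an unmatched letter as-is
    mid = len(text) // 2  # assumed split point, as in merge sort
    left = transliterate(text[:mid], at_start, False)
    right = transliterate(text[mid:], False, at_end)
    return left + right  # merge step


print(transliterate("amandeep"))
```

Here `amandeep` is not in the starting table, so it is split into `aman` (looked up as a starting n-gram) and `deep` (looked up as an ending n-gram), and the two results are concatenated. A midpoint split misses matches that straddle the midpoint, so a practical system would need a more careful split strategy and a way to choose among multiple candidate mappings.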

Cited By

  • (2023) A Survey of Advancements in Real-Time Sign Language Translators: Integration with IoT Technology. Technologies 11:4 (83). DOI: 10.3390/technologies11040083. Online publication date: 22-Jun-2023


      Published In

      ACM Transactions on Asian and Low-Resource Language Information Processing  Volume 22, Issue 2
      February 2023
      624 pages
      ISSN:2375-4699
      EISSN:2375-4702
      DOI:10.1145/3572719

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 27 December 2022
      Online AM: 09 June 2022
      Accepted: 31 May 2022
      Revised: 11 May 2022
      Received: 13 June 2021
      Published in TALLIP Volume 22, Issue 2


      Author Tags

      1. Transliteration
      2. Punjabi
      3. computational linguistics etc

      Qualifiers

      • Research-article
      • Refereed
