Abstract
This paper presents an approach to speech-to-text conversion for the Bengali language. We found that most existing methodologies in this area focus on languages other than Bengali. We first prepared a novel dataset of 56 unique words spoken by 160 individual subjects. We then describe our approach to increasing speech-to-text accuracy for Bengali, starting with Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM) models. On further observation, the GRU failed to produce stable output, so we moved entirely to the LSTM model, which achieved 90% accuracy on unseen data. Voices from several demographic populations, along with various noise conditions, were used to validate the model. In the testing phase, we evaluated a variety of classes based on their length, complexity, noise, and speaker gender. We expect that this research will help in developing a real-time Bengali speech-to-text recognition model.
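The abstract contrasts GRU and LSTM recurrent models without showing how the two cells differ. The sketch below is a minimal, untrained forward pass in plain Python that classifies one utterance into one of 56 word classes (matching the vocabulary size above). It assumes each utterance is a sequence of 13-dimensional MFCC-like frames and uses a hypothetical hidden size of 8; the feature extraction, sizes, random weights, and helper names (`lstm_step`, `gru_step`, `predict`) are illustrative assumptions, not the authors' architecture.

```python
import math
import random

# Hypothetical sizes (assumptions): 13 MFCC-like features per frame,
# 8 hidden units, and 56 output classes (one per word in the dataset).
N_FEATURES, N_HIDDEN, N_CLASSES = 13, 8, 56


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_matrix(rows, cols, rng):
    # Small random weights; a trained model would learn these instead.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def affine(W, U, b, x, h):
    # Computes W @ x + U @ h + b using plain Python lists.
    return [
        sum(w * xi for w, xi in zip(W[r], x))
        + sum(u * hi for u, hi in zip(U[r], h))
        + b[r]
        for r in range(len(b))
    ]


def make_gate(rng):
    # One gate = input weights W, recurrent weights U, bias b.
    return (rand_matrix(N_HIDDEN, N_FEATURES, rng),
            rand_matrix(N_HIDDEN, N_HIDDEN, rng),
            [0.0] * N_HIDDEN)


def lstm_step(params, x, h, c):
    # Input, forget and output gates plus a candidate cell state g.
    i = [sigmoid(v) for v in affine(*params["i"], x, h)]
    f = [sigmoid(v) for v in affine(*params["f"], x, h)]
    o = [sigmoid(v) for v in affine(*params["o"], x, h)]
    g = [math.tanh(v) for v in affine(*params["g"], x, h)]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def gru_step(params, x, h):
    # One common GRU formulation: update gate z, reset gate r,
    # candidate n; there is no separate cell state.
    z = [sigmoid(v) for v in affine(*params["z"], x, h)]
    r = [sigmoid(v) for v in affine(*params["r"], x, h)]
    rh = [rk * hk for rk, hk in zip(r, h)]
    n = [math.tanh(v) for v in affine(*params["n"], x, rh)]
    return [(1 - zk) * nk + zk * hk for zk, nk, hk in zip(z, n, h)]


def run_lstm(params, frames):
    h, c = [0.0] * N_HIDDEN, [0.0] * N_HIDDEN
    for x in frames:
        h, c = lstm_step(params, x, h, c)
    return h  # final hidden state summarizes the utterance


def run_gru(params, frames):
    h = [0.0] * N_HIDDEN
    for x in frames:
        h = gru_step(params, x, h)
    return h


def predict(head, h):
    # Linear layer + softmax over the word classes; returns (argmax, probs).
    W, b = head
    logits = [sum(w * hk for w, hk in zip(W[k], h)) + b[k] for k in range(N_CLASSES)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return max(range(N_CLASSES), key=lambda k: probs[k]), probs


rng = random.Random(0)
lstm_params = {name: make_gate(rng) for name in "ifog"}
gru_params = {name: make_gate(rng) for name in "zrn"}
lstm_head = (rand_matrix(N_CLASSES, N_HIDDEN, rng), [0.0] * N_CLASSES)
gru_head = (rand_matrix(N_CLASSES, N_HIDDEN, rng), [0.0] * N_CLASSES)

# A fake utterance: 40 frames of random features standing in for real MFCCs.
frames = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)] for _ in range(40)]

lstm_label, lstm_probs = predict(lstm_head, run_lstm(lstm_params, frames))
gru_label, gru_probs = predict(gru_head, run_gru(gru_params, frames))
print("LSTM class:", lstm_label, "GRU class:", gru_label)
```

The structural difference shows up in the step functions: the LSTM carries a separate cell state `c` controlled by three sigmoid gates, while the GRU folds everything into a single hidden state with two gates, giving it fewer parameters per unit. With random weights the predicted labels are meaningless; the models would need training (for example with cross-entropy over the 56 words) before any accuracy comparison.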
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jahan, N. et al. (2023). A Comparison of LSTM and GRU for Bengali Speech-to-Text Transformation. In: Daimi, K., Al Sadoon, A. (eds) Proceedings of the 2023 International Conference on Advances in Computing Research (ACR’23). ACR 2023. Lecture Notes in Networks and Systems, vol 700. Springer, Cham. https://doi.org/10.1007/978-3-031-33743-7_18
DOI: https://doi.org/10.1007/978-3-031-33743-7_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33742-0
Online ISBN: 978-3-031-33743-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)