Abstract
Speech recognition has become common in many application domains, from dictation systems for professional practices to vocal user interfaces for people with disabilities or hands-free system control. However, the performance of Automatic Speech Recognition (ASR) systems is so far comparable to that of Human Speech Recognition (HSR) only under very strict working conditions, and is in general far lower. Incorporating acoustic-phonetic knowledge into ASR design has proven a viable approach to raising ASR accuracy. Manner of articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner of articulation attributes, starting from representations of speech signal frames. In this paper, an optimized digital Knowledge-based Automatic Speech Classifier for real-time applications is implemented on an FPGA using six attribute-scoring Multi-Layer Perceptrons (MLPs). The key features of the digital MLP are a virtual neuron architecture and the use of sinusoidal activation functions in the hidden layer. Implementation results on the FPGA show that the use of sinusoidal activation functions decreases hardware resource usage by more than 50% for slices, FFs, and LUTs, and by more than 35% for FPGA RAM blocks, compared with the standard sigmoid-based neuron implementation. Furthermore, neuron virtualization allows for a significant decrease in concurrent memory accesses, resulting in improved performance for the entire attribute scoring module.
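As a rough software illustration of the attribute scoring idea described in the abstract, the sketch below scores one speech frame with six small MLPs whose hidden layers use a sinusoidal activation. It is not the authors' hardware design. The function names (`virtual_neuron`, `mlp_score`, `random_mlp`, `score_frame`), the layer sizes, the random weights, and the sigmoid output neuron are all assumptions made for illustration; the abstract only states that the hidden layer uses sinusoidal activations. The single `virtual_neuron` routine, reused sequentially for every neuron, loosely mimics the time-multiplexing that a virtual neuron architecture implies.

```python
import math
import random

# The six manner-of-articulation attributes named in the abstract.
ATTRIBUTES = ["vowel", "stop", "fricative", "approximant", "nasal", "silence"]


def sigmoid(z):
    """Logistic function; its use on the output neuron is an assumption."""
    return 1.0 / (1.0 + math.exp(-z))


def virtual_neuron(inputs, weights, bias, activation):
    """One multiply-accumulate unit, reused in turn for every neuron."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w
    return activation(acc)


def mlp_score(frame, hidden_layer, output_neuron):
    """Score a frame with one MLP: sine hidden layer, single output."""
    hidden = [virtual_neuron(frame, w, b, math.sin) for w, b in hidden_layer]
    w_out, b_out = output_neuron
    return virtual_neuron(hidden, w_out, b_out, sigmoid)


def random_mlp(rng, n_inputs, n_hidden):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    hidden_layer = [
        ([rng.uniform(-1, 1) for _ in range(n_inputs)], rng.uniform(-1, 1))
        for _ in range(n_hidden)
    ]
    output_neuron = (
        [rng.uniform(-1, 1) for _ in range(n_hidden)],
        rng.uniform(-1, 1),
    )
    return hidden_layer, output_neuron


def score_frame(frame, detectors):
    """Run all six attribute detectors on the same frame."""
    return {name: mlp_score(frame, *detectors[name]) for name in ATTRIBUTES}


if __name__ == "__main__":
    rng = random.Random(0)
    # 13 input features and 8 hidden neurons are illustrative sizes only.
    detectors = {name: random_mlp(rng, 13, 8) for name in ATTRIBUTES}
    frame = [rng.uniform(-1, 1) for _ in range(13)]
    for name, score in score_frame(frame, detectors).items():
        print(f"{name}: {score:.3f}")
```

In hardware, evaluating the sine in place of the sigmoid and sharing one neuron datapath are the choices the abstract credits with the resource savings; the software loop above only mirrors the data flow and says nothing about those savings.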
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Siniscalchi, S.M., Gennaro, F., Vitabile, S., Gentile, A., Sorbello, F. (2005). Efficient FPGA Implementation of a Knowledge-Based Automatic Speech Classifier. In: Yang, L.T., Zhou, X., Zhao, W., Wu, Z., Zhu, Y., Lin, M. (eds) Embedded Software and Systems. ICESS 2005. Lecture Notes in Computer Science, vol 3820. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11599555_21
DOI: https://doi.org/10.1007/11599555_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30881-2
Online ISBN: 978-3-540-32297-9
eBook Packages: Computer Science, Computer Science (R0)