Comparison of spectral analysis methods for automatic speech recognition

Parinam, Venkata Neelima; Vootkuri, Chandra; Zahorian, Stephen A.

doi:10.21437/Interspeech.2013-742

ISCA Archive Interspeech 2013

In this paper, we evaluate the front end of Automatic Speech Recognition (ASR) systems with respect to several widely used spectral processing methods. Experimentally, we show that using FFT spectral values directly is just as effective as using either Mel or Gammatone filter banks as an intermediate processing stage, provided the cosine basis vectors used for dimensionality reduction are appropriately modified. Furthermore, we show that trajectory features computed over intervals of approximately 300 ms are considerably more effective, in terms of ASR accuracy, than the delta and delta-delta terms often used for ASR. Although using a filter bank carries no major performance disadvantage, simplicity of analysis is a reason to eliminate this step in speech processing. The experimental results that confirm these assertions are based on the TIMIT phonetically labeled database, and the assertions hold for both clean and noisy speech.
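
The two ideas in the abstract can be illustrated with a minimal Python sketch: cosine basis vectors applied directly to FFT log-magnitude values, and trajectory features computed over roughly 300 ms. The abstract does not say how the basis vectors are modified or how the trajectories are computed, so everything specific here is an assumption made for illustration: the Mel-like warping function, the weighting of each bin by the slope of the warp, the cosine expansion over time, the 10 ms frame shift, and all function names (`mel_warp`, `warped_cosine_basis`, `spectral_features`, `trajectory_features`). It should not be read as the authors' implementation.

```python
import numpy as np

def mel_warp(f_hz, f_max_hz):
    """Map Hz to a normalized warped axis in [0, 1] (Mel-like warp; an assumption)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    return mel(f_hz) / mel(f_max_hz)

def warped_cosine_basis(n_bins, n_coeffs, sample_rate):
    """Cosine basis vectors evaluated on the warped frequency axis of the FFT bins."""
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    g = mel_warp(freqs, sample_rate / 2.0)
    # Slope of the warp per bin: weights uniformly spaced FFT bins so that the
    # sum over bins approximates an integral over the warped axis.
    dg = np.gradient(g)
    k = np.arange(n_coeffs)[:, None]
    return np.cos(np.pi * k * g[None, :]) * dg[None, :]  # (n_coeffs, n_bins)

def spectral_features(frame, n_fft, n_coeffs, sample_rate):
    """Project one frame's log-magnitude FFT spectrum onto the warped basis (no filter bank)."""
    windowed = frame * np.hamming(len(frame))
    log_spec = np.log(np.abs(np.fft.rfft(windowed, n=n_fft)) + 1e-10)
    basis = warped_cosine_basis(len(log_spec), n_coeffs, sample_rate)
    return basis @ log_spec

def trajectory_features(coeffs, frame_shift_ms=10.0, span_ms=300.0, n_traj=3):
    """Cosine expansion of each coefficient's time trajectory over sliding ~span_ms blocks."""
    block = int(round(span_ms / frame_shift_ms))  # frames per block
    t = (np.arange(block) + 0.5) / block          # midpoints in normalized time
    tbasis = np.cos(np.pi * np.arange(n_traj)[:, None] * t[None, :])  # (n_traj, block)
    n_frames = coeffs.shape[0]
    out = [(tbasis @ coeffs[s:s + block]).ravel()
           for s in range(n_frames - block + 1)]
    return np.array(out)  # (n_blocks, n_traj * n_coeffs)
```

With a 16 kHz signal, a 400-sample frame, and `n_fft=512`, `spectral_features` returns one vector of `n_coeffs` values per frame. Stacking those vectors over time and passing them to `trajectory_features` gives one feature vector per 30-frame block, which would take the place of the usual static-plus-delta stack.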