Combined low level and high level features for out-of-vocabulary word detection

Lecouteux, Benjamin; Linarès, Georges; Favre, Benoit

doi:10.21437/Interspeech.2009-344

ISCA Archive Interspeech 2009

This paper addresses the issue of Out-Of-Vocabulary (OOV) word detection in Large Vocabulary Continuous Speech Recognition (LVCSR) systems. We propose a method inspired by confidence measures, which consists of analyzing the recognition system outputs in order to automatically detect errors due to OOV words. This method combines various features based on acoustic, linguistic, decoding-graph and semantic information. We evaluate each feature separately and estimate their complementarity. Experiments are conducted on a large French broadcast news corpus from the ESTER evaluation campaign. Results show good performance in real conditions: the method obtains an OOV word detection rate of 43% to 90% with 2.5% to 17.5% false detections.
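To illustrate how a detection rate and a false detection rate trade off against each other, the following minimal Python sketch scores each recognized word and flags it as OOV above a threshold. Everything in it is an assumption made for illustration: the abstract does not say how the features are combined, so the equal-weight sum in `combine`, the four toy feature columns, the toy labels, and the definition of the false detection rate (flagged in-vocabulary words divided by all in-vocabulary words) are hypothetical and are not the authors' method or data.

```python
from typing import List, Tuple


def combine(features: List[List[float]], weights: List[float]) -> List[float]:
    """Combine per-word feature values into one score per word.

    A plain weighted sum is used here purely as a stand-in; the paper's
    actual combination scheme is not described in the abstract.
    """
    return [sum(w * x for w, x in zip(weights, row)) for row in features]


def detection_rates(
    scores: List[float], is_oov: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (detection_rate, false_detection_rate) at a given threshold.

    detection_rate       = flagged OOV words / all OOV words
    false_detection_rate = flagged in-vocabulary words / all in-vocabulary words
    (the second definition is an assumption, not taken from the paper)
    """
    flagged = [s >= threshold for s in scores]
    n_oov = sum(is_oov)
    n_iv = len(is_oov) - n_oov
    hits = sum(1 for f, o in zip(flagged, is_oov) if f and o)
    false_alarms = sum(1 for f, o in zip(flagged, is_oov) if f and not o)
    detection = hits / n_oov if n_oov else 0.0
    false_rate = false_alarms / n_iv if n_iv else 0.0
    return detection, false_rate


# Hypothetical toy data: one row per recognized word, with columns standing in
# for acoustic, linguistic, decoding-graph and semantic feature values.
features = [
    [0.9, 0.8, 0.7, 0.6],  # word caused by an OOV
    [0.2, 0.1, 0.3, 0.2],  # in-vocabulary word
    [0.6, 0.4, 0.5, 0.7],  # word caused by an OOV
    [0.5, 0.6, 0.4, 0.3],  # in-vocabulary word
]
is_oov = [True, False, True, False]
weights = [0.25, 0.25, 0.25, 0.25]  # equal weights, chosen arbitrarily

scores = combine(features, weights)

# Sweep the threshold to see the trade-off between the two rates.
for threshold in (0.7, 0.5, 0.4):
    detection, false_rate = detection_rates(scores, is_oov, threshold)
    print(f"threshold={threshold}: detection={detection:.2f}, false={false_rate:.2f}")
```

Lowering the threshold in this sketch raises both rates together, which is one plausible reading of why the reported results are given as paired ranges rather than single values.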