This paper proposes a multichannel speech enhancement technique that leverages three essential cues embedded in the observed signal, i.e., spatial, spectral, and temporal cues, to differentiate the underlying clean speech component from noise. The proposed method estimates clean speech and noise features under a single optimization criterion by integrating two approaches, namely, example-based and model-based multichannel speech enhancement: the former utilizes spectral and temporal cues, while the latter exploits spatial and spectral cues. Experiments show the superiority of the proposed method over conventional methods in terms of automatic keyword recognition performance in adverse, highly non-stationary noisy environments.
Index Terms: example-based speech enhancement, model-based approach, speech recognition, blind source separation