A Hybrid Warping Method Approach to Speaker Warping Adaptation

Roh, Yong-Wan; Kim, Jung-Hyun; Kim, Dong-Joo; Hong, Kwang-Seok

doi:10.1007/11676935_18

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3849)

Included in the following conference series:

International Workshop on Fuzzy Logic and Applications

Abstract

Speaker normalization is known to be an effective way to improve recognition accuracy in speaker-independent speech recognition systems. This paper proposes a new power spectrum warping approach that improves speaker normalization beyond what frequency warping achieves. The power spectrum warping is applied at the Mel filter bank stage of MFCC (Mel-frequency cepstral coefficient) extraction. The paper also proposes a hybrid VTN that combines power spectrum warping with frequency warping. The experiments compare the recognition performance of each method on the SKKU PBW DB: measured against the word recognition performance of the baseline system, power spectrum warping yields a 3.06% word error rate reduction and hybrid VTN yields a 4.07% word error rate reduction.
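As a rough illustration of the frequency warping that the abstract uses as its point of comparison, the sketch below rescales the center frequencies of a Mel filter bank by a speaker-dependent warp factor. It is not the authors' implementation: the linear warp, the factor name `alpha`, the 24-filter bank, the 0 to 8000 Hz band, and all function names are assumptions chosen for the example, and the common 2595 log10(1 + f/700) Mel formula is used. The power spectrum warping and the hybrid VTN are not shown, because the abstract does not define them.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to Mel using the common 2595 * log10(1 + f/700) formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_low, f_high):
    """Center frequencies (Hz) of filters spaced evenly on the Mel scale."""
    mel_low, mel_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * (i + 1)) for i in range(num_filters)]


def warp_frequency(f_hz, alpha, f_high):
    """Linear warp f -> alpha * f, clipped to the upper band edge (assumed form)."""
    return min(alpha * f_hz, f_high)


def warped_centers(num_filters, alpha, f_low=0.0, f_high=8000.0):
    """Mel filter bank centers after applying the hypothetical warp factor alpha."""
    centers = mel_center_frequencies(num_filters, f_low, f_high)
    return [warp_frequency(f, alpha, f_high) for f in centers]
```

In a VTN setup of this kind, a small grid of `alpha` values (for example 0.88 to 1.12) is typically tried per speaker and the value that best matches the acoustic model is kept; `warped_centers(24, 1.0)` leaves the filter bank unchanged.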

Author information

Authors and Affiliations

School of Information and Communication Engineering, Sungkyunkwan University, 300, Chunchun Dong, Jangan-gu, Suwon, Kyungki-do, 440-746, Korea
Yong-Wan Roh, Jung-Hyun Kim, Dong-Joo Kim & Kwang-Seok Hong

Editor information

Editors and Affiliations

Dept. TSI, CNRS UMR 5141 LTCI, ENST (Télécom ParisTech), Paris, France
Isabelle Bloch
Centro Direzionale, DSA, University of Naples, Parthenope, Isola C/4, 80143, Naples, Italy
Alfredo Petrosino
Dipartimento di Tecnologie dell’Informazione, Università di Milano Crema,
Andrea G. B. Tettamanzi

About this paper

Cite this paper

Roh, YW., Kim, JH., Kim, DJ., Hong, KS. (2006). A Hybrid Warping Method Approach to Speaker Warping Adaptation. In: Bloch, I., Petrosino, A., Tettamanzi, A.G.B. (eds) Fuzzy Logic and Applications. WILF 2005. Lecture Notes in Computer Science, vol 3849. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11676935_18

DOI: https://doi.org/10.1007/11676935_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32529-1
Online ISBN: 978-3-540-32530-7
eBook Packages: Computer Science, Computer Science (R0)
