Samuel Thomas 0001

Person information

affiliation: IBM Research AI, Thomas J. Watson Research Center, NY, USA
affiliation (former): Johns Hopkins University, USA

2020 – today

2024
[c99]
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogério Feris, James R. Glass, Hilde Kuehne:
What, When, and Where? Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. CVPR 2024: 18419-18429
[i29]
Andrew Rouditchenko, Yuan Gong, Samuel Thomas, Leonid Karlinsky, Hilde Kuehne, Rogério Feris, James R. Glass:
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation. CoRR abs/2406.10082 (2024)
2023
[c98]
Takashi Fukuda, Samuel Thomas:
Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data. ICASSP 2023: 1-5
[c97]
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass:
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. ICASSP 2023: 1-5
[c96]
Vishal Sunder, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury, Eric Fosler-Lussier:
Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding. ICASSP 2023: 1-5
[c95]
Samuel Thomas, Hong-Kwang Jeff Kuo, George Saon, Brian Kingsbury:
Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition. ICASSP 2023: 1-5
[c94]
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury:
ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding. INTERSPEECH 2023: 1129-1133
[c93]
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass:
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. INTERSPEECH 2023: 2268-2272
[i28]
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogério Feris, James R. Glass, Hilde Kuehne:
What, when, and where? - Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions. CoRR abs/2303.16990 (2023)
[i27]
Andrew Rouditchenko, Sameer Khurana, Samuel Thomas, Rogério Feris, Leonid Karlinsky, Hilde Kuehne, David Harwath, Brian Kingsbury, James R. Glass:
Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages. CoRR abs/2305.12606 (2023)
2022
[c92]
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne:
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CVPR 2022: 19988-19997
[c91]
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Brian Kingsbury, George Saon:
Improving End-to-end Models for Set Prediction in Spoken Language Understanding. ICASSP 2022: 7162-7166
[c90]
Vishal Sunder, Samuel Thomas, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier:
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding. ICASSP 2022: 7497-7501
[c89]
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury:
A New Data Augmentation Method for Intent Classification Enhancement and its Application on Spoken Conversation Datasets. ICASSP 2022: 7632-7636
[c88]
Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon:
Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems. ICASSP 2022: 7932-7936
[c87]
Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo:
Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models. ICASSP 2022: 8127-8131
[c86]
Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon:
Extending RNN-T-based speech recognition systems with emotion and language classification. INTERSPEECH 2022: 546-549
[c85]
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Kuo, Brian Kingsbury:
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems. INTERSPEECH 2022: 2683-2687
[c84]
Takashi Fukuda, Samuel Thomas, Masayuki Suzuki, Gakuto Kurata, George Saon, Brian Kingsbury:
Global RNN Transducer Models For Multi-dialect Speech Recognition. INTERSPEECH 2022: 3138-3142
[i26]
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Brian Kingsbury, George Saon:
Improving End-to-End Models for Set Prediction in Spoken Language Understanding. CoRR abs/2201.12105 (2022)
[i25]
Zvi Kons, Aharon Satt, Hong-Kwang Kuo, Samuel Thomas, Boaz Carmeli, Ron Hoory, Brian Kingsbury:
A new data augmentation method for intent classification enhancement and its application on spoken conversation datasets. CoRR abs/2202.10137 (2022)
[i24]
Samuel Thomas, Brian Kingsbury, George Saon, Hong-Kwang Jeff Kuo:
Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models. CoRR abs/2202.13155 (2022)
[i23]
Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury, George Saon:
Towards Reducing the Need for Speech Training Data To Build Spoken Language Understanding Systems. CoRR abs/2203.00006 (2022)
[i22]
Vishal Sunder, Samuel Thomas, Hong-Kwang Jeff Kuo, Jatin Ganhotra, Brian Kingsbury, Eric Fosler-Lussier:
Towards End-to-End Integration of Dialog History for Improved Spoken Language Understanding. CoRR abs/2204.05169 (2022)
[i21]
Vishal Sunder, Eric Fosler-Lussier, Samuel Thomas, Hong-Kwang Jeff Kuo, Brian Kingsbury:
Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems. CoRR abs/2204.05188 (2022)
[i20]
Zvi Kons, Hagai Aronowitz, Edmilson da Silva Morais, Matheus Damasceno, Hong-Kwang Kuo, Samuel Thomas, George Saon:
Extending RNN-T-based speech recognition systems with emotion and language classification. CoRR abs/2207.13965 (2022)
[i19]
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogério Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James R. Glass:
C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval. CoRR abs/2210.03625 (2022)
2021
[j3]
Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas:
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection. IEEE ACM Trans. Audio Speech Lang. Process. 29: 324-333 (2021)
[c83]
Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:
Resource-efficient TDNN Architectures for Audio-visual Speech Recognition. EUSIPCO 2021: 506-510
[c82]
Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas, Zoltán Tüske, Brian Kingsbury:
End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features. ICASSP 2021: 7483-7487
[c81]
Samuel Thomas, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory:
RNN Transducer Models for Spoken Language Understanding. ICASSP 2021: 7493-7497
[c80]
Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang:
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. ICCV 2021: 7992-8001
[c79]
Jatin Ganhotra, Samuel Thomas, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury:
Integrating Dialog History into End-to-End Spoken Language Understanding Systems. Interspeech 2021: 1254-1258
[c78]
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Brian Chen, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Hilde Kuehne, Rameswar Panda, Rogério Schmidt Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass:
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. Interspeech 2021: 1584-1588
[c77]
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass:
Cascaded Multilingual Audio-Visual Learning from Videos. Interspeech 2021: 3006-3010
[c76]
Takashi Fukuda, Samuel Thomas:
Knowledge Distillation Based Training of Universal ASR Source Models for Cross-Lingual Transfer. Interspeech 2021: 3450-3454
[c75]
Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Jeff Kuo, Samuel Thomas, Edmilson da Silva Morais:
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs. Interspeech 2021: 4723-4727
[i18]
Samuel Thomas, Hong-Kwang Jeff Kuo, George Saon, Zoltán Tüske, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory:
RNN Transducer Models For Spoken Language Understanding. CoRR abs/2104.03842 (2021)
[i17]
Sujeong Cha, Wangrui Hou, Hyun Jung, My Phung, Michael Picheny, Hong-Kwang Kuo, Samuel Thomas, Edmilson da Silva Morais:
Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs. CoRR abs/2104.05752 (2021)
[i16]
Brian Chen, Andrew Rouditchenko, Kevin Duarte, Hilde Kuehne, Samuel Thomas, Angie W. Boggust, Rameswar Panda, Brian Kingsbury, Rogério Schmidt Feris, David Harwath, James R. Glass, Michael Picheny, Shih-Fu Chang:
Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos. CoRR abs/2104.12671 (2021)
[i15]
Jatin Ganhotra, Samuel Thomas, Hong-Kwang Jeff Kuo, Sachindra Joshi, George Saon, Zoltán Tüske, Brian Kingsbury:
Integrating Dialog History into End-to-End Spoken Language Understanding Systems. CoRR abs/2108.08405 (2021)
[i14]
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogério Feris, Brian Kingsbury, Michael Picheny, James R. Glass:
Cascaded Multilingual Audio-Visual Learning from Videos. CoRR abs/2111.04823 (2021)
[i13]
Kevin Duarte, Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Samuel Thomas, Alexander H. Liu, David Harwath, James R. Glass, Hilde Kuehne, Mubarak Shah:
Routing with Self-Attention for Multimodal Capsule Networks. CoRR abs/2112.00775 (2021)
[i12]
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogério Feris, David Harwath, James R. Glass, Hilde Kuehne:
Everything at Once - Multi-modal Fusion Transformer for Video Retrieval. CoRR abs/2112.04446 (2021)
2020
[c74]
Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:
Audio-Assisted Image Inpainting for Talking Faces. ICASSP 2020: 7664-7668
[c73]
Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny:
Leveraging Unpaired Text Data for Training End-To-End Speech-to-Intent Systems. ICASSP 2020: 7984-7988
[c72]
Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson:
Training Spoken Language Understanding Systems with Non-Parallel Speech and Text. ICASSP 2020: 8109-8113
[c71]
Takashi Fukuda, Samuel Thomas:
Implicit Transfer of Privileged Acoustic Information in a Generalized Knowledge Distillation Framework. INTERSPEECH 2020: 41-45
[c70]
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras:
End-to-End Spoken Language Understanding Without Full Transcripts. INTERSPEECH 2020: 906-910
[c69]
Alexandros Koumparoulis, Gerasimos Potamianos, Samuel Thomas, Edmilson da Silva Morais:
Resource-Adaptive Deep Learning for Visual Speech Recognition. INTERSPEECH 2020: 3510-3514
[c68]
Samuel Thomas, Kartik Audhkhasi, Brian Kingsbury:
Transliteration Based Data Augmentation for Training Multilingual ASR Acoustic Models in Low Resource Settings. INTERSPEECH 2020: 4736-4740
[i11]
Andrew Rouditchenko, Angie W. Boggust, David Harwath, Dhiraj Joshi, Samuel Thomas, Kartik Audhkhasi, Rogério Feris, Brian Kingsbury, Michael Picheny, Antonio Torralba, James R. Glass:
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos. CoRR abs/2006.09199 (2020)
[i10]
Hong-Kwang Jeff Kuo, Zoltán Tüske, Samuel Thomas, Yinghui Huang, Kartik Audhkhasi, Brian Kingsbury, Gakuto Kurata, Zvi Kons, Ron Hoory, Luis A. Lastras:
End-to-End Spoken Language Understanding Without Full Transcripts. CoRR abs/2009.14386 (2020)
[i9]
Yinghui Huang, Hong-Kwang Kuo, Samuel Thomas, Zvi Kons, Kartik Audhkhasi, Brian Kingsbury, Ron Hoory, Michael Picheny:
Leveraging Unpaired Text Data for Training End-to-End Speech-to-Intent Systems. CoRR abs/2010.04284 (2020)
[i8]
Edmilson da Silva Morais, Hong-Kwang Jeff Kuo, Samuel Thomas, Zoltán Tüske, Brian Kingsbury:
End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features. CoRR abs/2011.08238 (2020)

2010 – 2019

2019
[c67]
Takashi Fukuda, Samuel Thomas:
Mixed Bandwidth Acoustic Modeling Leveraging Knowledge Distillation. ASRU 2019: 509-515
[c66]
George Saon, Zoltán Tüske, Kartik Audhkhasi, Brian Kingsbury, Michael Picheny, Samuel Thomas:
Simplified LSTMs for Speech Recognition. ASRU 2019: 547-553
[c65]
Yinghui Huang, Samuel Thomas, Masayuki Suzuki, Zoltán Tüske, Larry Sansone, Michael Picheny:
Semi-Supervised Training and Data Augmentation for Adaptation of Automatic Broadcast News Captioning Systems. ASRU 2019: 867-874
[c64]
Angie W. Boggust, Kartik Audhkhasi, Dhiraj Joshi, David Harwath, Samuel Thomas, Rogério Schmidt Feris, Danny Gutfreund, Yang Zhang, Antonio Torralba, Michael Picheny, James R. Glass:
Grounding Spoken Words in Unlabeled Video. CVPR Workshops 2019: 29-32
[c63]
Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson, Michael Picheny:
Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News. ICASSP 2019: 6286-6290
[c62]
Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko:
English Broadcast News Speech Recognition by Humans and Machines. ICASSP 2019: 6455-6459
[c61]
Masayuki Suzuki, Nobuyasu Itoh, Tohru Nagano, Gakuto Kurata, Samuel Thomas:
Improvements to N-gram Language Model Using Text Generated from Neural Language Model. ICASSP 2019: 7245-7249
[c60]
Leda Sari, Samuel Thomas, Mark A. Hasegawa-Johnson:
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks. INTERSPEECH 2019: 769-773
[c59]
Samuel Thomas, Kartik Audhkhasi, Zoltán Tüske, Yinghui Huang, Michael Picheny:
Detection and Recovery of OOVs for Improved English Broadcast News Captioning. INTERSPEECH 2019: 2973-2977
[i7]
Samuel Thomas, Masayuki Suzuki, Yinghui Huang, Gakuto Kurata, Zoltán Tüske, George Saon, Brian Kingsbury, Michael Picheny, Tom Dibert, Alice Kaiser-Schatzlein, Bern Samko:
English Broadcast News Speech Recognition by Humans and Machines. CoRR abs/1904.13258 (2019)
2018
[c58]
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson:
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition. ICASSP 2018: 5989-5993
[c57]
Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin, Gakuto Kurata:
Data Augmentation Improves Recognition of Foreign Accented Speech. INTERSPEECH 2018: 2409-2413
[c56]
Masayuki Suzuki, Tohru Nagano, Gakuto Kurata, Samuel Thomas:
Inference-Invariant Transformation of Batch Normalization for Domain Adaptation of Acoustic Models. INTERSPEECH 2018: 2893-2897
[c55]
Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim:
A Recorded Debating Dataset. LREC 2018
[i6]
- view
  - electronic edition @ arxiv.org (open access)
  - details & citations
- export record
  dblp key:
  - journals/corr/abs-1802-02656
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1802-02656
Xuesong Yang, Kartik Audhkhasi, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Mark Hasegawa-Johnson:
Joint Modeling of Accents and Acoustics for Multi-Accent Speech Recognition. CoRR abs/1802.02656 (2018)
[i5]
- view
  - electronic edition @ arxiv.org (open access)
  - details & citations
- export record
  dblp key:
  - journals/corr/abs-1811-01299
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1811-01299
Minh N. B. Nguyen, Samuel Thomas, Anne E. Gattiker, Sujatha Kashyap, Kush R. Varshney:
SimplerVoice: A Key Message & Visual Description Generator System for Illiteracy. CoRR abs/1811.01299 (2018)
[i4]
- view
  - electronic edition @ arxiv.org (open access)
  - details & citations
- export record
  dblp key:
  - journals/corr/abs-1812-00099
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1812-00099
Vidya Muthukumar, Tejaswini Pedapati, Nalini K. Ratha, Prasanna Sattigeri, Chai-Wah Wu, Brian Kingsbury, Abhishek Kumar, Samuel Thomas, Aleksandra Mojsilovic, Kush R. Varshney:
Understanding Unequal Gender Classification Accuracy from Face Images. CoRR abs/1812.00099 (2018)
2017
[c54]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/FukudaIKTTR17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/FukudaIKTTR17
Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran:
Effective joint training of denoising feature space transforms and Neural Network based acoustic models. ICASSP 2017: 5190-5194
[c53]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/SaonKSATDCRPLRH17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/SaonKSATDCRPLRH17
George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. INTERSPEECH 2017: 132-136
[c52]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/FukudaSKTCR17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/FukudaSKTCR17
Takashi Fukuda, Masayuki Suzuki, Gakuto Kurata, Samuel Thomas, Jia Cui, Bhuvana Ramabhadran:
Efficient Knowledge Distillation from an Ensemble of Teachers. INTERSPEECH 2017: 3697-3701
[i3]
- view
  - electronic edition @ arxiv.org (open access)
  - details & citations
- export record
  dblp key:
  - journals/corr/SaonKSATDCRPLRH17
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/SaonKSATDCRPLRH17
George Saon, Gakuto Kurata, Tom Sercu, Kartik Audhkhasi, Samuel Thomas, Dimitrios Dimitriadis, Xiaodong Cui, Bhuvana Ramabhadran, Michael Picheny, Lynn-Li Lim, Bergul Roomi, Phil Hall:
English Conversational Telephone Speech Recognition by Humans and Machines. CoRR abs/1703.02136 (2017)
[i2]
- view
  - electronic edition @ arxiv.org (open access)
  - details & citations
- export record
  dblp key:
  - journals/corr/abs-1709-06438
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/abs-1709-06438
Shachar Mirkin, Michal Jacovi, Tamar Lavee, Hong-Kwang Kuo, Samuel Thomas, Leslie Sager, Lili Kotlerman, Elad Venezian, Noam Slonim:
A Recorded Debating Dataset. CoRR abs/1709.06438 (2017)
2016
[c51]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/HawsDSTP16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/HawsDSTP16
David Haws, Dimitrios Dimitriadis, George Saon, Samuel Thomas, Michael Picheny:
On the importance of event detection for ASR. ICASSP 2016: 5705-5709
[c50]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/VazDTN16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/VazDTN16
Colin Vaz, Dimitrios Dimitriadis, Samuel Thomas, Shrikanth S. Narayanan:
CNMF-based acoustic features for noise-robust ASR. ICASSP 2016: 5735-5739
[c49]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/SuzukiTTRS16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/SuzukiTTRS16
Masayuki Suzuki, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran, George Saon:
Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings. INTERSPEECH 2016: 1588-1592
[c48]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/DimitriadisTG16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/DimitriadisTG16
Dimitrios Dimitriadis, Samuel Thomas, Sriram Ganapathy:
An Investigation on the Use of i-Vectors for Robust ASR. INTERSPEECH 2016: 3828-3832
[c47]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasACKR16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasACKR16
Samuel Thomas, Kartik Audhkhasi, Jia Cui, Brian Kingsbury, Bhuvana Ramabhadran:
Multilingual Data Selection for Low Resource Speech Recognition. INTERSPEECH 2016: 3853-3857
[i1]
- view
  - electronic edition @ arxiv.org (open access)
  - details & citations
- export record
  dblp key:
  - journals/corr/SerdyukABRTB16
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/corr/SerdyukABRTB16
Dmitriy Serdyuk, Kartik Audhkhasi, Philemon Brakel, Bhuvana Ramabhadran, Samuel Thomas, Yoshua Bengio:
Invariant Representations for Noisy Speech Recognition. CoRR abs/1612.01928 (2016)
2015
[c46]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ThomasSSN15
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ThomasSSN15
Samuel Thomas, George Saon, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Improvements to the IBM speech activity detection system for the DARPA RATS program. ICASSP 2015: 4500-4504
[c45]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/GanapathyTDR15
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/GanapathyTDR15
Sriram Ganapathy, Samuel Thomas, Dimitrios Dimitriadis, Steven J. Rennie:
Investigating factor analysis features for deep neural networks in noisy speech recognition. INTERSPEECH 2015: 1898-1902
[c44]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasSKM15
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasSKM15
Samuel Thomas, George Saon, Hong-Kwang Jeff Kuo, Lidia Mangu:
The IBM BOLT speech transcription system. INTERSPEECH 2015: 3150-3153
2014
[c43]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ThomasGSS14
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ThomasGSS14
Samuel Thomas, Sriram Ganapathy, George Saon, Hagen Soltau:
Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions. ICASSP 2014: 2519-2523
[c42]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/GanapathyHTOSN14
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/GanapathyHTOSN14
Sriram Ganapathy, Kyu Jeong Han, Samuel Thomas, Mohamed Kamal Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan:
Robust language identification using convolutional neural network features. INTERSPEECH 2014: 1846-1850
[c41]
- view
  authority control:
- export record
  dblp key:
  - conf/slt/RennieGT14
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/slt/RennieGT14
Steven J. Rennie, Vaibhava Goel, Samuel Thomas:
Deep Order Statistic Networks. SLT 2014: 124-128
[c40]
- view
  authority control:
- export record
  dblp key:
  - conf/slt/RennieGT14a
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/slt/RennieGT14a
Steven J. Rennie, Vaibhava Goel, Samuel Thomas:
Annealed dropout training of deep networks. SLT 2014: 159-164
2013
[c39]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ThomasSCH13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ThomasSCH13
Samuel Thomas, Michael L. Seltzer, Kenneth Church, Hynek Hermansky:
Deep neural network features and semi-supervised training for low resource speech recognition. ICASSP 2013: 6704-6708
[c38]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/PlchotMMDMCGHMMSSTTZZ13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/PlchotMMDMCGHMMSSTTZZ13
Oldrich Plchot, Spyros Matsoukas, Pavel Matejka, Najim Dehak, Jeff Z. Ma, Sandro Cumani, Ondrej Glembek, Hynek Hermansky, Sri Harish Reddy Mallidi, Nima Mesgarani, Richard M. Schwartz, Mehdi Soufifar, Zheng-Hua Tan, Samuel Thomas, Bing Zhang, Xinhui Zhou:
Developing a speaker identification system for the DARPA RATS project. ICASSP 2013: 6768-6772
[c37]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/JansenTH13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/JansenTH13
Aren Jansen, Samuel Thomas, Hynek Hermansky:
Weak top-down constraints for unsupervised acoustic model training. ICASSP 2013: 8091-8095
[c36]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/JansenDGJKCFHMRSCMVBBCDFHLLNPRST13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/JansenDGJKCFHMRSCMVBBCDFHLLNPRST13
Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard C. Rose, Mike Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Börschinger, Justin T. Chiu, Ewan Dunbar, Abdellah Fourtassi, David Harwath, Chia-ying Lee, Keith D. Levin, Atta Norouzian, Vijayaditya Peddinti, Rachael Richardson, Thomas Schatz, Samuel Thomas:
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition. ICASSP 2013: 8111-8115
[c35]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/SaonTSGK13
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/SaonTSGK13
George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury:
The IBM speech activity detection system for the DARPA RATS program. INTERSPEECH 2013: 3497-3501
2012
[c34]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/Garcia-RomeroZZSLGTNSMMJRMEHSD12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/Garcia-RomeroZZSLGTNSMMJRMEHSD12
Daniel Garcia-Romero, Xinhui Zhou, Dmitry N. Zotkin, Balaji Vasan Srinivasan, Yuancheng Luo, Sriram Ganapathy, Samuel Thomas, Sridhar Krishna Nemala, Garimella S. V. S. Sivaram, Majid Mirbagheri, Sri Harish Reddy Mallidi, Thomas Janu, Padmanabhan Rajan, Nima Mesgarani, Mounya Elhilali, Hynek Hermansky, Shihab A. Shamma, Ramani Duraiswami:
The UMD-JHU 2011 speaker recognition system. ICASSP 2012: 4229-4232
[c33]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ThomasGH12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ThomasGH12
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Multilingual MLP features for low-resource LVCSR systems. ICASSP 2012: 4269-4272
[c32]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasGJH12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasGJH12
Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky:
Data-driven Posterior Features for Low Resource Speech Recognition Applications. INTERSPEECH 2012: 791-794
[c31]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/JansenTH12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/JansenTH12
Aren Jansen, Samuel Thomas, Hynek Hermansky:
Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition. INTERSPEECH 2012: 879-882
[c30]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasMJHMZSNZNM12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasMJHMZSNZNM12
Samuel Thomas, Sri Harish Reddy Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab A. Shamma, Tim Ng, Bing Zhang, Long Nguyen, Spyros Matsoukas:
Acoustic and Data-driven Features for Robust Speech Activity Detection. INTERSPEECH 2012: 1985-1988
[c29]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/NorouzianJRT12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/NorouzianJRT12
Atta Norouzian, Aren Jansen, Richard C. Rose, Samuel Thomas:
Exploiting Discriminative Point Process Models for Spoken Term Detection. INTERSPEECH 2012: 2442-2445
[c28]
- view
  - electronic edition @ isca-archive.org (open access)
  - details & citations
- export record
  dblp key:
  - conf/odyssey/ThomasMGH12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/odyssey/ThomasMGH12
Samuel Thomas, Sri Harish Reddy Mallidi, Sriram Ganapathy, Hynek Hermansky:
Adaptation transforms of auto-associative neural networks as features for speaker verification. Odyssey 2012: 98-104
[c27]
- view
  - electronic edition @ isca-archive.org (open access)
  - details & citations
- export record
  dblp key:
  - conf/odyssey/GanapathyTH12
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/odyssey/GanapathyTH12
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Feature extraction using 2-d autoregressive models for speaker recognition. Odyssey 2012: 229-235
2011
[j2]
- view
  authority control:
- export record
  dblp key:
  - journals/csl/PoveyBAAKGGGKRRST11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/csl/PoveyBAAKGGGKRRST11
Daniel Povey, Lukás Burget, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas:
The subspace Gaussian mixture model - A structured model for speech recognition. Comput. Speech Lang. 25(2): 404-439 (2011)
[c26]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ThomasNZH11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ThomasNZH11
Samuel Thomas, Patrick Nguyen, Geoffrey Zweig, Hynek Hermansky:
MLP based phoneme detectors for Automatic Speech Recognition. ICASSP 2011: 5024-5027
[c25]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ZweigNCDACSWSHKJTSBK11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ZweigNCDACSWSHKJTSBK11
Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Les E. Atlas, Pascal Clark, Gregory Sell, Meihong Wang, Fei Sha, Hynek Hermansky, Damianos G. Karakos, Aren Jansen, Samuel Thomas, Sivaram G. S. V. S., Samuel R. Bowman, Justine T. Kao:
Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop. ICASSP 2011: 5044-5047
[c24]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/CarlinTJH11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/CarlinTJH11
Michael A. Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky:
Rapid Evaluation of Speech Representations for Spoken Term Discovery. INTERSPEECH 2011: 821-824
[c23]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/MesgaraniTH11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MesgaraniTH11
Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
Adaptive Stream Fusion in Multistream Recognition of Speech. INTERSPEECH 2011: 2329-2332
[c22]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/SivaramTH11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/SivaramTH11
Garimella S. V. S. Sivaram, Samuel Thomas, Hynek Hermansky:
Mixture of Auto-Associative Neural Networks for Speaker Verification. INTERSPEECH 2011: 2381-2384
[c21]
- view
  - electronic edition @ isca-archive.org (open access)
  - details & citations
- export record
  dblp key:
  - conf/mlslp/HermanskyMT11
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/mlslp/HermanskyMT11
Hynek Hermansky, Nima Mesgarani, Samuel Thomas:
Performance monitoring for robustness in automatic recognition of speech. MLSLP 2011: 31-34
2010
[c20]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/GanapathyTH10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/GanapathyTH10
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Robust spectro-temporal features based on autoregressive models of Hilbert envelopes. ICASSP 2010: 4286-4289
[c19]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/GhoshalPAABFGGKRRST10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/GhoshalPAABFGGKRRST10
Arnab Ghoshal, Daniel Povey, Mohit Agarwal, Pinar Akyazi, Lukás Burget, Kai Feng, Ondrej Glembek, Nagendra Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas:
A novel estimation of feature-space MLLR for full-covariance models. ICASSP 2010: 4310-4313
[c18]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/PoveyBAAFGGGKRRST10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/PoveyBAAFGGGKRRST10
Daniel Povey, Lukás Burget, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Ariya Rastrow, Richard C. Rose, Petr Schwarz, Samuel Thomas:
Subspace Gaussian Mixture Models for speech recognition. ICASSP 2010: 4330-4333
[c17]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/BurgetSAAFGGGKPRRT10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/BurgetSAAFGGGKPRRT10
Lukás Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K. Goel, Martin Karafiát, Daniel Povey, Ariya Rastrow, Richard C. Rose, Samuel Thomas:
Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models. ICASSP 2010: 4334-4337
[c16]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/GanapathyTH10a
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/GanapathyTH10a
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Comparison of modulation features for phoneme recognition. ICASSP 2010: 5038-5041
[c15]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/GoelTAABFGGKPRRS10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/GoelTAABFGGKPRRS10
Nagendra Goel, Samuel Thomas, Mohit Agarwal, Pinar Akyazi, Lukás Burget, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Martin Karafiát, Daniel Povey, Ariya Rastrow, Richard C. Rose, Petr Schwarz:
Approaches to automatic lexicon learning with limited training examples. ICASSP 2010: 5094-5097
[c14]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/MesgaraniTH10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/MesgaraniTH10
Nima Mesgarani, Samuel Thomas, Hynek Hermansky:
A multistream multiresolution framework for phoneme recognition. INTERSPEECH 2010: 318-321
[c13]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasGH10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasGH10
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Cross-lingual and multi-stream posterior features for low resource LVCSR systems. INTERSPEECH 2010: 877-880
[c12]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasPGMH10
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasPGMH10
Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky:
A phoneme recognition framework based on auditory spectro-temporal receptive fields. INTERSPEECH 2010: 2458-2461

2000 – 2009

2009
[c11]
- view
  authority control:
- export record
  dblp key:
  - conf/asru/GanapathyTH09
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/asru/GanapathyTH09
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Temporal envelope subtraction for robust speech recognition using modulation spectrum. ASRU 2009: 164-169
[c10]
- view
  authority control:
- export record
  dblp key:
  - conf/icassp/ThomasGH09
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/icassp/ThomasGH09
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Phoneme recognition using spectral envelope and modulation frequency features. ICASSP 2009: 4453-4456
[c9]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/GanapathyTH09
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/GanapathyTH09
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Static and dynamic modulation spectrum for speech recognition. INTERSPEECH 2009: 2823-2826
[c8]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasGH09
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasGH09
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Tandem representations of spectral envelope and modulation frequency features for ASR. INTERSPEECH 2009: 2955-2958
[c7]
- view
  authority control:
- export record
  dblp key:
  - conf/waspaa/GanapathyTMH09
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/waspaa/GanapathyTMH09
Sriram Ganapathy, Samuel Thomas, Petr Motlícek, Hynek Hermansky:
Applications of signal analysis using autoregressive models for amplitude modulation. WASPAA 2009: 341-344
2008
[j1]
- view
  authority control:
- export record
  dblp key:
  - journals/spl/ThomasGH08
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/journals/spl/ThomasGH08
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Recognition of Reverberant Speech Using Frequency Domain Linear Prediction. IEEE Signal Process. Lett. 15: 681-684 (2008)
[c6]
- view
  - electronic edition @ ieee.org
  - details & citations
- export record
  dblp key:
  - conf/eusipco/ThomasGH08
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/eusipco/ThomasGH08
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Spectro-temporal features for Automatic Speech Recognition using Linear Prediction in spectral domain. EUSIPCO 2008: 1-4
[c5]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/GanapathyTH08
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/GanapathyTH08
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky:
Front-end for far-field speech recognition based on frequency domain linear prediction. INTERSPEECH 2008: 984-987
[c4]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasGH08
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasGH08
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech. INTERSPEECH 2008: 1521-1524
[c3]
- view
  authority control:
- export record
  dblp key:
  - conf/mlmi/ThomasGH08
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/mlmi/ThomasGH08
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky:
Hilbert Envelope Based Features for Far-Field Speech Recognition. MLMI 2008: 119-124
2007
[c2]
- view
  - electronic edition via DOI (open access)
  - details & citations
  authority control:
- export record
  dblp key:
  - conf/interspeech/ThomasV07
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/interspeech/ThomasV07
Samuel Thomas, Ashish Verma:
Language identification of person names using CF-IOF based weighing function. INTERSPEECH 2007: 1769-1772
2006
[c1]
- view
  - electronic edition @ ieee.org
  - details & citations
- export record
  dblp key:
  - conf/eusipco/ThomasRMR06
- ask others
- share record
  persistent URL:
  - https://dblp.org/rec/conf/eusipco/ThomasRMR06
Samuel Thomas, M. Nageshwara Rao, Hema A. Murthy, Coimbatore S. Ramalingam:
Natural sounding TTS based on syllable-like units. EUSIPCO 2006: 1-5
