Abstract
Advances in deep learning have made speech retrieval and recognition more accurate and efficient. At the same time, privacy leakage from speech data is an increasingly prominent problem, one that fully homomorphic encryption (FHE) can help alleviate. To protect the privacy of speech data and deep binary hash codes and to enable privacy-preserving similarity calculation, a secure speech retrieval method using deep hashing and CKKS (Cheon-Kim-Kim-Song) FHE is proposed. First, a speech CKKS FHE scheme is designed to encrypt the original speech data. Then, spectrogram image features of the original speech data are extracted and fed to a triplet convolutional neural network (Tri-CNN) to generate efficient, compact deep binary hash codes, which are encrypted and uploaded to the cloud together with the encrypted speech data. At retrieval time, the deep binary hash code of the query speech is extracted, encrypted, and sent to the cloud server as a search trapdoor, and its secure similarity to the index sequences in the secure index table is calculated. Experimental results show that the mean average precision of the proposed method on the TIMIT and THCHS-30 data sets exceeds 93%, a loss of about 2% relative to the plaintext domain, in exchange for higher security.
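The abstract does not state which similarity measure is evaluated over the encrypted hash codes. As a minimal sketch, assuming the measure is the Hamming distance between binary codes, the example below shows why such a comparison fits CKKS: for bits a and b, a XOR b = a + b - 2ab, so the distance needs only additions and multiplications, which are the operations CKKS supports on ciphertexts. The code runs on plaintext NumPy arrays and uses no FHE library. The names `arithmetic_hamming` and `rank_by_similarity`, the 8-bit codes, and the toy index table are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def arithmetic_hamming(a, b):
    """Hamming distance of two {0,1} vectors using only +, - and *.

    For bits, a XOR b == a + b - 2*a*b, so no comparison or branching
    is needed. Under CKKS the same expression would be evaluated on
    ciphertexts and the decrypted result rounded, because CKKS
    arithmetic is approximate.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sum(a + b - 2.0 * a * b))


def rank_by_similarity(query, index_table):
    """Return index-table positions ordered from most to least similar."""
    distances = [arithmetic_hamming(query, code) for code in index_table]
    return sorted(range(len(index_table)), key=lambda i: distances[i])


# Toy 8-bit hash codes standing in for the secure index table (assumed data).
index_table = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
query = [1, 0, 1, 1, 0, 1, 1, 0]

ranking = rank_by_similarity(query, index_table)
```

In an encrypted setting, the server would evaluate this expression on ciphertexts and return encrypted distances, and only the key holder could decrypt and rank them. Whether the paper's protocol follows this pattern is not stated in the abstract.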
Data availability
Previously reported speech data (the THCHS-30 and TIMIT data sets) were used to support this study and are available at 10.48550/arXiv.1512.01882 and 10.1016/0167-6393(90)90010-7. These data sets are cited at relevant places within the text as references [25, 26].
References
Li Y, Ma J, Miao Y et al (2022) Similarity search for encrypted images in secure cloud computing[J]. IEEE Transactions on Cloud Computing 10(2):1142–1155. https://doi.org/10.1109/TCC.2020.2989923
Singh N, Kumar J, Singh AK et al (2022) Privacy-preserving multi-keyword hybrid search over encrypted data in cloud[J]. Journal of Ambient Intelligence and Humanized Computing 1–14. https://doi.org/10.1007/s12652-022-03889-8
Rahulamathavan Y (2022) Privacy-preserving Similarity Calculation of Speaker Features Using Fully Homomorphic Encryption[J]. arXiv preprint arXiv:2202.07994. https://doi.org/10.48550/arXiv.2202.07994
Shen M, Cheng G, Zhu L et al (2020) Content-based multi-source encrypted image retrieval in clouds with privacy preservation[J]. Futur Gener Comput Syst 109:621–632. https://doi.org/10.1016/j.future.2018.04.089
Duan Y, Li Y, Lu L et al (2022) A faster outsourced medical image retrieval scheme with privacy preservation[J]. J Syst Archit 122:102356. https://doi.org/10.1016/j.sysarc.2021.102356
Wang Q, Feng C, Xu Y et al (2020) A novel privacy-preserving speech recognition framework using bidirectional LSTM[J]. Journal of Cloud Computing 9(1):1–13. https://doi.org/10.1186/s13677-020-00186-7
Shi C, Wang H, Hu Y et al (2021) A novel NMF-based authentication scheme for encrypted speech in cloud computing[J]. Multimedia Tools and Applications 80(17):25773–25798. https://doi.org/10.1007/s11042-021-10896-y
Wang Y, Huang Y, Zhang R et al (2021) Multi-format speech BioHashing based on energy to zero ratio and improved LP-MMSE parameter fusion[J]. Multimedia Tools and Applications 80(7):10013–10036. https://doi.org/10.1007/s11042-020-09701-z
Zhang SX, Gong Y, Yu D (2019) Encrypted Speech Recognition Using Deep Polynomial Networks[C]. ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, Brighton 5691–5695. https://doi.org/10.1109/ICASSP.2019.8683721
Yu X, Xu C, Dou B et al (2021) Multi-user search on the encrypted multimedia database: lattice-based searchable encryption scheme with time-controlled proxy re-encryption[J]. Multimedia Tools and Applications 80(2):3193–3211. https://doi.org/10.1007/s11042-020-09753-1
Cao R, Zhang Q, Zhu J et al (2020) Enhancing remote sensing image retrieval using a triplet deep metric learning network[J]. Int J Remote Sens 41(2):740–751. https://doi.org/10.1080/2150704X.2019.1647368
Li M, An Z, Wei Q et al (2019) Triplet Deep Hashing with Joint Supervised Loss Based on Deep Neural Networks[J]. Comput Intell Neurosci 2019:1–17. https://doi.org/10.1155/2019/8490364
Jia Y, Chen X, Yu J et al (2021) Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network[J]. Complex & Intelligent Systems 7(4):1749–1757. https://doi.org/10.1007/s40747-020-00172-1
Purwins H, Li B, Virtanen T et al (2019) Deep Learning for Audio Signal Processing[J]. IEEE J Selected Top Signal Process 13(2):206–219. https://doi.org/10.1109/JSTSP.2019.2908700
Cheon JH, Kim A, Kim M et al (2017) Homomorphic encryption for arithmetic of approximate numbers[C]//International Conference on the Theory and Application of Cryptology and Information Security. Springer, Cham 409–437. https://doi.org/10.1007/978-3-319-70694-8_15
Chen C, Jiang D, Peng J et al (2021) Scalable Identity-Oriented Speech Retrieval[J]. IEEE Trans Knowl Data Eng 14(8):1–6. https://doi.org/10.1109/TKDE.2021.3127520
Zhang Q, Li Y, Hu Y (2021) A retrieval algorithm for encrypted speech based on convolutional neural network and deep hashing[J]. Multimedia Tools and Applications 80(1):1201–1221. https://doi.org/10.1007/s11042-020-09748-y
Zhang H (2021) Voice keyword retrieval method using attention mechanism and multimodal information fusion[J]. Sci Program 2021(8):1–11. https://doi.org/10.1155/2021/6662841
Yuan Y, Xie L, Leung CC et al (2020) Fast query-by-example speech search using attention-based deep binary embeddings[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28:1988–2000. https://doi.org/10.1109/TASLP.2020.2998277
Zhang Q, Fu M, Huang Y et al (2022) Encrypted Speech Retrieval Scheme Based on Multiuser Searchable Encryption in Cloud Storage[J]. Security and Communication Networks 2022:9045259. https://doi.org/10.1155/2022/9045259
Li W, Chen Y, Hu H et al (2020) Using granule to search privacy preserving voice in home IoT systems[J]. IEEE Access 8:31957–31969. https://doi.org/10.1109/ACCESS.2020.2972975
Li W, Xiao Y, Tang C et al (2020) Multi-user searchable encryption voice in home IoT system[J]. Internet of Things 11:100180. https://doi.org/10.1016/j.iot.2020.100180
Chen J, Chen Z, Zheng P et al (2018) Encrypted domain mel-frequency cepstral coefficient and fragile audio watermarking[C]. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). IEEE, New York 68–73. https://doi.org/10.1109/TrustCom/BigDataSE.2018.00021
Thaine P, Penn G (2019) Extracting Mel-Frequency and Bark-Frequency Cepstral Coefficients from Encrypted Signals[C]. INTERSPEECH, Graz 3715–3719. https://doi.org/10.21437/Interspeech.2019-1136
Tang Y, Zhu B, Ma X et al (2019) Decoding homomorphically encrypted FLAC audio without decryption[C]//ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE 675–679. https://doi.org/10.1109/ICASSP.2019.8682780
Meftah S, Tan BHM, Mun CF et al (2021) Doren: toward efficient deep convolutional neural networks with fully homomorphic encryption[J]. IEEE Trans Inf Forensics Secur 16:3740–3752. https://doi.org/10.1109/TIFS.2021.3090959
Natarajan D, Dalskov A, Kales D et al (2021) PRIORIS: Enabling Secure Detection of Suicidal Ideation from Speech Using Homomorphic Encryption[M]//Protecting Privacy through Homomorphic Encryption. Springer, Cham 133–146. https://doi.org/10.1007/978-3-030-77287-1_10
Liu J, Wang C, Tu Z et al (2021) Secure KNN classification scheme based on homomorphic encryption for cyberspace[J]. Secur Commun Netw 2021:8759922. https://doi.org/10.1155/2021/8759922
Wang D, Zhang X (2015) THCHS-30: A free Chinese speech corpus[J]. arXiv preprint arXiv:1512.01882. https://doi.org/10.48550/arXiv.1512.01882
Zue V, Seneff S, Glass J (1990) Speech database development at MIT: TIMIT and beyond[J]. Speech Commun 9(4):351–356. https://doi.org/10.1016/0167-6393(90)90010-7
Ullah B, Kamran M, Rui Y (2022) Predictive modeling of short-term rockburst for the stability of subsurface structures using machine learning approaches: T-SNE, K-Means clustering and XGBoost[J]. Mathematics 10(3):449. https://doi.org/10.3390/math10030449
An L, Huang Y, Zhang Q (2022) Verifiable speech retrieval algorithm based on KNN secure hashing[J]. Multimedia Tools and Applications 1–22. https://doi.org/10.1007/s11042-022-13387-w
Zhang Q, Zhao X, Zhang Q et al (2022) Content-based encrypted speech retrieval scheme with deep hashing[J]. Multimed Tools Appl 81(7):10221–10242. https://doi.org/10.1007/s11042-022-12123-8
Huang Y, Wang Y, Li H et al (2022) Encrypted speech retrieval based on long sequence Biohashing[J]. Multimed Tools Appl 81(9):13065–13085. https://doi.org/10.1007/s11042-022-12371-8
Khoirom MS, Laiphrakpam DS, Tuithung T (2021) Audio encryption using ameliorated ElGamal public key encryption over finite field[J]. Wireless Pers Commun 117(2):809–823. https://doi.org/10.1007/s11277-020-07897-9
Shi C, Wang H, Hu Y et al (2019) A Speech Homomorphic Encryption Scheme with Less Data Expansion in Cloud Computing[J]. KSII Trans Internet Inf Syst (TIIS) 13(5):2588–2609. https://doi.org/10.3837/tiis.2019.05.020
Zhang QY, Jia YG (2022) A Speech Fully Homomorphic Encryption Scheme for DGHV Based on Multithreading in Cloud Storage [J]. Int J Netw Secur 24(6):1042–1055. https://doi.org/10.6633/IJNS.20221124(6).09
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61862041). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interests.
About this article
Cite this article
Zhang, Qy., Wen, Yw., Huang, Yb. et al. Secure speech retrieval method using deep hashing and CKKS fully homomorphic encryption. Multimed Tools Appl 83, 67469–67500 (2024). https://doi.org/10.1007/s11042-024-18113-2
DOI: https://doi.org/10.1007/s11042-024-18113-2