Abstract
Viruses play significant roles in many ecosystems, so viral genomes must be examined across diverse datasets, and automated, reliable identification of viruses in metagenomic data is increasingly important. However, the complex composition and large volume of metagenomic data make taxonomic analysis challenging, particularly in deep hierarchies. In this paper, we propose FOKHic, a hierarchical classification model that classifies viruses from kingdom to family. By combining \(k\)-mer frequency-based encoding, principal component analysis, and attention fusion, FOKHic rapidly classifies metagenomic viral sequences. Benchmarked on the ICTV virus database, FOKHic outperforms currently popular virus-oriented tools in accuracy, recall, precision, and F1-score. FOKHic is available at https://github.com/xiaozhangzhang123/FOKHic.
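As a rough illustration of the \(k\)-mer frequency-based encoding mentioned above, the following minimal sketch turns a nucleotide sequence into a normalized frequency vector over all \(4^k\) possible \(k\)-mers. It is not taken from the FOKHic code base: the function name `kmer_frequencies`, the default `k = 3`, and the choice to ignore \(k\)-mers containing ambiguous bases are assumptions made for illustration only. The principal component analysis and attention fusion steps are not shown.

```python
from collections import Counter
from itertools import product
from typing import List

ALPHABET = "ACGT"  # assumption: only unambiguous DNA bases are counted


def kmer_frequencies(sequence: str, k: int = 3) -> List[float]:
    """Return a normalized k-mer frequency vector of length 4**k.

    Hypothetical helper for illustration; not the FOKHic implementation.
    """
    seq = sequence.upper()
    # Count every overlapping window of length k in the sequence.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed lexicographic vocabulary: AAA, AAC, AAG, ..., TTT for k = 3.
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Windows containing other symbols (e.g. N) are not in the vocabulary
    # and therefore do not contribute to the total.
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTTGCA", k=2)
    print(len(vector), round(sum(vector), 6))
```

A vector of this kind grows as \(4^k\), which is one plausible reason to follow it with a dimensionality reduction step such as principal component analysis before classification.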
This work is supported by the National Natural Science Foundation of China under Grants No. 61672325, No. 61472222, and No. 61732009.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, Y., Zhou, Y., Feng, H., Zhu, D. (2024). FOKHic: A Framework of \(k\)-mer Based Hierarchical Classification. In: Huang, DS., Si, Z., Chen, W. (eds) Advanced Intelligent Computing Technology and Applications. ICIC 2024. Lecture Notes in Computer Science, vol 14880. Springer, Singapore. https://doi.org/10.1007/978-981-97-5678-0_8
DOI: https://doi.org/10.1007/978-981-97-5678-0_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-5677-3
Online ISBN: 978-981-97-5678-0
eBook Packages: Computer Science (R0)