CD-HIT: accelerated for clustering the next-generation sequencing data
- PMID: 23060610
- PMCID: PMC3516142
- DOI: 10.1093/bioinformatics/bts565
Abstract
Summary: CD-HIT is a widely used program for clustering biological sequences to reduce sequence redundancy and improve the performance of other sequence analyses. In response to the rapid increase in the amount of sequencing data produced by next-generation sequencing technologies, we have developed a new CD-HIT program that can efficiently cluster such datasets. It is accelerated by a novel parallelization strategy and several other techniques. Our tests showed good speedup from parallelization on up to ∼24 cores and quasi-linear speedup on up to ∼8 cores. The enhanced CD-HIT can handle very large datasets in much less time than previous versions.
Availability: http://cd-hit.org.
Contact: liwz@sdsc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
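The redundancy reduction CD-HIT performs rests on greedy incremental clustering, described in the earlier CD-HIT papers. Sequences are sorted by decreasing length, and the longest one becomes the first cluster representative. Each later sequence is compared only against existing representatives. It joins the first cluster whose representative meets the identity threshold, or otherwise becomes a new representative. The Python sketch below shows only that control flow, under stated assumptions. The function names are hypothetical. The identity measure is a simplified stand-in: the best ungapped match of the shorter sequence against the longer one, divided by the shorter length. The real program uses alignment-based identity and short-word filtering. The sketch is also serial and does not reproduce the paper's parallelization strategy.

```python
from typing import Dict, List


def ungapped_identity(a: str, b: str) -> float:
    """Hypothetical stand-in for CD-HIT's identity measure (an assumption).

    Slides the shorter sequence along the longer one without gaps, keeps the
    best count of matching residues, and divides by the shorter length.
    """
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        return 0.0
    best = 0
    for offset in range(len(long_) - len(short) + 1):
        matches = sum(1 for i, ch in enumerate(short) if ch == long_[offset + i])
        best = max(best, matches)
    return best / len(short)


def greedy_incremental_cluster(
    seqs: Dict[str, str], threshold: float = 0.9
) -> Dict[str, List[str]]:
    """Greedy incremental clustering sketch: {representative_id: [member_ids]}."""
    # Longest first, so each representative is the longest member of its cluster.
    # Python's sort is stable, so equal-length sequences keep their input order.
    order = sorted(seqs, key=lambda sid: len(seqs[sid]), reverse=True)
    clusters: Dict[str, List[str]] = {}
    for sid in order:
        # Compare only against current representatives, not all members.
        for rep in clusters:
            if ungapped_identity(seqs[sid], seqs[rep]) >= threshold:
                clusters[rep].append(sid)  # join the first matching cluster
                break
        else:
            clusters[sid] = [sid]  # no match: sid becomes a new representative
    return clusters


if __name__ == "__main__":
    demo = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTAA",
        "s3": "TTTTGGGGCC",
        "s4": "ACGTACGT",
    }
    print(greedy_incremental_cluster(demo, threshold=0.9))
```

With the toy input, `s2` (9/10 identical to `s1`) and `s4` (a perfect substring of `s1`) join the `s1` cluster, and `s3` starts its own cluster. Because each sequence is compared only with representatives rather than with every other sequence, most of the cost lies in these comparisons.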
Similar articles
- Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006 Jul 1;22(13):1658-9. doi: 10.1093/bioinformatics/btl158. Epub 2006 May 26. PMID: 16731699
- CD-HIT Suite: a web server for clustering and comparing biological sequences. Bioinformatics. 2010 Mar 1;26(5):680-2. doi: 10.1093/bioinformatics/btq003. Epub 2010 Jan 6. PMID: 20053844. Free PMC article.
- Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 Oct 1;26(19):2460-1. doi: 10.1093/bioinformatics/btq461. Epub 2010 Aug 12. PMID: 20709691
- Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics. 2018 Jul 15;34(14):2490-2492. doi: 10.1093/bioinformatics/bty121. PMID: 29506019. Free PMC article.
- MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics. 2016 May 1;32(9):1323-30. doi: 10.1093/bioinformatics/btw006. Epub 2016 Jan 6. PMID: 26743509
Cited by
- Ileal microbial microbiome and its secondary bile acids modulate susceptibility to nonalcoholic steatohepatitis in dairy goats. Microbiome. 2024 Nov 23;12(1):247. doi: 10.1186/s40168-024-01964-0. PMID: 39578870. Free PMC article.
- Gap-free telomere-to-telomere haplotype assembly of the tomato hind (Cephalopholis sonnerati). Sci Data. 2024 Nov 22;11(1):1268. doi: 10.1038/s41597-024-04093-3. PMID: 39578472. Free PMC article.
- Multiomics of yaks reveals significant contribution of microbiome into host metabolism. NPJ Biofilms Microbiomes. 2024 Nov 21;10(1):133. doi: 10.1038/s41522-024-00609-2. PMID: 39572587. Free PMC article.
- The functions and factors governing fungal communities and diversity in agricultural waters: insights into the ecosystem services aquatic mycobiota provide. Front Microbiol. 2024 Nov 5;15:1460330. doi: 10.3389/fmicb.2024.1460330. eCollection 2024. PMID: 39564490. Free PMC article.
- Changes in the structure of the microbial community within the phycospheric microenvironment and potential biogeochemical effects induced in the demise stage of green tides caused by Ulva prolifera. Front Microbiol. 2024 Nov 5;15:1507660. doi: 10.3389/fmicb.2024.1507660. eCollection 2024. PMID: 39564489. Free PMC article.