BLAST+: architecture and applications

doi:10.1186/1471-2105-10-421

BMC Bioinformatics. 2009 Dec 15;10:421.

Christiam Camacho¹, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, Thomas L Madden

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894, USA. camacho@ncbi.nlm.nih.gov

PMID: 20003500
PMCID: PMC2803857
DOI: 10.1186/1471-2105-10-421

Christiam Camacho et al. BMC Bioinformatics. 2009.


Abstract

Background: Sequence similarity searching is a very important bioinformatics task. While Basic Local Alignment Search Tool (BLAST) outperforms exact methods through its use of heuristics, the speed of the current BLAST software is suboptimal for very long queries or database sequences. There are also some shortcomings in the user interface of the current command-line applications.

Results: We describe features and improvements of rewritten BLAST software and introduce new command-line applications. Long query sequences are broken into chunks for processing, in some cases leading to dramatically shorter run times. For long database sequences, it is possible to retrieve only the relevant parts of the sequence, reducing CPU time and memory usage for searches of short queries against databases of contigs or chromosomes. The program can now retrieve masking information for database sequences from the BLAST databases. A new modular software library can now access subject sequence data from arbitrary data sources. We introduce several new features, including strategy files that allow a user to save and reuse their favorite set of options. The strategy files can be uploaded to and downloaded from the NCBI BLAST web site.
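The query-splitting idea described in the Results can be illustrated with a minimal sketch. The function name `split_query` and the chunk size and overlap values are illustrative assumptions, not the parameters or API used by BLAST+. The real software must also merge alignments that cross chunk boundaries, which this sketch omits.

```python
from typing import Iterator, Tuple


def split_query(seq: str, chunk_size: int, overlap: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs covering seq with overlapping windows.

    Hypothetical helper: the overlap lets an alignment that straddles a
    chunk boundary still be found within at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    start = 0
    while True:
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # the last window reached the end of the query
        start += step


# Toy query; real queries would be thousands of bases long.
chunks = list(split_query("ACGTACGTACGT", chunk_size=5, overlap=2))
for offset, chunk in chunks:
    print(offset, chunk)
```

Keeping the offset with each chunk allows hit coordinates found in a chunk to be mapped back to positions in the full query.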

Conclusion: The new BLAST command-line applications, compared to the current BLAST tools, demonstrate substantial speed improvements for long queries as well as chromosome-length database sequences. We have also improved the user interface of the command-line applications.

Figures

**Figure 1**
**Schematic of a BLAST search**. The first phase is "setup". The query is read, low-complexity or other filtering might be applied to the query, and a "lookup" table is built. The next phase is "scanning". Each subject sequence is scanned for words ("hits") matching those in the lookup table. These hits are further processed, extended by gap-free and gapped alignments, and scored. Significant "preliminary" matches are saved for further processing. The final phase in the BLAST algorithm, called the "trace-back", finds the locations of insertions and deletions for alignments saved in the scanning phase.
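The setup and scanning phases in this schematic can be sketched in a few lines. This is a simplified illustration under stated assumptions: it uses exact word matches and an exact-match gap-free extension, whereas BLAST scores extensions and applies significance thresholds. The function names `build_lookup`, `scan_subject`, and `extend_ungapped` are hypothetical and do not come from the BLAST+ library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_lookup(query: str, word_size: int) -> Dict[str, List[int]]:
    """Setup phase: map each query word to the positions where it occurs."""
    table: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        table[query[i:i + word_size]].append(i)
    return table


def scan_subject(table: Dict[str, List[int]], subject: str,
                 word_size: int) -> List[Tuple[int, int]]:
    """Scanning phase: return (query_pos, subject_pos) for every word hit."""
    hits = []
    for j in range(len(subject) - word_size + 1):
        for i in table.get(subject[j:j + word_size], ()):
            hits.append((i, j))
    return hits


def extend_ungapped(query: str, subject: str, q: int, s: int,
                    word_size: int) -> Tuple[int, int, int]:
    """Extend a hit left and right while characters match exactly.

    Returns (query_start, subject_start, length). Simplification: a real
    extension is score-based rather than stopping at the first mismatch.
    """
    left = 0
    while (q - left - 1 >= 0 and s - left - 1 >= 0
           and query[q - left - 1] == subject[s - left - 1]):
        left += 1
    right = word_size
    while (q + right < len(query) and s + right < len(subject)
           and query[q + right] == subject[s + right]):
        right += 1
    return q - left, s - left, left + right


query, subject, w = "GATTACA", "CCGATTACATT", 4
table = build_lookup(query, w)
hits = scan_subject(table, subject, w)
extended = extend_ungapped(query, subject, hits[1][0], hits[1][1], w)
print(hits, extended)
```

The gapped alignment and trace-back phases are not shown; they would take the extended hits as seeds.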

**Figure 2**
**Speedup of BLASTX searches for differently sized queries with and without query splitting**. Different sized pieces of [Genbank:NC_007113.2] were searched against a set of human proteins. The query length in kbases is on the x-axis, with a log scale. On the y-axis is the fractional speedup, which is defined as (T_baseline/T_blastx) - 1. Three searches were performed with both the baseline and the blastx applications (for each data point), and the lowest time for each application was used.
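The fractional speedup defined in this caption can be computed as follows. The timings in the example are invented for illustration and are not data from the paper; the function name `fractional_speedup` is likewise an assumption.

```python
from typing import Sequence


def fractional_speedup(baseline_times: Sequence[float],
                       new_times: Sequence[float]) -> float:
    """Return (T_baseline / T_new) - 1 using the lowest time of each set.

    A value of 0 means no change; 1.0 means the new application took
    half the time of the baseline.
    """
    t_baseline = min(baseline_times)
    t_new = min(new_times)
    return t_baseline / t_new - 1.0


# Hypothetical wall-clock times in seconds for three runs of each application.
speedup = fractional_speedup([12.0, 10.0, 11.0], [5.5, 5.0, 6.0])
print(speedup)
```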

**Figure 3**
**L2 data cache misses for BLASTX searches with and without query splitting**. Cache misses were measured by Cachegrind [24] and only misses reading from the cache are shown. On the x-axis are different query lengths in kbases. The number of L2 cache misses is shown on the y-axis. The top line is for the baseline application without query splitting, the bottom line is for the blastx application. The queries are different sized pieces of [Genbank:NC_007113.2] searched against the set of human proteins used for Figure 2.

**Figure 4**
**Scatter plot of MEGABLAST search times with and without partial retrieval**. 163 human ESTs from UniGene cluster 235935 were searched against all human chromosomes [22]. On the x-axis are times for the baseline application; on the y-axis are times for the new blastn application. Sequences with the best improvement are those furthest to the right, and they also matched the largest number of subject sequences. A word size of 24 was used for the runs as well as database masking with RepeatMasker. Three searches were done with both the baseline and blastn application for each data point, and the lowest time for each application was used.

References

1. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–410.
2. Altschul S, Madden T, Schäffer A, Zhang J, Zhang Z, Miller W, Lipman D. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25(17):3389–3402. doi: 10.1093/nar/25.17.3389.
3. NCBI C toolkit. http://www.ncbi.nlm.nih.gov/IEB/ToolBox/SDKDOCS/INDEX.HTML
4. Zhang Z, Schäffer A, Miller W, Madden T, Lipman D, Koonin E, Altschul S. Protein sequence similarity searches using patterns as seeds. Nucleic Acids Res. 1998;26(17):3986–3990. doi: 10.1093/nar/26.17.3986.
5. Schäffer A, Wolf Y, Ponting C, Koonin E, Aravind L, Altschul S. IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices. Bioinformatics. 1999;15(12):1000–1011. doi: 10.1093/bioinformatics/15.12.1000.
