Improved tools for biological sequence comparison
- PMID: 3162770
- PMCID: PMC280013
- DOI: 10.1073/pnas.85.8.2444
Abstract
We have developed three computer programs for comparisons of protein and DNA sequences. They can be used to search sequence data bases, evaluate similarity scores, and identify periodic structures based on local sequence similarity. The FASTA program is a more sensitive derivative of the FASTP program, which can be used to search protein or DNA sequence data bases and can compare a protein sequence to a DNA sequence data base by translating the DNA data base as it is searched. FASTA includes an additional step in the calculation of the initial pairwise similarity score that allows multiple regions of similarity to be joined to increase the score of related sequences. The RDF2 program can be used to evaluate the significance of similarity scores using a shuffling method that preserves local sequence composition. The LFASTA program can display all the regions of local similarity between two sequences with scores greater than a threshold, using the same scoring parameters and a similar alignment algorithm; these local similarities can be displayed as a "graphic matrix" plot or as individual alignments. In addition, these programs have been generalized to allow comparison of DNA or protein sequences based on a variety of alternative scoring matrices.
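The shuffling idea behind the significance evaluation can be illustrated with a short sketch. It is not the RDF2 implementation: the window length of 10, the toy ungapped diagonal score, and the function names `local_shuffle`, `best_diagonal_score`, and `shuffle_z_score` are all assumptions made for illustration. The sketch shuffles one sequence within consecutive windows, so each window keeps its residue composition, then compares the observed score with the distribution of scores against the shuffled copies.

```python
import random
import statistics


def local_shuffle(seq, window=10, rng=random):
    """Shuffle residues within consecutive windows (window size is an assumption).

    Each window keeps exactly the residues it started with, so local
    composition is preserved while residue order is randomized.
    """
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)
        out.extend(block)
    return "".join(out)


def best_diagonal_score(a, b, match=1, mismatch=-1):
    """Toy similarity score: best ungapped local run over all diagonals.

    A stand-in for a real similarity score; it is NOT the FASTA score.
    """
    best = 0
    for offset in range(-(len(a) - 1), len(b)):
        run = 0
        for i in range(len(a)):
            j = i + offset
            if 0 <= j < len(b):
                run = max(0, run + (match if a[i] == b[j] else mismatch))
                best = max(best, run)
    return best


def shuffle_z_score(query, target, n_shuffles=100, window=10, seed=0):
    """Compare the observed score with scores against locally shuffled targets.

    Returns (observed, mean, sd, z); z is None when the shuffled scores
    have zero spread.
    """
    rng = random.Random(seed)
    observed = best_diagonal_score(query, target)
    shuffled = [
        best_diagonal_score(query, local_shuffle(target, window, rng))
        for _ in range(n_shuffles)
    ]
    mean = statistics.mean(shuffled)
    sd = statistics.pstdev(shuffled)
    z = (observed - mean) / sd if sd > 0 else None
    return observed, mean, sd, z
```

Calling `shuffle_z_score(query, target)` on two sequences returns the observed score, the mean and standard deviation of the shuffled scores, and the resulting z value. A larger z suggests the observed similarity is less likely to arise from composition alone, under the assumptions of this toy score.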
Similar articles
- Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 1990;183:63-98. doi: 10.1016/0076-6879(90)83007-v. PMID: 2156132
- BLAST and FASTA similarity searching for multiple sequence alignment. Methods Mol Biol. 2014;1079:75-101. doi: 10.1007/978-1-62703-646-7_5. PMID: 24170396
- Profile analysis: detection of distantly related proteins. Proc Natl Acad Sci U S A. 1987 Jul;84(13):4355-8. doi: 10.1073/pnas.84.13.4355. PMID: 3474607 Free PMC article.
- An approach to searching protein sequences for superfamily relationships or chance similarities relevant to the molecular mimicry hypothesis: application to the basic proteins of myelin. J Neurochem. 1988 Oct;51(4):1267-73. doi: 10.1111/j.1471-4159.1988.tb03096.x. PMID: 2458435 Review.
- Numerical characterization and similarity analysis of DNA sequences based on 2-D graphical representation of the characteristic sequences. Comb Chem High Throughput Screen. 2003 Dec;6(8):795-9. doi: 10.2174/138620703771826900. PMID: 14683485 Review.
Cited by
- Expanding the diversity of origin of transfer-containing sequences in mobilizable plasmids. Nat Microbiol. 2024 Nov 8. doi: 10.1038/s41564-024-01844-1. Online ahead of print. PMID: 39516559
- CUDASW++4.0: ultra-fast GPU-based Smith-Waterman protein sequence database search. BMC Bioinformatics. 2024 Nov 2;25(1):342. doi: 10.1186/s12859-024-05965-6. PMID: 39488701 Free PMC article.
- Advancing microbial diagnostics: a universal phylogeny guided computational algorithm to find unique sequences for precise microorganism detection. Brief Bioinform. 2024 Sep 23;25(6):bbae545. doi: 10.1093/bib/bbae545. PMID: 39441245 Free PMC article.
- SAFARI: Pangenome Alignment of Ancient DNA Using Purine/Pyrimidine Encodings. bioRxiv [Preprint]. 2024 Oct 8:2024.08.12.607489. doi: 10.1101/2024.08.12.607489. PMID: 39415996 Free PMC article. Preprint.
- Physics-Based Protein Networks Might Recover Effectful Mutations─a Case Study on Cathepsin G. J Phys Chem B. 2024 Oct 17;128(41):10043-10050. doi: 10.1021/acs.jpcb.4c04140. Epub 2024 Oct 2. PMID: 39357873 Free PMC article.