The average common substring approach to phylogenomic reconstruction

doi:10.1089/cmb.2006.13.336

Comparative Study

J Comput Biol. 2006 Mar;13(2):336-50.

Igor Ulitsky¹, David Burstein, Tamir Tuller, Benny Chor

PMID: 16597244

Igor Ulitsky et al. J Comput Biol. 2006 Mar.

Affiliation

¹ School of Computer Science, Tel Aviv University, Ramat Aviv, Israel.

Abstract

We describe a novel method for efficient reconstruction of phylogenetic trees, based on sequences of whole genomes or proteomes, whose lengths may vary greatly. The core of our method is a new measure of pairwise distances between sequences. This measure is based on computing the average lengths of maximum common substrings, which is intrinsically related to information theoretic tools (Kullback-Leibler relative entropy). We present an algorithm for efficiently computing these distances. In principle, the distance between two sequences of length l can be calculated in O(l) time. We implemented the algorithm using suffix arrays. Our implementation is fast enough to enable the construction of the proteome phylogenomic tree for hundreds of species and the genome phylogenomic forest for almost two thousand viruses. An initial analysis of the results exhibits a remarkable agreement with "acceptable phylogenetic and taxonomic truth." To assess our approach, our results were compared to the traditional (single-gene or protein-based) maximum likelihood method. The obtained trees were compared to implementations of a number of alternative approaches, including two that were previously published in the literature, and to the published results of a third approach. When their outcome and running time were compared to ours, using "traditional" trees and a standard tree comparison method, our algorithm improved upon the "competition" by a substantial margin. The simplicity and speed of our method allow for a whole genome analysis with the greatest scope attempted so far. We describe here five different applications of the method, which not only show the validity of the method, but also suggest a number of novel phylogenetic insights.
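
To make the distance measure concrete, here is a minimal Python sketch of the "average length of maximum common substrings" idea. It is not the authors' implementation: the abstract describes a suffix-array algorithm with O(l) running time, whereas this sketch uses plain substring search and is far slower. The function names, the natural logarithm, and in particular the normalization in `acs_distance` (dividing a log length by the average, subtracting a self-comparison term, and averaging both directions) are assumptions made for illustration; the abstract itself does not state the formula.

```python
import math


def average_common_substring(a: str, b: str) -> float:
    """Average, over start positions i in `a`, of the length of the longest
    substring of `a` starting at i that occurs somewhere in `b`.

    Naive illustration only: relies on Python substring search, not on the
    suffix arrays mentioned in the abstract.
    """
    if not a or not b:
        raise ValueError("both sequences must be non-empty")
    total = 0
    length = 0
    for i in range(len(a)):
        # If a[i-1 : i-1+k] occurs in b, then a[i : i-1+k] does too, so the
        # match at position i is at least one shorter than the previous one.
        length = max(length - 1, 0)
        while i + length < len(a) and a[i:i + length + 1] in b:
            length += 1
        total += length
    return total / len(a)


def acs_distance(a: str, b: str) -> float:
    """Symmetric distance built from the averages above.

    ASSUMPTION: the normalization used here is one plausible form and is not
    taken from the abstract.
    """
    def directed(x: str, y: str) -> float:
        l_xy = average_common_substring(x, y)
        if l_xy == 0:
            return math.inf  # no shared characters at all
        l_xx = average_common_substring(x, x)  # self-comparison correction
        return math.log(len(y)) / l_xy - math.log(len(x)) / l_xx

    return (directed(a, b) + directed(b, a)) / 2


if __name__ == "__main__":
    print(acs_distance("ACGT", "ACGA"))  # closely related toy sequences
    print(acs_distance("ACGT", "TTTT"))  # more distant toy sequences
```

Longer shared substrings give a larger average and therefore a smaller distance, which is why the measure can be applied to sequences of very different lengths without first aligning them.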

