iBet uBet web content aggregator. Adding the entire web to your favor.
iBet uBet web content aggregator. Adding the entire web to your favor.



Link to original content: https://pubmed.ncbi.nlm.nih.gov/21697992
How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling - PubMed Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2011:7:61-85.
doi: 10.4137/EBO.S7048. Epub 2011 Jun 7.

How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

Affiliations

How Fitch-Margoliash Algorithm can Benefit from Multi Dimensional Scaling

Sylvain Lespinats et al. Evol Bioinform Online. 2011.

Abstract

Whatever the phylogenetic method, genetic sequences are often described as strings of characters, thus molecular sequences can be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (ie, the evolutionary process) must deal with the amazing features of high-dimensional spaces like concentration of measured phenomenon.TO STUDY HOW THESE FEATURES MIGHT INFLUENCE PHYLOGENY RECONSTRUCTIONS, WE EXAMINED A PARTICULAR POPULAR METHOD: the Fitch-Margoliash algorithm, which belongs to the Least Squares methods. We show that the Least Squares methods are closely related to Multi Dimensional Scaling. Indeed, criteria for Fitch-Margoliash and Sammon's mapping are somewhat similar. However, the prolific research in Multi Dimensional Scaling has definitely allowed outclassing Sammon's mapping.Least Square methods for tree reconstruction can now take advantage of these improvements. However, "false neighborhood" and "tears" are the two main risks in dimensionality reduction field: "false neighborhood" corresponds to a widely separated data in the original space that are found close in representation space, and neighbor data that are displayed in remote positions constitute a "tear". To address this problem, we took advantage of the concepts of "continuity" and "trustworthiness" in the tree reconstruction field, which limit the risk of "false neighborhood" and "tears". We also point out the concentration of measured phenomenon as a source of error and introduce here new criteria to build phylogenies with improved preservation of distances and robustness.The authors and the Evolutionary Bioinformatics Journal dedicate this article to the memory of Professor W.M. Fitch (1929-2011).

Keywords: Fitch-Margoliash; Least Square methods; Multi Dimensional Scaling; Sammon’s mapping; molecular phylogeny.

PubMed Disclaimer

Figures

Figure 1
Figure 1
200 random data are uniformly distributed in a unit cube (of a given dimensionality). Histograms of distances between every pairs of items (19 500 distances) are displayed according to space dimension. Distributions of distances for dimensions larger than 200 would have the same Gaussian-like shape, but their centers would be shifted to the right proportionally to the square root of the dimension.
Figure 2
Figure 2
Summary of the eight tested criteria and links between them. Every combination of criterion components is tested to evaluate each improvement. Components allows: penalizing either “tears” (as in original criteria) or “tears” and “false neighborhoods” together (as it is suggested here); emphasizing small distances either thanks to the traditional reverse function or while considering the concentration of measure phenomenon with more (cases ζ2 and ζ5) or less (cases ζ1 and ζ4) focusing on smaller distances. ζFM corresponds to the original Fitch-Margoliash’s criterion and ζSa to the Sanjuán and Wróbel’s one. ζ1 and ζ2 avoid both risks of “false neighborhoods” and risks related to the concentration of measure phenomenon. ζ3 and ζ6 avoid risks related to the concentration of measure phenomenon and risks of “tears” (but do not considered the risk of “false neighborhoods”). ζ4 and ζ5 avoid risks related to the concentration of measure phenomenon (while highly focusing on smaller distances in case of ζ5) but lightly penalize “false neighborhoods”.
Figure 3
Figure 3
Example of comparison between two tree-building methods: Tree built according to the classical Fitch-Margoliash method versus the tree built thanks to the new criterion. Original data (and the associate distance matrix) are displayed in the upper insert. Left and right inserts express the two trees. “Continuity” and “trustworthiness” can then be compared on the lower insert (the grey curve corresponds to the Fitch-Margoliash method and the black curve is related to the new method).
Figure 4
Figure 4
Evaluation of distance preservation by nine tree building methods (analysis on 200 sets of 15 random data in a two-dimensional space). The up and left insert shows the Robinson and Foulds distance between trees generated by the various methods: the distance that separates two methods in the graph accounts for the average Robinson and Foulds distance between methods’ trees (see Hillis et al) for a somewhat similar display of Robinson and Foulds distances). Other inserts express “continuity” (lefts inserts) and “trustworthiness” (right inserts). Every curve is reported on each graph to allow an easy comparison. On each insert, the related method corresponds to the black curve.
Figure 5
Figure 5
Evaluation of distance preservation (data in a 100-dimensional space). The up and left insert shows the Robinson and Foulds distance between trees generated by the various methods. Other inserts express “continuity” (lefts inserts) and “trustworthiness” (right inserts). Every curve is reported on each graph to allow an easy comparison. On each insert, the related method corresponds to the black curve.

Similar articles

Cited by

References

    1. Hitchcock E, Hitchcock CH. Elementary Geology. University of Michigan Library: Scholarly Publishing Office; 1840.
    1. Darwin C. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London: John Murray; 1859. - PMC - PubMed
    1. Haeckel E. Generelle Morphologie der Organismen. Charleston, Carolina: Nabu Press; 1866.
    1. Brocchieri L. Phylogenetic inferences from molecular sequences: review and critique. Theor Popul Biol. 2001;59:27–40. - PubMed
    1. Edwards AWF, Cavalli-Sforza LL. The reconstruction of evolution. Ann Hum Genet. 1963;27:105–6.