Link to original content: http://www.ncbi.nlm.nih.gov/pubmed/21556138
PLoS One. 2011 Apr 27;6(4):e18910.
doi: 10.1371/journal.pone.0018910.

Representative proteomes: a stable, scalable and unbiased proteome set for sequence analysis and functional annotation

Chuming Chen et al. PLoS One. 2011.

Abstract

The accelerating growth in the number of protein sequences taxes both the computational and manual resources needed to analyze them. One approach to dealing with this problem is to minimize the number of proteins subjected to such analysis in a way that minimizes loss of information. To this end we have developed a set of Representative Proteomes (RPs), each selected from a Representative Proteome Group (RPG) containing similar proteomes calculated based on co-membership in UniRef50 clusters. A Representative Proteome is the proteome that can best represent all the proteomes in its group in terms of the majority of the sequence space and information. RPs at co-membership thresholds (CMTs) of 75%, 55%, 35% and 15% are provided to allow users to decrease or increase the granularity of the sequence space based on their requirements. We find that a CMT of 55% (RP55) most closely follows standard taxonomic classifications. Further analysis of this set reveals that sequence space is reduced by more than 80% relative to UniProtKB, while retaining both sequence diversity (over 95% of InterPro domains) and annotation information (93% of experimentally characterized proteins). All sets can be browsed and are available for sequence similarity searches and download at http://www.proteininformationresource.org/rps, while the set of 637 RPs determined using a 55% CMT is also available for text searches. Potential applications include sequence similarity searches, protein classification and targeted protein annotation and characterization.
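The grouping idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact co-membership formula, the clustering procedure, or the rule for choosing the representative, so everything here is an assumption made for illustration: co-membership is taken as the percentage of shared UniRef50 cluster IDs relative to the smaller proteome, groups are formed by single linkage at the chosen CMT, and the representative is simply the member covering the most clusters. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def co_membership(a, b):
    """Percent of shared cluster IDs, relative to the smaller proteome.

    ASSUMPTION: this definition is illustrative; the abstract does not
    state the formula used by the authors.
    """
    if not a or not b:
        return 0.0
    return 100.0 * len(a & b) / min(len(a), len(b))


def group_proteomes(proteomes, cmt):
    """Single-linkage grouping of proteomes whose co-membership >= cmt.

    `proteomes` maps a proteome name to its set of UniRef50 cluster IDs.
    ASSUMPTION: single linkage via union-find stands in for whatever
    grouping procedure the paper actually uses.
    """
    parent = {name: name for name in proteomes}

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for p, q in combinations(proteomes, 2):
        if co_membership(proteomes[p], proteomes[q]) >= cmt:
            parent[find(p)] = find(q)

    groups = {}
    for name in proteomes:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def pick_representative(group, proteomes):
    """Pick the member covering the most clusters (ties: alphabetical).

    ASSUMPTION: a crude proxy for "best represents the group".
    """
    return max(sorted(group), key=lambda name: len(proteomes[name]))


# Hypothetical toy data: cluster IDs are made up.
toy = {
    "ecoli_K12": {"c1", "c2", "c3", "c4"},
    "ecoli_O157": {"c1", "c2", "c3", "c5", "c6"},
    "bsub": {"c1", "c7", "c8"},
}

rp55_groups = group_proteomes(toy, 55)
rp15_groups = group_proteomes(toy, 15)
```

With this toy input, a higher threshold keeps the two hypothetical E. coli strains together and separates the third proteome, while a lower threshold merges all three, which mirrors how the CMT controls granularity.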

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Flow chart of the method used to select Representative Proteomes.
For details, please see the Materials and Methods section.
Figure 2. Stability and characteristics of RP55.
RPGs and RPs were determined for previous releases of UniProtKB. Histograms show the growth in the number of RPs relative to the number of complete proteomes. The percentage of species with strains found in multiple RPGs is given by the green line, while the percentages of RPGs and RPs that remained unchanged between the indicated release and the 2010_09 release are given by the orange and blue lines, respectively.
Figure 3. Sequence similarity searches against Representative Proteome sets.
3a) Time required to perform phmmer searches on 1000 randomly chosen UniParc sequences against RP15 (purple), RP35 (orange), RP55 (blue) and RP75 (red) or UniParc (green) (solid lines). The subset of sequences with no Representative Proteome (RP) hits was searched against the whole of UniParc and the two search times were summed (broken lines). 3b) Taxonomic breakdown of the subset of sequences without an RP hit.
Figure 4. Browsing the Representative Proteome Groups (RPGs) and Representative Proteomes (RPs) at different thresholds.
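The two-stage search described in the Figure 3 caption (search the RP set first, then search the full database only for queries with no RP hit, and sum the two times) can be sketched as follows. This is a minimal illustration, not the authors' benchmarking code: `search_rp` and `search_full` are hypothetical callables standing in for a real search tool such as phmmer, and the toy databases are made up.

```python
import time


def two_stage_search(queries, search_rp, search_full):
    """Search a reduced set first; fall back to the full set on a miss.

    `search_rp` and `search_full` are hypothetical callables that take a
    query and return a (possibly empty) list of hits.
    Returns (hits, fallback_queries, total_seconds).
    """
    hits = {}
    fallback = []

    # Stage 1: every query goes against the Representative Proteome set.
    start = time.perf_counter()
    for query in queries:
        found = search_rp(query)
        if found:
            hits[query] = found
        else:
            fallback.append(query)
    rp_seconds = time.perf_counter() - start

    # Stage 2: only queries without an RP hit go against the full database.
    start = time.perf_counter()
    for query in fallback:
        hits[query] = search_full(query)
    full_seconds = time.perf_counter() - start

    # The caption's broken lines correspond to the sum of both stages.
    return hits, fallback, rp_seconds + full_seconds


# Hypothetical toy databases keyed by query name.
rp_db = {"seqA": ["rp_hit_1"]}
full_db = {"seqA": ["rp_hit_1", "full_hit_9"], "seqB": ["full_hit_2"]}

hits, fallback, total_seconds = two_stage_search(
    ["seqA", "seqB", "seqC"],
    lambda q: rp_db.get(q, []),
    lambda q: full_db.get(q, []),
)
```

The design point is that the second stage only pays the full-database cost for the subset of queries that the reduced set could not answer.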
