Simple sequence-based kernels do not predict protein-protein interactions
- PMID: 20801913
- DOI: 10.1093/bioinformatics/btq483
Abstract
Motivation: A number of methods have been reported that predict protein-protein interactions (PPIs) with high accuracy using only simple sequence-based features such as amino acid 3-mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physicochemically complementary surfaces. Are the reported high accuracies realistic?
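To make "amino acid 3-mer content" concrete, the following is a minimal sketch of a 3-mer count feature and a spectrum-style kernel between two sequences. The function names (`kmer_counts`, `spectrum_kernel`) and the choice to skip k-mers containing non-standard residues are assumptions made for illustration; the abstract does not specify the exact kernels that were evaluated.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=3):
    """Count overlapping k-mers, skipping any that contain a non-standard residue."""
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(residue in AMINO_ACIDS for residue in kmer):
            counts[kmer] += 1
    return counts


def spectrum_kernel(seq_a, seq_b, k=3):
    """Dot product of the two sequences' k-mer count vectors."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())
```

A feature of this kind depends only on which short residue strings occur in each sequence, so it carries no explicit information about three-dimensional surface complementarity.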
Results: We find that the reported accuracies of the predictions are significantly overestimated and depend strongly on the structure of the training and testing datasets used. The choice of which protein pairs are treated as non-interacting in the training data has a variable impact on the accuracy estimates, and the accuracies can be artificially inflated by a bias towards dominant samples in the positive data that result from the presence of hub proteins in the protein interaction network. To address this bias, we propose a positive set-specific method for creating a 'balanced' negative set that maintains the degree distribution for each protein. With this balanced set, we conclude that simple sequence-based features contain insufficient information to be useful for predicting PPIs, but that protein domain-based features have some predictive value.
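The idea of a negative set that preserves each protein's degree can be sketched as follows. This is not the BRS-nonint algorithm itself, whose details are not given in the abstract; it is a simple rejection-sampling stand-in in which every protein is repeated once per positive interaction, the resulting list is shuffled, and consecutive entries are paired. The names `balanced_negatives`, `degree_counts`, and `max_tries` are hypothetical.

```python
import random
from collections import Counter


def degree_counts(pairs):
    """Number of pairs in which each protein appears."""
    return Counter(protein for pair in pairs for protein in pair)


def balanced_negatives(positives, seed=0, max_tries=1000):
    """Sample non-interacting pairs so each protein keeps its positive-set degree.

    Assumption: whole-sample rejection is used for simplicity; it can fail
    on hub-dense networks, in which case a RuntimeError is raised.
    """
    rng = random.Random(seed)
    positive_set = {frozenset(pair) for pair in positives}
    # One "stub" per appearance of a protein in the positive set.
    stubs = [protein for pair in positives for protein in pair]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        candidate = [frozenset(stubs[i:i + 2]) for i in range(0, len(stubs), 2)]
        no_self_pairs = all(len(pair) == 2 for pair in candidate)
        no_positives = all(pair not in positive_set for pair in candidate)
        no_duplicates = len(set(candidate)) == len(candidate)
        if no_self_pairs and no_positives and no_duplicates:
            return [tuple(sorted(pair)) for pair in candidate]
    raise RuntimeError("no balanced negative set found within max_tries")
```

Because a hub protein appears in the negative set exactly as often as in the positive set, a classifier cannot score well merely by learning which proteins are frequent in the positive data.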
Availability: Our method, named 'BRS-nonint', is available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/. All the datasets used in this study are derived from publicly available data, and are available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/PPI_RandomBalance.html
Contact: maozuguo@hit.edu.cn; d.r.westhead@leeds.ac.uk.