MusiteDeep: a deep-learning framework for general and kinase-specific phosphorylation site prediction

Bioinformatics. 2017 Dec 15;33(24):3909-3916. doi: 10.1093/bioinformatics/btx496.

Duolin Wang¹,², Shuai Zeng², Chunhui Xu², Wangren Qiu²,³, Yanchun Liang¹,⁴, Trupti Joshi²,⁵, Dong Xu¹,²

Affiliations

¹ Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China.
² Department of Electrical Engineering and Computer Science, Informatics Institute, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.
³ Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China.
⁴ Department of Computer Science and Technology, Zhuhai College of Jilin University, Zhuhai 519041, China.
⁵ Department of Health Management and Informatics, School of Medicine, University of Missouri, Columbia, MO 65211, USA.

PMID: 29036382
PMCID: PMC5860086
DOI: 10.1093/bioinformatics/btx496

Duolin Wang et al. Bioinformatics. 2017.


Abstract

Motivation: Computational methods for phosphorylation site prediction play important roles in protein function studies and experimental design. Most existing methods are based on feature extraction, which may result in incomplete or biased features. Deep learning, a cutting-edge machine learning approach, can automatically discover complex representations of phosphorylation patterns from raw sequences, and hence provides a powerful tool for improving phosphorylation site prediction.

Results: We present MusiteDeep, the first deep-learning framework for predicting general and kinase-specific phosphorylation sites. MusiteDeep takes raw sequence data as input and uses convolutional neural networks with a novel two-dimensional attention mechanism. It achieves over a 50% relative improvement in the area under the precision-recall curve in general phosphorylation site prediction and obtains competitive results in kinase-specific prediction compared to other well-known tools on the benchmark data.

Availability and implementation: MusiteDeep is provided as an open-source tool available at https://github.com/duolinwang/MusiteDeep.

Contact: xudong@missouri.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Deep-learning architecture of MusiteDeep. The input layer is the one-of-K coding of a 33-residue protein fragment centered at the prediction site. A multi-layer CNN is used as the feature extractor, with no pooling layers. The last hidden state of the multi-layer CNN is copied twice: one copy is fed directly into an attention mechanism (attention-1), and the other is first transposed and then fed into a second attention mechanism (attention-2). The outputs of the two attention mechanisms are combined and fed into the fully connected neural network layers. The final layer is a single neural network layer with softmax output.
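The input encoding described in the Fig. 1 caption can be illustrated with a minimal Python sketch. The 33-residue window and one-of-K coding come from the caption; the alphabet (20 standard amino acids plus a `-` padding symbol), the padding of positions beyond the sequence termini, and the function names are assumptions made for illustration and are not taken from the MusiteDeep source code.

```python
# Assumed alphabet: 20 standard amino acids plus a padding symbol.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical padding symbol for positions outside the protein
ALPHABET = AMINO_ACIDS + PAD
WINDOW = 33          # fragment length stated in the Fig. 1 caption
HALF = WINDOW // 2   # 16 residues on each side of the prediction site


def extract_window(sequence, site_index):
    """Return the 33-residue fragment centered at site_index (0-based).

    Positions that fall outside the sequence are filled with PAD
    (an assumption; the caption does not say how termini are handled).
    """
    residues = []
    for pos in range(site_index - HALF, site_index + HALF + 1):
        if 0 <= pos < len(sequence):
            residues.append(sequence[pos])
        else:
            residues.append(PAD)
    return "".join(residues)


def one_of_k(fragment):
    """Encode a fragment as a list of one-hot rows, one row per residue."""
    index = {symbol: i for i, symbol in enumerate(ALPHABET)}
    rows = []
    for residue in fragment:
        row = [0] * len(ALPHABET)
        # Unknown symbols are mapped to the padding column (assumption).
        row[index.get(residue, index[PAD])] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # made-up example sequence
    fragment = extract_window(protein, 12)         # serine at 0-based index 12
    matrix = one_of_k(fragment)
    print(fragment)
    print(len(matrix), "x", len(matrix[0]))
```

Under these assumptions each candidate site becomes a 33 x 21 binary matrix, which is the kind of raw-sequence input a convolutional layer can consume without hand-built features.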

**Fig. 2.**
Graphical illustration of the attention-based decoder on the feature map dimension. It decodes the feature maps (h₁, h₂, …, h_T) from the last hidden state of the multi-layer CNN into a single target representation (H'). All the parameters within each layer are scaled between 0 and 1. The grey scale reflects the parameter values.
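As a rough illustration of how vectors h₁, …, h_T can be decoded into a single representation H', the sketch below uses additive attention in the style of Bahdanau et al.: each vector receives a score, the scores are normalized with a softmax, and H' is the weighted sum. The scoring function, the parameter names (`w`, `b`, `v`), and the concatenation used to combine the two attention outputs are assumptions; the captions do not specify them, so this is not presented as the authors' implementation.

```python
import math


def softmax(scores):
    """Normalize scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(rows, w, b, v):
    """Additive attention over the rows of a matrix (assumed scoring form).

    score_t = v . tanh(W h_t + b); alpha = softmax(score); H' = sum_t alpha_t h_t
    Returns the pooled vector H' and the attention weights alpha.
    """
    scores = []
    for row in rows:
        hidden = [
            math.tanh(sum(w_ij * x for w_ij, x in zip(w_i, row)) + b_i)
            for w_i, b_i in zip(w, b)
        ]
        scores.append(sum(v_i * u for v_i, u in zip(v, hidden)))
    alpha = softmax(scores)
    width = len(rows[0])
    pooled = [sum(a * row[j] for a, row in zip(alpha, rows)) for j in range(width)]
    return pooled, alpha


def transpose(matrix):
    """Swap the two axes of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def two_dimensional_attention(h, params_1, params_2):
    """Apply attention to h and to its transpose, then concatenate.

    Concatenation is an assumed way of combining the two outputs.
    """
    out_1, _ = attention_pool(h, *params_1)
    out_2, _ = attention_pool(transpose(h), *params_2)
    return out_1 + out_2


if __name__ == "__main__":
    h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]          # toy 3 x 2 hidden state
    params_1 = ([[0.5, -0.5]], [0.0], [1.0])          # W is 1 x 2
    params_2 = ([[0.1, 0.2, 0.3]], [0.0], [1.0])      # W is 1 x 3
    print(two_dimensional_attention(h, params_1, params_2))
```

Because the weights come from a softmax, they lie between 0 and 1 and sum to 1 along the attended axis, so they can be inspected directly to see which positions or feature maps contribute most to a prediction.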

**Fig. 3.**
ROC and precision-recall curves comparing MusiteDeep with Musite and other deep-learning architectures by five-fold cross-validation

**Fig. 4.**
ROC and precision-recall curves comparing MusiteDeep with other well-known general phosphorylation site prediction tools on the testing set

**Fig. 5.**
ROC and precision-recall curves comparing MusiteDeep with other well-known kinase-specific phosphorylation site prediction tools by five-fold cross-validation of CDK (left) and PKA (right)
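For readers unfamiliar with the precision-recall metric used in these comparisons, the following sketch computes average precision, a common step-wise estimate of the area under the precision-recall curve, together with the relative improvement between two such areas. The function names and the toy labels and scores are illustrative assumptions; the exact procedure used in the paper is not described here, and no reported values are reproduced.

```python
def average_precision(labels, scores):
    """Average precision for binary labels (1 = phosphorylation site).

    Predictions are ranked by descending score and precision is averaged
    over the ranks at which a positive is found. Tied scores keep their
    input order, which is a simplification.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / total_positives


def relative_improvement(new_area, baseline_area):
    """Relative gain of new_area over baseline_area (0.5 means 50%)."""
    return (new_area - baseline_area) / baseline_area


if __name__ == "__main__":
    toy_labels = [1, 0, 1, 0]           # made-up ground truth
    toy_scores = [0.9, 0.8, 0.7, 0.1]   # made-up classifier scores
    print(average_precision(toy_labels, toy_scores))
```

A relative improvement above 0.5 means the new area is more than 1.5 times the baseline area. Precision-recall curves are often more informative than ROC curves when positives are rare.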

**Fig. 6.**
t-SNE plot of the merged representation and the original one-of-K representation

Cited by

GBMPhos: A Gating Mechanism and Bi-GRU-Based Method for Identifying Phosphorylation Sites of SARS-CoV-2 Infection.
Huang G, Xiao R, Chen W, Dai Q. Biology (Basel). 2024 Oct 6;13(10):798. doi: 10.3390/biology13100798. PMID: 39452107.
Sitetack: a deep learning model that improves PTM prediction by using known PTMs.
Gutierrez CS, Kassim AA, Gutierrez BD, Raines RT. Bioinformatics. 2024 Nov 1;40(11):btae602. doi: 10.1093/bioinformatics/btae602. PMID: 39388212.
VUStruct: a compute pipeline for high throughput and personalized structural biology.
Moth CW, Sheehan JH, Mamun AA, Sivley RM, Gulsevin A, Rinker D, Capra JA, Meiler J. bioRxiv [Preprint]. 2024 Aug 7:2024.08.06.606224. doi: 10.1101/2024.08.06.606224. PMID: 39149406.
Serotype-Specific Regulation of Dengue Virus NS5 Protein Subcellular Localization.
Cheng CX, Tan MJA, Chan KWK, Choy MMJ, Roman N, Arnold DDR, Bifani AM, Kong SYZ, Bist P, Nath BK, Swarbrick CMD, Forwood JK, Vasudevan SG. ACS Infect Dis. 2024 Jun 14;10(6):2047-2062. doi: 10.1021/acsinfecdis.4c00054. Epub 2024 May 29. PMID: 38811007.
TransPTM: a transformer-based model for non-histone acetylation site prediction.
Meng L, Chen X, Cheng K, Chen N, Zheng Z, Wang F, Sun H, Wong KC. Brief Bioinform. 2024 Mar 27;25(3):bbae219. doi: 10.1093/bib/bbae219. PMID: 38725156.

Grants and funding

R01 GM100701/GM/NIGMS NIH HHS/United States
