Classification of Ligase Function Based on Multi-parametric Feature Extracted from Protein Sequence

Lee, Bum Ju; Lee, Heon Gyu; Shin, Moon Sun; Ryu, Keun Ho

doi:10.1007/978-3-540-69848-7_87

Bum Ju Lee¹,
Heon Gyu Lee¹,
Moon Sun Shin² &
…
Keun Ho Ryu¹

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5073))

Included in the following conference series:

International Conference on Computational Science and Its Applications

1599 Accesses
4 Citations

Abstract

One of the important goals of bioinformatics is to classify and predict the functions of proteins that have no sequence homolog of known functions. The purpose of this paper is to classify protein function by using multi-parametric feature, without sequence similarity. Firstly, we propose a method for generating novel features that present various local information of protein sequence based on positively and negatively charged residues. Then, we introduce a process of making optimal feature subset through combination of traditional and novel features extracted from protein sequence. Finally, we classify ligase enzymes by support vector machine (SVM). In experiment, only 375 out of 483 features were selected by feature selection, and the classification accuracy for 4^th sub-classes in Enzyme Commission (EC) number is 98.35%. Our results demonstrate that most of novel features are valuable for specific enzyme function classification.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors

IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy

Article 23 January 2021

PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes

Article Open access 30 January 2024

References

Bairoch, A., Apweiler, R.: The SWISS-PROT protein sequence data bank and its supplement TrEMBL in 1999. Nucleic Acids Res. 27, 49–54 (1999)
Article Google Scholar
Bairoch, A.: The Enzyme Database in 2000. Nucleic Acids Res. 28, 304–305 (2000)
Article Google Scholar
Cai, C.Z., Wang, W.L., Sun, L.Z., Chen, Y.Z.: Protein function classification via support vector machine approach. Math. Biosci. 185, 111–122 (2003a)
Article MATH MathSciNet Google Scholar
Cai, C.Z., Han, L.Y., Ji, Z.L., Chen, X., Chen, Y.Z.: SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res. 31, 3692–3697 (2003b)
Article Google Scholar
Wang, X., Schroeder, D., Dobbs, D., Honavar, V.: Automated data-driven discovery of motif-based protein function classifiers. Inf. Sci (ISCI) 155, 1–18 (2003)
Article MathSciNet Google Scholar
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410 (1990)
Google Scholar
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 35, 3389–3402 (1997)
Article Google Scholar
Eidhammer, I., Jonassen, I., Taylor, W.R.: Protein Structure Comparison and Structure Patterns. J. Comput. Biol. 7, 685–716 (2000)
Article Google Scholar
Syed, U., Yona, G.: Enzyme function prediction with interpretable models. In: Methods in Molecular Biology: Computational Systems Biology, pp. 1–33. Humana Press (2007)
Google Scholar
Dobson, P.D., Doig, A.J.: Predicting Enzyme Class from Protein Structure without Alignments. J. Mol. Biol. 345, 187–199 (2005)
Article Google Scholar
Han, L.Y., Cai, C.Z., Ji, Z.L., Cao, Z.W., Cui, J., Chen, Y.Z.: Predicting functional family of novel enzymes irrespective of sequence similarity: a statistical learning approach. Nucleic Acids Res. 32, 6437–6444 (2004)
Article Google Scholar
Noble, W.S., Ben-Hur, A.: Integrating Informmation for protein function prediction, Bioinformatics-From Genomes Therapies. In: Lengauer, T. (ed.), WILE-VCH, Weinheim, vol. 3, pp. 1297–1314 (2007)
Google Scholar
Guyon, I.: An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 3, 1157–1182 (2003)
Article MATH Google Scholar
Chawla, N.V.: C4.5 and Imbalanced Data sets: Investigating the effect of sampling method, probabilistic estimate, and decision tree structure. In: Proc. International Conference on Machine Learning (ICML), Workshop on learning from imbalanced datasets II (2003)
Google Scholar
Borro, L.C., Oliveira, S.R.M., Yamagishi, M.E.B., Mancini, A.L., Jardine, J.G., Mazoni, I., Santos, E.H.D., Higa, R.H., Kuser, P.R., Neshich, G.: Predicting enzyme class from protein structure using Bayesian classification. Genet. Mol. Res. 5, 193–202 (2006)
Google Scholar
Ian, H.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005), http://www.cs.waikato.ac.nz/ml/weka/
MATH Google Scholar
Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M.R., Appel, R.D., Bairoch, A.: Protein Identification and Analysis Tools on the ExPASy Server. In: John, M.W. (ed.) The Proteomics Protocols Handbook, pp. 571–607. Humana Press (2005)
Google Scholar
Al-Shahib, A., Breitling, R., Gilbert, D.: Feature Selection and the class imbalance problem in predicting protein function from sequence. Appl. Bioinformatics 4, 195–203 (2005a)
Google Scholar
Al-Shahib, A., Breitling, R., Gilbert, D.: FRANKSUM: New feature selection method for protein funciton prediction. Int. J. Neural Syst. 15, 250–275 (2005b)
Article Google Scholar
Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. In: Goldstein, et al. (eds.), pp. 163–298. Addison Wesley, Reading (2006)
Google Scholar
Pearson, W.R., Lipman, D.J.: Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U.S.A. 85, 2444–2448 (1988)
Article Google Scholar
Rost, B.: Twilight zone of protein sequence alignments. Protein Eng. 12, 85–94 (1999)
Article Google Scholar
Kawabata, T.: MATRAS: a program for protein 3D structure comparison. Nucleic Acids Res. 31, 3367–3369 (2003)
Article Google Scholar
Holm, L., Sande, C.: Dali: a network tool for protein structure comparison. Trends Biochem. Sci. 20, 478–480 (1995)
Article Google Scholar
Drummond, C., Holte, R.C.: C4.5, class imbalance, and cost sensitivity: why under-sampling beats over-sampling. In: Proc. International Conference on Machine Learning (ICML), Workshop on Learning from Imbalanced Datasets II (2003)
Google Scholar
Lapinsh, M., Gutcaits, A., Prusis, P., Post, C., Lundstedt, T., Wikberg, J.E.S.: Classification of G-protein coupled receptors by alignment-independent extraction of principal chemical properties of primary amino acid sequences. Protein Sci. 11, 795–805 (2002)
Article Google Scholar
Claeyssens, M., Henrissat, B.: Specificity mapping of cellulolytic enzymes: Classification into families of structurally related proteins confirmed by biochemical analysis. Protein Sci. 1, 1293–1297 (1992)
Article Google Scholar
Jensen, L.J., Gupta, R., Blom, N., Devos, D., Tamames, J., Kesmir, C., Nielsen, H., Stærfeldt, H.H., Rapacki, K., Workman, C., Andersen, C.A.F., Knudsen, S., Krogh, A., Valencia, A., Brunak, S.: Prediction of Human Protein Function from Post-translational Modifications and Localization Features. J. Mol. Biol. 319, 1257–1265 (2002a)
Article Google Scholar
Jensen, L.J., Skovgaard, M., Brunak, S.: Prediction of novel archaeal enzymes from sequence-derived features. Protein Sci. 3, 2894–2898 (2002b)
Article Google Scholar
Truniger, V., Lazaro, J.M., Esteban, F.J., Blanco, L., Salas, M.: A positively charged residue of φ29 DNA polymerase, highly conserved in DNA polymerases from families A and B, is involoved in binding the incoming nucleotide. Nucleic Acids Res. 30, 1483–1492 (2002)
Article Google Scholar
Pawlowski, K., Jaroszewski, L., Rychlewski, L., Godzik, A.: Sensitive sequence comparison as protein function predictor. In: Proc. pacific Symposium on Biocomputing, vol. 5, pp. 42–53 (2000)
Google Scholar
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines, Software (2001), http://www.csie.ntu.edu.tw/~cjlin/libsvm
Yasser E.L.M.: WLSVM (2005), http://www.cs.iastate.edu/~yasser/wlsvm/
Bendtsen, J.D., Jensen, L.J., Blom, N., Heijne, G.V., Brunak, S.: Feature-based prediction of non-classical and leaderless protein secretion. Protein Eng. Des. Sel. 17, 349–356 (2004)
Article Google Scholar
Russell, R.B., Saqi, M.A., Bates, P.A., Sayle, R.A., Sternberg, M.J.: Recognition of analogous and homologous protein folds-assessment of prediction success and associated alignment accuracy using empirical substitution matrices. Protein Eng. 11, 1–9 (1998)
Article Google Scholar
Todd, A.E., Orengo, C.A., Thornton, J.M.: Evolution of Function in Protein Superfamilies, from a Structural Perspective. J. Mol. Biol. 307, 1113–1143 (2001)
Article Google Scholar
Hall, M.A.: Correlation-based feature selection for machine learning, Ph.D. thesis, Department of Computer Science, University of Waikato, Hamilton, New Zealand (1998)
Google Scholar
Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proc. of the 17th Int. Conf. on Machine Learning (ICML2000), pp. 359–366. Morgan Kaufmann Publishers Inc., San Francisco (2000)
Google Scholar
Hall, M.A., Holmes, G.: Benchmarking Feature Selection Techniques for Discrete Class Data Mining. IEEE Transactions on Knowledge and Data Engineering 15, 1–16 (2003)
Article Google Scholar
Lee, B.J., Lee, H.G., Lee, J.Y., Ryu, K.H.: Classification of Enzyme Function from Protein Sequence based on Feature Representation. In: Proc. of the 7th IEEE Int. Conf. on Bioinformatics and Bioengineering (BIBE 2007), vol. 2, pp. 741–752 (2007)
Google Scholar
Lee, B.J., Lee, H.G., Kim, D.S., Ryu, K.H.: Feature Extraction in Spatially-Conserved Regions and Protein Functional Classification. In: Proc. of the 2th Int. Conf. on Frontiers in the Convergence of Bioscience and Information Technologies (FBIT 2007), vol. 1, pp. 165–170 (2007)
Google Scholar
Kim, S.S., Kang, J.W., Chung, Y.J., Li, J.Y., Ryu, K.H.: Clustering orthologous proteins across phylogenetically distant species. Proteins 71, 1113–1122 (2008)
Article Google Scholar
Kim, S.S., Jung, K.S., Ryu, K.H.: Automatic Orthologous-Protein-Clustering from Multiple Complete-Genomes by the Best Reciprocal BLAST Hits. In: Li, J., Yang, Q., Tan, A.-H. (eds.) BioDM 2006. LNCS (LNBI), vol. 3916, pp. 60–70. Springer, Heidelberg (2006)
Chapter Google Scholar

Download references

Author information

Authors and Affiliations

Database/Bioinformatics Lab., Chungbuk National University, 12 Gaeshindong, Cheongju, Chungbuk, South Korea, 361-763
Bum Ju Lee, Heon Gyu Lee & Keun Ho Ryu
Dept. of Computer Science, Konkuk University, 322 Danwol-Dong, Chungju-Si, Chungbuk, South Korea, 380-701
Moon Sun Shin

Authors

Bum Ju Lee
View author publications
You can also search for this author in PubMed Google Scholar
Heon Gyu Lee
View author publications
You can also search for this author in PubMed Google Scholar
Moon Sun Shin
View author publications
You can also search for this author in PubMed Google Scholar
Keun Ho Ryu
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Osvaldo Gervasi Beniamino Murgante Antonio Laganà David Taniar Youngsong Mun Marina L. Gavrilova

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lee, B.J., Lee, H.G., Shin, M.S., Ryu, K.H. (2008). Classification of Ligase Function Based on Multi-parametric Feature Extracted from Protein Sequence. In: Gervasi, O., Murgante, B., Laganà, A., Taniar, D., Mun, Y., Gavrilova, M.L. (eds) Computational Science and Its Applications – ICCSA 2008. ICCSA 2008. Lecture Notes in Computer Science, vol 5073. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69848-7_87

Download citation

DOI: https://doi.org/10.1007/978-3-540-69848-7_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69840-1
Online ISBN: 978-3-540-69848-7
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Classification of Ligase Function Based on Multi-parametric Feature Extracted from Protein Sequence

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors

IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy

PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Classification of Ligase Function Based on Multi-parametric Feature Extracted from Protein Sequence

Abstract

Access this chapter

Subscribe and save

Buy Now

Preview

Similar content being viewed by others

A Machine Learning Methodology for Enzyme Functional Classification Combining Structural and Protein Sequence Descriptors

IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy

PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes

References

Author information

Authors and Affiliations

Editor information

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation