Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology

Kuo-Chen Chou

doi:10.2174/157016409789973707

Abstract

With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop automated methods for efficiently identifying various attributes of uncharacterized proteins. This is one of the most important tasks facing bioinformatics today, and the information thus obtained will have an important impact on the development of proteomics and system biology. One of the keys to achieving this is to find an effective model for representing a protein sample. The most straightforward model is the entire amino acid sequence; however, the sequential model fails when the query protein has no significant homology to proteins of known characteristics. Thus, various non-sequential, or discrete, models have been proposed. The simplest discrete model is the amino acid (AA) composition. When a protein is represented by its AA composition, however, all sequence-order information is lost. To cope with this dilemma, the concept of pseudo amino acid (PseAA) composition was introduced. Its essence is to keep using a discrete model to represent a protein without completely losing its sequence-order information. In a broad sense, therefore, the PseAA composition of a protein is a set of discrete numbers that is derived from its amino acid sequence, differs from the classical AA composition, and harbours some sequence-order or pattern information. Ever since the first PseAA composition was formulated to predict protein subcellular localization and membrane protein types, it has stimulated many different modes of PseAA composition for studying various problems in proteins and protein-related systems. In this review, we give a brief and systematic introduction to the various modes of PseAA composition and their applications. The challenges of finding the optimal PseAA composition are also briefly discussed.
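
The contrast between the AA composition and a PseAA composition can be illustrated with a minimal Python sketch. It is not taken from the abstract, which gives no formula. It assumes one common form: the 20 AA frequencies are followed by `lam` sequence-order correlation factors computed from a single per-residue property and scaled by a weight. The function names, the defaults `lam=2` and `weight=0.05`, and `TOY_SCALE` are all hypothetical; `TOY_SCALE` is a placeholder and does not contain real physicochemical values.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 native amino acids

# Placeholder property scale (assumption): NOT real physicochemical data.
TOY_SCALE = {aa: i / 19 for i, aa in enumerate(AMINO_ACIDS)}


def aa_composition(seq):
    """Classical AA composition: 20 occurrence frequencies, order-blind."""
    counts = Counter(seq)
    return [counts[aa] / len(seq) for aa in AMINO_ACIDS]


def order_factors(seq, scale, lam):
    """k-th factor: mean squared property difference between residues
    that are k positions apart along the sequence (k = 1..lam)."""
    if not 0 < lam < len(seq):
        raise ValueError("lam must satisfy 0 < lam < len(seq)")
    factors = []
    for k in range(1, lam + 1):
        diffs = [(scale[seq[i]] - scale[seq[i + k]]) ** 2
                 for i in range(len(seq) - k)]
        factors.append(sum(diffs) / len(diffs))
    return factors


def pse_aa_composition(seq, scale, lam=2, weight=0.05):
    """(20 + lam)-dimensional vector: normalized AA frequencies followed
    by weighted sequence-order factors; the components sum to 1."""
    freqs = aa_composition(seq)
    thetas = order_factors(seq, scale, lam)
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]


# Two sequences with the same residues in a different order.
seq_a = "ACDACD"
seq_b = "AACCDD"
same_aa = aa_composition(seq_a) == aa_composition(seq_b)
same_pse = (pse_aa_composition(seq_a, TOY_SCALE)
            == pse_aa_composition(seq_b, TOY_SCALE))
print(same_aa, same_pse)
```

For these two sequences the classical AA composition is identical, whereas the extra order-dependent components of the PseAA vector differ. This is the sense in which a discrete model can retain some sequence-order information.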

Keywords: Protein attributes, sequential model, discrete model, PseAAC, functional domain, gene ontology, sequential evolution, optimal PseAAC