Discrimination of GO term annotated proteins based on amino acid occurrence and composition
Dataset posted on 21.11.2017 by Taguchi, Y. H., Gromiha, M. Michael
In this paper, we apply linear discriminant analysis and support vector machines to predict GO term annotated proteins from amino acid occurrence/composition in the UniRef50 data set, i.e., UniProt sequences with less than 50% sequence identity. We found that our method could discriminate between proteins with at least one known GO term and those without any annotation with an AUC of 0.82 in a three-fold cross-validation test. Discrimination of the 38 most frequent GO terms is achieved with a maximum AUC of 0.91. Our method is based solely on amino acid sequence, and hence it will be useful for predicting GO term associations of newly obtained amino acid sequences that have no annotated homolog.
PRIB 2008 proceedings found at: http://dx.doi.org/10.1007/978-3-540-88436-1
Contributors: Monash University. Faculty of Information Technology. Gippsland School of Information Technology; Chetty, Madhu; Ahmad, Shandar; Ngom, Alioune; Teng, Shyh Wei; Third IAPR International Conference on Pattern Recognition in Bioinformatics (PRIB) (3rd : 2008 : Melbourne, Australia)
Coverage:
Rights: Copyright by Third IAPR International Conference on Pattern Recognition in Bioinformatics. All rights reserved.
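The amino acid occurrence and composition features named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the restriction to the 20 standard residues, and the choice to normalize by the number of standard residues (rather than by full sequence length) are assumptions made here for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def occurrence(seq):
    """Return a 20-element list of raw residue counts for seq.

    Assumption: non-standard symbols (e.g. X, B, Z) are ignored.
    """
    counts = Counter(seq.upper())
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]


def composition(seq):
    """Return occurrence divided by the number of standard residues.

    Assumption: a sequence with no standard residues maps to all zeros.
    """
    occ = occurrence(seq)
    total = sum(occ)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [n / total for n in occ]
```

Each protein then becomes a 20-dimensional vector that could be passed to a classifier such as linear discriminant analysis or a support vector machine. The classifier step is left out here because the record does not specify its settings.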