Universiti Teknologi Malaysia Institutional Repository

Application of string kernels in protein sequence classification

Zaki, Nazar M. and Deris, Safaai and Illias, Rosli Md (2005) Application of string kernels in protein sequence classification. Applied Bioinformatics, 4. pp. 45-52. ISSN 1175-5636

Full text not available from this repository.

Official URL: http://dx.doi.org/10.2165/00822942-200504010-00005

Abstract

Introduction: The production of biological information has become much greater than its consumption. The key issue now is how to organise and manage this huge amount of new information so that it remains accessible. One core problem in classifying biological information is the annotation of new protein sequences with structural and functional features.

Method: This article applies string kernels to the classification of protein sequences into homogeneous families. A string kernel approach used in conjunction with support vector machines (SVMs) has been shown to achieve good performance in text categorisation tasks. We evaluated and analysed the performance of this approach and present experimental results on three selected families from the SCOP (Structural Classification of Proteins) database. We then compared the overall performance of this method with existing protein classification methods on benchmark SCOP datasets.

Results: According to the F1 measure and the rate of false positives (RFP), the string kernel method performs well in classifying protein sequences. It outperformed all of the generative methods it was compared against and is comparable with the SVM-Fisher method.

Discussion: Although the string kernel approach makes no use of prior biological knowledge, it still captures enough biological information to outperform some state-of-the-art methods.
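The abstract does not say which string kernel was used. As an illustration only, the sketch below assumes the gap-weighted subsequence kernel commonly paired with SVMs for text categorisation: two sequences are compared through every subsequence of length `n` they share, and each occurrence is down-weighted by a decay factor `lam` (0 < λ ≤ 1) raised to the span it covers, so gapped matches count less than contiguous ones. The function names, the choice of `n` and `lam`, and the toy sequences are hypothetical. The kernel is computed with the usual O(n·|s|·|t|) dynamic program and then normalised so that every sequence has unit self-similarity.

```python
import math


def ssk(s: str, t: str, n: int, lam: float) -> float:
    """Gap-weighted subsequence kernel K_n(s, t) (illustrative sketch, not the authors' code)."""
    if n < 1:
        raise ValueError("subsequence length n must be at least 1")
    ls, lt = len(s), len(t)
    # kp[i][a][b] holds the auxiliary kernel K'_i(s[:a], t[:b]).
    # K'_0 is 1 for every pair of prefixes; K'_i is 0 when either prefix is shorter than i.
    kp = [[[0.0] * (lt + 1) for _ in range(ls + 1)] for _ in range(n)]
    for a in range(ls + 1):
        for b in range(lt + 1):
            kp[0][a][b] = 1.0
    for i in range(1, n):
        # kpp[a][b] holds K''_i(s[:a], t[:b]), which lets each K'_i entry be filled in O(1).
        kpp = [[0.0] * (lt + 1) for _ in range(ls + 1)]
        for a in range(i, ls + 1):
            for b in range(i, lt + 1):
                extend = lam * lam * kp[i - 1][a - 1][b - 1] if s[a - 1] == t[b - 1] else 0.0
                kpp[a][b] = lam * kpp[a][b - 1] + extend
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp[a][b]
    # K_n sums lam^2 * K'_{n-1}(preceding prefixes) over every pair of matching final characters.
    total = 0.0
    for a in range(n, ls + 1):
        for b in range(n, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[n - 1][a - 1][b - 1]
    return total


def normalized_ssk(s: str, t: str, n: int, lam: float) -> float:
    """K(s, t) / sqrt(K(s, s) * K(t, t)); 0.0 if either sequence is shorter than n."""
    denom = math.sqrt(ssk(s, s, n, lam) * ssk(t, t, n, lam))
    return ssk(s, t, n, lam) / denom if denom > 0 else 0.0


def gram_matrix(seqs: list[str], n: int, lam: float) -> list[list[float]]:
    """Symmetric matrix of normalised kernel values for a list of sequences."""
    m = len(seqs)
    diag = [ssk(x, x, n, lam) for x in seqs]  # self-similarities, computed once
    g = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            denom = math.sqrt(diag[i] * diag[j])
            value = ssk(seqs[i], seqs[j], n, lam) / denom if denom > 0 else 0.0
            g[i][j] = g[j][i] = value
    return g


if __name__ == "__main__":
    # Hypothetical toy fragments, not SCOP data.
    toy = ["MKVLAT", "MKVIAT", "GGSWTE"]
    for row in gram_matrix(toy, n=3, lam=0.5):
        print(["%.3f" % v for v in row])
```

The resulting matrix is symmetric, with ones on the diagonal for sequences at least `n` residues long, and can be passed to any SVM implementation that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`). The pure dynamic program keeps the sketch dependency-free; real protein families would need a vectorised or compiled version, because the cost grows with the product of the two sequence lengths.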
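The two evaluation measures named in the Results can be made concrete. F1 is the harmonic mean of precision P = TP/(TP+FP) and recall R = TP/(TP+FN). The abstract does not define RFP, so the sketch below assumes the definition commonly used with the SVM-Fisher SCOP benchmark: for each positive test sequence, the fraction of negative test sequences that score as high as or higher than it, summarised by the median over all positives (lower is better). The function names and toy scores are hypothetical.

```python
from statistics import median


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def rfp(pos_score: float, neg_scores: list[float]) -> float:
    """Assumed RFP for one positive: share of negatives scoring >= that positive."""
    return sum(1 for s in neg_scores if s >= pos_score) / len(neg_scores)


def median_rfp(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Median of the per-positive RFP values (assumed summary statistic)."""
    return median([rfp(p, neg_scores) for p in pos_scores])


if __name__ == "__main__":
    # Hypothetical counts and classifier scores, not results from the article.
    print(f1_score(tp=8, fp=2, fn=2))
    print(median_rfp([2.0, 0.5, 1.0], [0.0, 0.6, 1.5, -1.0]))
```

With these toy scores the per-positive RFP values are 0.0, 0.5 and 0.25, so the median is 0.25. Unlike F1, RFP needs no decision threshold, because it compares raw scores of positives against negatives.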

Item Type: Article
Uncontrolled Keywords: protein, amino acid sequence, article, bioinformatics, medical information, protein analysis, protein database, protein structure, sequence analysis, sequence homology, structure analysis algorithms, amino acid motifs, artificial intelligence, cluster analysis, conserved sequence, molecular sequence data, pattern recognition, automated, proteins, sequence alignment, sequence analysis, protein, sequence homology, amino acid
Subjects: T Technology > T Technology (General)
T Technology > TP Chemical technology
Divisions: Computer Science and Information System (Formerly known)
ID Code: 12428
Deposited By: S.N.Shahira Dahari
Deposited On: 01 Jun 2011 02:12
Last Modified: 13 Jun 2011 05:57
