Universiti Teknologi Malaysia Institutional Repository

Remote protein homology detection and fold recognition using two-layer support vector machine classifiers

M. Muda, Hilmi and Saad, Puteh and M. Othman, Razib (2011) Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. Computers in Biology and Medicine, 41 (8). pp. 687-699. ISSN 0010-4825

Official URL: http://dx.doi.org/10.1016/j.compbiomed.2011.06.004


Remote protein homology detection and fold recognition refer to the detection of structural homology in proteins that share little or no sequence similarity. To detect protein structural classes from primary sequence information, homology-based methods have been developed, which can be divided into three types: discriminative classifiers, generative models for protein families, and pairwise sequence comparisons. Support Vector Machines (SVM) and Neural Networks (NN) are two popular discriminative methods. Recent studies have shown that SVMs train quickly and are more accurate and efficient than NNs. We present a comprehensive method based on two-layer classifiers. The first layer detects homology up to the superfamily and family levels of the SCOP hierarchy using optimized binary SVM classification rules. It uses a kernel function known as the Bio-kernel, which incorporates biological information into the classification process. The second layer uses a discriminative SVM algorithm with a string kernel to detect homology up to the protein fold level of the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP, and their significance was tested with a pairwise t-test. Experimental results show that our approach significantly improves the performance of remote protein homology detection and fold recognition on all three versions of the SCOP dataset (1.53, 1.67 and 1.73). Compared with the results produced by well-known methods, we achieved improvements in mean ROC of 4.19% on SCOP 1.53, 4.75% on SCOP 1.67 and 4.03% on SCOP 1.73. The combination of the first and second layers of BioSVM-2L performs well in remote homology detection and fold recognition across all three dataset versions.
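
For readers unfamiliar with string kernels and ROC scoring, the following is a minimal Python sketch of both ideas. It is not the authors' implementation: the abstract does not specify the string kernel or the Bio-kernel, so a normalized k-spectrum kernel (a common, simple string kernel) is assumed here in their place, and the function names `spectrum`, `spectrum_kernel` and `roc_auc` are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(seq: str, k: int = 3) -> Counter:
    """Count every contiguous k-mer in a protein sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Normalized k-spectrum kernel (assumed stand-in for the paper's kernel).

    This is the cosine similarity between the k-mer count vectors of the
    two sequences; it returns 0.0 if either sequence is shorter than k.
    """
    sa, sb = spectrum(a, k), spectrum(b, k)
    dot = sum(count * sb[kmer] for kmer, count in sa.items())
    norm = sqrt(sum(c * c for c in sa.values()) * sum(c * c for c in sb.values()))
    return dot / norm if norm else 0.0


def roc_auc(scores, labels):
    """Area under the ROC curve for one binary task.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example scores higher; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y != 1]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A kernel function of this kind could be used to fill a precomputed Gram matrix for an SVM library, and a mean ROC score would then be the average of `roc_auc` over the individual binary classification tasks (for example, one per held-out family).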

Item Type: Article
Uncontrolled Keywords: fold recognition, remote protein homology detection, support vector machines, two-layer classifiers, bio-inspired kernel
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 24555
Deposited By: Kamariah Mohamed Jong
Deposited On: 16 Apr 2012 03:42
Last Modified: 16 Apr 2012 03:42
