Universiti Teknologi Malaysia Institutional Repository

The development of machine learning based software for predicting protein-protein interactions and protein function from protein primary structure

Othman, Muhamad Razib and Deris, Safaai and Alashwal, Hany Taher Ahmed and Md. Illias, Rosli and Mat Yatim, Safie (2007) The development of machine learning based software for predicting protein-protein interactions and protein function from protein primary structure. Project Report. Faculty of Computer Science and Information System, Skudai, Johor. (Unpublished)

PDF (Full Text)
2895Kb

Abstract

Understanding protein function is a major goal in the post-genomic era. Proteins usually work in the context of other proteins and rarely function alone. Therefore, it is highly relevant to study the interaction partners of a protein in order to understand its function. For this reason, the main objective of this thesis is to predict protein-protein interactions based only on protein primary structure. Using Support Vector Machines (SVM), different protein features have been studied and examined. These features include protein domain structures, hydrophobicity and amino acid compositions. The results imply that the protein domain structure is the most informative feature for predicting protein-protein interactions. It also requires much less running time than the other features. However, a standard binary SVM requires both positive and negative data samples. Although it is easy to obtain a dataset of interacting proteins as positive examples, there are no experimentally confirmed non-interacting proteins to serve as negative examples. Previous studies cope with this problem by artificially generating random sets of protein pairs that are not listed in the Database of Interacting Proteins (DIP) as negative examples. This approach can be used for comparing features because the error it introduces will be uniform across the features. In this research, we consider this problem as a one-class classification problem and solve it using the One-Class SVM. Using only positive examples (interacting protein pairs) in the training phase, the One-Class SVM achieves an accuracy of 80%. These results imply that protein-protein interactions can be predicted using a one-class classifier with accuracy comparable to that of binary classifiers that use artificially constructed negative examples. Finally, a Bayesian Kernel for SVM was implemented to incorporate the probabilistic information about protein-protein interactions that were compiled from different sources. The probabilistic output from the Bayesian Kernel can help biologists to conduct further research on the interactions predicted with high probability.
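
As an illustration of the amino acid composition feature mentioned in the abstract, the following minimal Python sketch turns a pair of protein sequences into a fixed-length numeric vector. The abstract does not describe the encoding used in the report, so the function names, the concatenation of the two vectors, and the toy sequences are all assumptions made for this example only.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    counts = Counter(sequence.upper())
    # Count only standard residues so that the fractions sum to 1.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Concatenate two composition vectors into one 40-dimensional vector.

    Concatenation is an assumption made for this sketch; note that it
    makes the vector depend on the order of the two proteins.
    """
    return composition(seq_a) + composition(seq_b)


# Hypothetical toy sequences, not taken from DIP.
features = pair_features("MKTAYIAKQR", "GAVLIPFMWG")
print(len(features))
```

Vectors of this kind, built only from interacting pairs, could then be passed to a one-class classifier such as scikit-learn's `OneClassSVM`. That library choice is likewise an assumption and is not stated in the abstract.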

Item Type: Monograph (Project Report)
Uncontrolled Keywords: Protein function prediction, computational biology, machine learning, artificial intelligence
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 4140
Deposited By: Noor Aklima Harun
Deposited On: 18 Feb 2008 08:39
Last Modified: 01 Jun 2010 03:15
