Universiti Teknologi Malaysia Institutional Repository

A study on component-based technology for development of complex bioinformatics software

Ali Shah, Zuraini and Deris, Safaai and Othman, Muhamad Razib and Zakaria, Zalmiyah and Saad, Puteh and Hassan, Rohayanti and Muda, Mohd Hilmi and Kasim, Shahreen and Roslan, Rosfuzah (2004) A study on component-based technology for development of complex bioinformatics software. Project Report. Faculty of Computer Science and Information System, Skudai, Johor. (Unpublished)



In the first chapter, entitled “Enhancement of Support Vector Machines for Remote Protein Homology Detection and Fold Recognition,” M. Hilmi Muda, Puteh Saad and Razib M. Othman present a comprehensive method based on two-layer multiclass classifiers. The first layer detects homology up to the superfamily and family levels of the SCOP hierarchy, using binary SVM classification rules optimized directly for ROC area. The second layer uses a discriminative SVM algorithm with a state-of-the-art string kernel based on PSI-BLAST profiles, which leverages the unlabeled data; it detects homology up to the fold level of the SCOP hierarchy. The results were evaluated using mean ROC and mean MRFP. Experimental results show that their approaches significantly improve the performance of remote protein homology detection on all three datasets (SCOP 1.53, 1.67 and 1.73). Compared with the results produced by state-of-the-art methods, mean ROC improved by 0.03% on SCOP 1.53, 1.17% on SCOP 1.67 and 0.33% on SCOP 1.73.

In the second chapter, “Hybrid Clustering Support Vector Machines by Incorporating Protein Residue Information for Protein Local Structure Prediction,” Rohayanti Hassan, Puteh Saad and Razib M. Othman develop a predictive algorithm named R-HCSVM for predicting protein local structure. It works in the following steps. First, the input information for R-HCSVM is pre-processed. Two types of input information are needed: the protein residue score and the protein secondary structure class. ResiduePatchScore is introduced as a new method for pre-processing the protein residue score. It combines the protein conservation score, which carries rich functional information, with the protein propensity score, which carries rich secondary structural information, so that the resulting residue score is informative enough to avoid biased scoring. Second, protein sequences are segmented into continuous subsequences of length nine. The third step, another novel part of their study, introduces a hybrid clustering SVM to reduce training complexity: SOM and K-Means are integrated as a clustering algorithm to produce a granular input, and an SVM is then used as the classifier. On protein sequence datasets obtained from the PISCES database, they found that R-HCSVM outperforms other methods in predicting protein local structure from a given protein subsequence.

In the third chapter, “Incorporating Gene Ontology with Conditional-based Clustering to Analyze Gene Expression Data,” Shahreen Kasim, Safaai Deris and Razib M. Othman propose a clustering algorithm named BTreeBicluster. BTreeBicluster starts by building a GO tree and enriching it with expression similarity from Saccharomyces genes. The BTreeBicluster algorithm is then applied to the enriched GO tree during the clustering process, taking a subset of the conditions of the gene expression dataset and using discretized data. Because the annotation in the GO tree is determined before the clustering process starts, it has a major effect on the output clusters. Their results show that BTreeBicluster produces more consistent annotation.

In the final chapter, “Improving Protein-Protein Interaction Prediction by a False Positive Filtration Process,” Rosfuzah Roslan and Razib M. Othman aim to increase the overlap between computational predictions and experimental results by partially removing false positive pairs from computationally predicted PPI datasets. Their study introduces PFP(), a protein function prediction method based on shared interacting domain patterns, to aid the Gene Ontology Annotation (GOA). They used GOA and PFP() as agents in the filtration process to reduce the false positives in computationally predicted PPI pairs. The functions predicted by PFP(), expressed as Gene Ontology (GO) IDs extracted from cross-species PPI data, were used to assign novel functional annotations to uncharacterized proteins and additional functions to proteins already characterized by GO. Because GOA is an ongoing process and a protein normally performs a variety of functions in different processes, implementing PFP() increased the chance of finding a matching function annotation for the first rule in the filtration process by as much as 20%. Their results showed that the filtration process removed a large number of false positive pairs from the predicted datasets. They used the signal-to-noise ratio to measure the improvement made by the proposed filtration process, and strength values to evaluate the applicability of the whole proposed computational framework to the different computational PPI prediction methods.
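
The filtration idea in the final chapter can be illustrated with a minimal Python sketch. The abstract does not spell out the filtration rules, so the rule used here (keep a predicted pair only when both proteins share at least one GO ID) is an assumption, as are the function names `merge_annotations` and `filter_pairs`, the protein names, and the GO IDs, all of which are hypothetical placeholders rather than the authors' implementation.

```python
from typing import Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]
Annotations = Dict[str, Set[str]]


def merge_annotations(goa: Annotations, pfp: Annotations) -> Annotations:
    """Union curated GOA terms with predicted (PFP-style) terms per protein."""
    merged: Annotations = {protein: set(terms) for protein, terms in goa.items()}
    for protein, terms in pfp.items():
        merged.setdefault(protein, set()).update(terms)
    return merged


def filter_pairs(pairs: Iterable[Pair], annotations: Annotations) -> List[Pair]:
    """Keep a predicted pair only if both proteins share at least one GO ID.

    ASSUMPTION: this shared-annotation rule stands in for the report's
    first filtration rule, which the abstract does not define.
    """
    kept: List[Pair] = []
    for a, b in pairs:
        shared = annotations.get(a, set()) & annotations.get(b, set())
        if shared:
            kept.append((a, b))
    return kept


# Hypothetical toy data: protein names and GO IDs are placeholders.
goa = {"P1": {"GO:0000001"}, "P2": {"GO:0000002"}}
pfp = {"P2": {"GO:0000001"}, "P3": {"GO:0000003"}}
predicted = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]

kept_goa_only = filter_pairs(predicted, goa)
kept_with_pfp = filter_pairs(predicted, merge_annotations(goa, pfp))
```

In this toy data the pair (P1, P2) has no shared term under the curated annotations alone but gains one once the predicted terms are merged in, which mirrors the abstract's point that adding PFP() predictions raises the chance of finding a matching annotation.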

Item Type: Monograph (Project Report)
Uncontrolled Keywords: Biological information system, python molecule viewer
Subjects: Z Bibliography. Library Science. Information Resources > ZA Information resources > ZA4050 Electronic information resources
Q Science > QA Mathematics > QA76 Computer software
Divisions: Computer Science and Information System (Formerly known)
ID Code: 4388
Deposited By: Azrin Ariffin
Deposited On: 25 Jun 2008 03:21
Last Modified: 21 Jun 2010 08:40
