Universiti Teknologi Malaysia Institutional Repository

Biological-based semi-supervised clustering algorithm to improve gene function prediction

Kasim, Shahreen and Deris, Safaai and M. Othman, Razib and Hashim, Rathiah (2011) Biological-based semi-supervised clustering algorithm to improve gene function prediction. Journal of Computing, 3 (4). pp. 1-11. ISSN 21519617

Abstract

Simultaneous clustering of gene expression data with biological knowledge has become an important technique and standard practice for interpreting the data and its underlying biology. However, common clustering algorithms do not provide a comprehensive approach that looks into all three categories of annotation (biological process, molecular function, and cellular component), and they have not been tested with different functional annotation database formats. Furthermore, traditional clustering algorithms use random initialization, which causes inconsistent cluster generation, and they are unable to determine the number of clusters involved. In this paper, we present a novel computational framework called CluFA (Clustering Functional Annotation) for semi-supervised clustering of gene expression data. The framework consists of three stages: (i) preparation of Gene Ontology (GO) datasets, functional annotation databases, and testing datasets; (ii) fuzzy c-means clustering to find the optimal clusters; and (iii) computational evaluation and biological validation of the results obtained. By combining the three GO term categories (biological process, molecular function, and cellular component) with functional annotation databases (Saccharomyces Genome Database (SGD), the Yeast Database at Munich Information Centre for Protein Sequences (MIPS), and Entrez), CluFA is able to determine the number of clusters and reduce random initialization. In addition, CluFA is more comprehensive in its capability to predict the functions of unknown genes. We tested our new computational framework for semi-supervised clustering of yeast gene expression data based on multiple functional annotation databases.
Experimental results show that 76 clusters were identified via the GO slim dataset. Applying the SGD, Entrez, and MIPS functional annotation databases to reduce random initialization improved performance on both computational evaluation and biological validation. Using the comprehensive GO term categories achieved the lowest compactness and separation values. From this experiment, we therefore conclude that CluFA improved gene function prediction through the use of GO and gene expression values with the fuzzy c-means clustering algorithm, cross-referenced with the latest SGD annotation.
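
The abstract does not spell out how the annotations enter the clustering step, so the following is only a minimal sketch of the general idea, not the CluFA implementation. It assumes that each annotation label contributes one initial cluster centre (the mean expression profile of the genes carrying that label), which fixes the number of clusters and replaces random initialization, and that standard fuzzy c-means then refines the memberships. The gene names, labels, expression values, and the helper names `seed_centers`, `memberships`, and `fuzzy_c_means` are all hypothetical.

```python
import math

def seed_centers(expr, annotations):
    """Assumed seeding rule: one centre per label, the mean profile of its genes."""
    groups = {}
    for gene, label in annotations.items():
        groups.setdefault(label, []).append(expr[gene])
    labels = sorted(groups)
    centers = []
    for label in labels:
        profiles = groups[label]
        dim = len(profiles[0])
        centers.append([sum(p[d] for p in profiles) / len(profiles)
                        for d in range(dim)])
    return labels, centers

def memberships(points, centers, m, eps=1e-9):
    """Standard fuzzy c-means membership update (each row sums to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        # Clamp distances so a point sitting on a centre does not divide by zero.
        dists = [max(math.dist(x, c), eps) for c in centers]
        u.append([1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists])
    return u

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and centre updates, starting from the given centres."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in u]
            total = sum(weights)
            new_centers.append([sum(w * x[d] for w, x in zip(weights, points)) / total
                                for d in range(dim)])
        centers = new_centers
    return memberships(points, centers, m), centers

# Toy data (hypothetical): six genes, two expression conditions each.
expr = {
    "g1": [1.0, 1.1], "g2": [0.9, 1.0], "g3": [1.1, 0.9],
    "g4": [5.0, 5.1], "g5": [5.2, 4.9], "g6": [4.9, 5.0],
}
# Known annotations; g3 and g6 play the role of genes with unknown function.
annotations = {"g1": "A", "g2": "A", "g4": "B", "g5": "B"}

labels, seeds = seed_centers(expr, annotations)
genes = sorted(expr)
u, centers = fuzzy_c_means([expr[g] for g in genes], seeds)

# Predict a label for each unannotated gene from its strongest membership.
predicted = {g: labels[max(range(len(labels)), key=lambda j: u[i][j])]
             for i, g in enumerate(genes) if g not in annotations}
print(predicted)
```

Because the starting centres are derived from the annotations rather than drawn at random, repeated runs on the same input give the same clusters. In this toy case the unannotated genes `g3` and `g6` end up with labels `A` and `B` respectively.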

Item Type: Article
Uncontrolled Keywords: fuzzy c-means, gene expression, gene ontology, gene function prediction, semi-supervised clustering
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 6964
Deposited By: Liza Porijo
Deposited On: 15 Dec 2008 07:14
Last Modified: 14 Jul 2014 03:00
