Universiti Teknologi Malaysia Institutional Repository

Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction

Kasim, Shahreen (2011) Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction. PhD thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.

[img]
Preview
PDF
388kB

Abstract

Gene expression is a process by which information from a gene is used in the synthesis of a functional gene product. Comprehensive studies of gene expression are useful for predicting gene functions, which includes predicting annotations for unknown gene functions. However, there are several issues that need to be addressed in gene function prediction, namely: solving multiple fuzzy clusters using biological knowledge and biological annotations in some existing databases. This includes, handling the high level expression and low level expression values. Therefore, this research was aimed at clustering gene expressions by incorporating biological knowledge in order to handle these issues. The basic Fuzzy c-Means (FCM) algorithm was introduced to address multiple fuzzy clusters in gene expression. Clustering Functional Annotation (CluFA) was developed to deal with insufficient knowledge via incorporating Gene Ontology (GO) datasets and multiple functional annotation databases. The GO datasets were used to determine number of clusters as well as clusters for genes. Meanwhile, the evidence codes in functional annotation databases were used to compute the strength of the association between data element and a particular cluster. The multi stage filtering-CluFA (msf-CluFA) was implemented by conducting filtering stages and applying an enhanced apriori algorithm in order to handle the high level expression and low level expression values. The performance of the proposed method was evaluated in terms of compactness and separation, consistency, and accuracy, using Eisen and Gasch datasets. Biological validation was also used to validate the gene function prediction, by cross checking them with the most recent annotation database. The results show that the proposed computational method achieved better results compared with other methods such as GOFuzzy, FuzzyK, and FuzzySOM in predicting unknown gene function.

Item Type:Thesis (PhD)
Additional Information:Thesis (Ph.D (Sains Komputer)) - Universiti Teknologi Malaysia, 2011; Supervisors : Prof. Dr. Safaai Deris, Dr. Muhamad Razib Othman
Uncontrolled Keywords:gene function, expression values, gene ontology
Subjects:Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:Computer Science and Information System
ID Code:32110
Deposited By: Narimah Nawil
Deposited On:16 Aug 2013 03:52
Last Modified:27 May 2018 07:11

Repository Staff Only: item control page