Kasim, Shahreen (2011) Fuzzy c-means clustering by incorporating biological knowledge and multi-stage filtering to improve gene function prediction. PhD thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.
Abstract
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product. Comprehensive studies of gene expression are useful for predicting gene functions, including predicting annotations for unknown gene functions. However, several issues need to be addressed in gene function prediction, namely resolving multiple fuzzy clusters using biological knowledge and the biological annotations available in existing databases, and handling high-level and low-level expression values. Therefore, this research aimed to cluster gene expression data by incorporating biological knowledge in order to handle these issues. The basic Fuzzy c-Means (FCM) algorithm was introduced to address multiple fuzzy clusters in gene expression. Clustering Functional Annotation (CluFA) was developed to deal with insufficient knowledge by incorporating Gene Ontology (GO) datasets and multiple functional annotation databases. The GO datasets were used to determine the number of clusters as well as the clusters to which genes belong. Meanwhile, the evidence codes in the functional annotation databases were used to compute the strength of the association between a data element and a particular cluster. The multi-stage filtering-CluFA (msf-CluFA) was implemented by conducting filtering stages and applying an enhanced Apriori algorithm in order to handle the high-level and low-level expression values. The performance of the proposed method was evaluated in terms of compactness and separation, consistency, and accuracy, using the Eisen and Gasch datasets. Biological validation was also used to validate the gene function predictions by cross-checking them against the most recent annotation database. The results show that the proposed computational method achieved better results than other methods such as GOFuzzy, FuzzyK, and FuzzySOM in predicting unknown gene functions.
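For readers unfamiliar with the basic Fuzzy c-Means step that the abstract builds on, the following is a minimal sketch of the textbook algorithm only. It is not CluFA or msf-CluFA, and it does not use Gene Ontology data or evidence codes. The function name `fuzzy_c_means`, its parameters (`m`, `max_iter`, `tol`, `seed`) and the toy data are illustrative assumptions, not taken from the thesis.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric vectors.

    Returns (centers, memberships), where memberships[j][i] is the degree
    to which point j belongs to cluster i. Each membership row sums to 1.
    All names and defaults here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Start from random memberships, normalised so each row sums to 1.
    memberships = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        memberships.append([v / total for v in row])

    dim = len(points[0])
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    centers = []
    for _ in range(max_iter):
        # Update each center as the membership-weighted mean of all points.
        centers = []
        for i in range(n_clusters):
            weights = [memberships[j][i] ** m for j in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Update memberships from the distances to the new centers.
        largest_change = 0.0
        for j, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: share membership
                # among the coinciding centers only.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [z / sum(zeros) for z in zeros]
            else:
                new_row = [
                    1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                    for d_i in dists
                ]
            largest_change = max(
                largest_change,
                max(abs(a - b) for a, b in zip(new_row, memberships[j])),
            )
            memberships[j] = new_row

        # Stop once no membership moved by more than the tolerance.
        if largest_change < tol:
            break

    return centers, memberships


# Toy usage: two well-separated groups standing in for expression profiles.
toy_profiles = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
toy_centers, toy_memberships = fuzzy_c_means(toy_profiles, n_clusters=2)
```

Because every point keeps a graded membership in every cluster, a gene can be associated with more than one group, which is the property the abstract refers to as multiple fuzzy clusters.
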
Item Type: | Thesis (PhD) |
---|---|
Additional Information: | Thesis (Ph.D (Sains Komputer)) - Universiti Teknologi Malaysia, 2011; Supervisors : Prof. Dr. Safaai Deris, Dr. Muhamad Razib Othman |
Uncontrolled Keywords: | gene function, expression values, gene ontology |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Divisions: | Computer Science and Information System |
ID Code: | 32110 |
Deposited By: | Narimah Nawil |
Deposited On: | 16 Aug 2013 03:52 |
Last Modified: | 27 May 2018 07:11 |