Universiti Teknologi Malaysia Institutional Repository

Comparative study of feature selection method of microarray data for gene classification

Ghazali, Nurulhuda (2009) Comparative study of feature selection method of microarray data for gene classification. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information Systems.



Recent advances in biotechnology, such as microarrays, make it possible to measure the expression levels of thousands of genes in parallel. Analysis of microarray data can provide insight into gene function and regulatory mechanisms, and it is crucial for identifying and classifying cancers. Recent cancer classification techniques are based on gene expression profiles rather than on the morphological appearance of the tumor. However, this task is made more difficult by the noisy nature of microarray data and the overwhelming number of genes. It is therefore important to select a small subset of genes, referred to as informative genes, to represent the thousands of genes in microarray data. These informative genes are then classified according to their appropriate classes. To address the classification problem, we propose an approach that combines the minimum Redundancy-Maximum Relevance (mRMR) feature selection method with a Probabilistic Neural Network (PNN) classifier. The mRMR method selects the informative genes, while the PNN acts as the classifier. The approach was tested on the well-known Leukemia cancer dataset. The results show that the selected genes give high classification accuracy. Reducing the number of genes eases the burden on biologists, and the improved classification accuracy can be applied widely to detect cancer at an early stage.
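The two-stage pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not state which mRMR variant, discretization rule, or kernel width the thesis used, so everything below is an assumption: the difference form of mRMR (relevance minus mean redundancy), a three-state discretization at mean ± 0.5 standard deviations, a Gaussian kernel with `sigma=1.0`, and a small synthetic dataset in place of the Leukemia data. The function names are hypothetical and chosen for illustration only.

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information (in nats) between two non-negative integer arrays."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1)          # contingency table of co-occurrences
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def discretize(X):
    """Assumed rule: three states per gene, split at mean +/- 0.5 std."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X > mu - 0.5 * sd).astype(int) + (X > mu + 0.5 * sd).astype(int)

def mrmr_select(X, y, k):
    """Greedy mRMR, difference form: relevance minus mean redundancy."""
    Xd = discretize(X)
    n_genes = Xd.shape[1]
    relevance = np.array([mutual_info(Xd[:, j], y) for j in range(n_genes)])
    selected = [int(relevance.argmax())]
    redundancy = np.zeros(n_genes)       # summed MI with already selected genes
    while len(selected) < k:
        last = selected[-1]
        redundancy += np.array(
            [mutual_info(Xd[:, j], Xd[:, last]) for j in range(n_genes)]
        )
        score = relevance - redundancy / len(selected)
        score[selected] = -np.inf        # never pick the same gene twice
        selected.append(int(score.argmax()))
    return selected

def pnn_predict(X_train, y_train, X_test, sigma=1.0):
    """PNN: average a Gaussian kernel per class, predict the densest class."""
    classes = np.unique(y_train)
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2 * sigma ** 2))
    density = np.stack(
        [kernel[:, y_train == c].mean(axis=1) for c in classes], axis=1
    )
    return classes[density.argmax(axis=1)]

# Synthetic stand-in for microarray data: 80 samples, 50 genes,
# of which only genes 3 and 7 carry class information (an assumption).
rng = np.random.default_rng(0)
n_samples, n_genes = 80, 50
y = np.arange(n_samples) % 2
X = rng.normal(size=(n_samples, n_genes))
X[:, 3] += 4 * y
X[:, 7] += 4 * y

train, test = slice(0, 60), slice(60, 80)
selected = mrmr_select(X[train], y[train], k=2)
predicted = pnn_predict(X[train][:, selected], y[train], X[test][:, selected])
accuracy = float((predicted == y[test]).mean())
print("selected genes:", selected, "accuracy:", accuracy)
```

Selection is fitted on the training samples only, so the held-out accuracy is not inflated by information from the test set. On real expression data the kernel width and the number of selected genes would need tuning, for example by cross-validation.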

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia, 2009; Supervisor: Assoc. Prof. Dr. Puteh Saad
Uncontrolled Keywords: biotechnology, cancer classification, gene expression, microarray data
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
    R Medicine > RC Internal medicine > RC0254 Neoplasms. Tumors. Oncology (including Cancer)
Divisions: Computer Science and Information System (Formerly known)
ID Code: 11502
Deposited By: Zalinda Shuratman
Deposited On: 17 Dec 2010 17:49
Last Modified: 20 Sep 2017 18:00
