Universiti Teknologi Malaysia Institutional Repository

An iterative GA-based approach: gene selection and classification of lung cancer data

Mohamad, Mohd. Saberi and Omatu, Sigeru and Deris, Safaai and Yoshioka, Michifumi (2008) An iterative GA-based approach: gene selection and classification of lung cancer data. In: Advances in Bioinformatics. Penerbit UTM, Johor, 121-131. ISBN 978-983-52-0624-5



Advances in microarray-based gene expression analysis have opened a promising future for cancer diagnosis using new molecular-based approaches. Microarray technology measures the expression levels of thousands of genes simultaneously and produces microarray data, which also allow the gene expression levels of cancerous and normal tissues to be compared. Such comparisons are useful for selecting genes that may predict the clinical behaviour of cancers, so there is a need to select the informative genes that contribute to a cancerous state. However, gene selection is challenging because of the characteristics of microarray data: a huge number of genes compared with a small number of samples (high-dimensional data), irrelevant genes, and noisy data. To address this challenge, a gene selection method is used to select a subset of genes that improves a classifier's ability to classify samples accurately. Gene selection has several advantages, such as improving classification accuracy, reducing the dimensionality of the data, and removing irrelevant and noisy genes.
Gene selection methods are of two types (Li et al., 2008; Mohamad et al., 2005): a method carried out independently of a classifier belongs to the filter approach; otherwise, it follows a hybrid (wrapper) approach. In the early era of microarray analysis, most works used the filter approach because it is computationally more efficient than the hybrid approach. However, the hybrid approach usually provides greater accuracy than the filter approach, since genes are selected by considering and optimising the relations among them (Saeys et al., 2007).
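To make the filter side of this distinction concrete, here is a minimal sketch of a filter step: each gene is scored on its own, with no classifier in the loop, and the top-ranked genes are kept. The signal-to-noise ratio (SNR) is one common ranking statistic, chosen here purely for illustration; it is not claimed to be the criterion used by the authors or the cited works, and the function names `snr_scores` and `top_k_genes` are hypothetical.

```python
"""Hypothetical filter-style gene ranking (illustrative, not the chapter's method).

Each gene is scored independently of any classifier using an assumed
signal-to-noise ratio (SNR) statistic for a two-class problem.
"""
from statistics import mean, pstdev


def snr_scores(X, y):
    """Return |SNR| per gene.

    X: list of samples, each a list of gene expression values.
    y: list of class labels (0 or 1), one per sample.
    """
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        # Small epsilon avoids division by zero for constant genes.
        denom = pstdev(a) + pstdev(b) + 1e-12
        scores.append(abs(mean(a) - mean(b)) / denom)
    return scores


def top_k_genes(X, y, k):
    """Indices of the k highest-scoring genes (filter selection)."""
    scores = snr_scores(X, y)
    return sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: 4 samples x 3 genes; gene 1 separates the classes best,
    # gene 2 has identical class means.
    X = [[1.0, 0.1, 5.0],
         [1.2, 0.2, 4.0],
         [0.9, 3.1, 5.0],
         [1.1, 3.0, 4.0]]
    y = [0, 0, 1, 1]
    print(top_k_genes(X, y, 2))  # [1, 0]
```

A hybrid (wrapper) method would instead score whole gene subsets by a classifier's accuracy, which is what lets it account for relations among genes at a higher computational cost.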
Several hybrid methods, especially the combination of a genetic algorithm (GA) with a support vector machine (SVM) classifier (GASVM), have been implemented to select informative genes (Li et al., 2008; Mohamad et al., 2005; Mohamad et al., 2009; Peng et al., 2003). The GASVM-based hybrid methods in previous works have two drawbacks: 1) because of their binary chromosome representation, they cannot efficiently produce a near-optimal subset of informative genes when the total number of genes is very large (high-dimensional data); and 2) they carry a high risk of over-fitting. Over-fitting in hybrid methods such as GASVM was also reported in the review by Saeys et al. (2007). To overcome these limitations and the problems posed by microarray data, we propose an iterative approach based on a multi-objective GASVM (MOGASVM). The ultimate goal of this paper is to automatically select a near-optimal (smaller) subset of informative genes that is most relevant to cancer classification. The proposed method is evaluated on a real microarray data set, namely a lung cancer data set.
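The ingredients named above (a binary chromosome over genes, a fitness that trades classification accuracy against subset size, and repeating the GA on a shrinking gene pool) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' MOGASVM: a nearest-centroid classifier scored by leave-one-out cross-validation stands in for the SVM so that the sketch runs with the Python standard library only, and the weighted-sum fitness `w1 * accuracy + w2 * (M - R) / M` (M = genes in the current pool, R = genes selected), the weights, population size, generation count and mutation rate are all assumptions.

```python
"""Hypothetical sketch: iterative GA with binary chromosomes for gene selection.

Assumptions (not from the chapter): a nearest-centroid classifier scored by
leave-one-out cross-validation stands in for the SVM, and the fitness weights,
population size, generation count and mutation rate are illustrative.
"""
import random


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_label, best_dist = None, float("inf")
        for label in sorted(set(y)):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if not rows:  # no training sample of this class in this fold
                continue
            dist = sum((X[i][g] - sum(r[g] for r in rows) / len(rows)) ** 2
                       for g in genes)
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def fitness(bits, pool, X, y, w1=0.8, w2=0.2):
    """Weighted sum of accuracy and the fraction of pool genes left out."""
    genes = [g for g, b in zip(pool, bits) if b]
    m = len(pool)
    return w1 * loocv_accuracy(X, y, genes) + w2 * (m - len(genes)) / m


def run_ga(pool, X, y, rng, pop_size=20, generations=30, p_mut=0.05):
    """One GA run over binary chromosomes (bit k = keep pool[k])."""
    m = len(pool)
    pop = [[rng.randint(0, 1) for _ in range(m)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda c: fitness(c, pool, X, y), reverse=True)
        nxt = [ranked[0][:]]  # elitism: the best chromosome survives unchanged
        while len(nxt) < pop_size:
            # Truncation selection: two parents drawn from the fitter half.
            a, b = rng.sample(ranked[: pop_size // 2], 2)
            cut = rng.randrange(1, m) if m > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation.
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=lambda c: fitness(c, pool, X, y))
    return [g for g, b in zip(pool, best) if b]


def iterative_selection(X, y, max_iters=5, seed=0):
    """Re-run the GA on the genes kept by the previous run until the pool stops shrinking."""
    rng = random.Random(seed)
    pool = list(range(len(X[0])))
    for _ in range(max_iters):
        selected = run_ga(pool, X, y, rng)
        if not selected or len(selected) == len(pool):
            break
        pool = selected
    return pool


if __name__ == "__main__":
    # Toy data: 6 samples x 5 genes; only gene 1 differs between the classes.
    X = [[0.5, 0.0, 2.0, 2.0, 2.0],
         [0.5, 0.1, 2.0, 2.0, 2.0],
         [0.5, 0.2, 2.0, 2.0, 2.0],
         [0.5, 10.0, 2.0, 2.0, 2.0],
         [0.5, 10.1, 2.0, 2.0, 2.0],
         [0.5, 10.2, 2.0, 2.0, 2.0]]
    y = [0, 0, 0, 1, 1, 1]
    print(iterative_selection(X, y))
```

Re-running the GA on the genes kept by the previous run shortens the chromosome at each iteration, which is one way to ease the long binary chromosomes described above. The weighted sum is a simple scalarisation of the two objectives; a Pareto-based multi-objective GA would be a different design choice.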

Item Type: Book Section
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 16785
Deposited By: Liza Porijo
Deposited On: 27 Oct 2011 09:53
Last Modified: 27 Oct 2011 09:53
