Universiti Teknologi Malaysia Institutional Repository

A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification

Algamal, Zakariya Yahya and Lee, Muhammad Hisyam (2019) A two-stage sparse logistic regression for optimal gene selection in high-dimensional microarray data classification. Advances in Data Analysis and Classification, 13 (3). pp. 753-771. ISSN 1862-5347

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1007/s11634-018-0334-1


High-dimensional gene expression data pose two common problems: many genes may be irrelevant, and genes are often highly correlated with one another. Gene selection has been shown to improve the results of many classification methods. Sparse logistic regression with the least absolute shrinkage and selection operator (lasso) or the smoothly clipped absolute deviation (SCAD) penalty is among the most widely applied methods for gene selection in cancer classification. However, it faces a critical challenge in practice when genes are highly correlated. To address this problem, a two-stage sparse logistic regression is proposed. It aims to obtain an efficient subset of genes with high classification capability by combining a screening approach (a filter method) with the adaptive lasso using a new weight (an embedded method). In the first stage, sure independence screening retains the genes that are individually highly correlated with the cancer class level. In the second stage, the adaptive lasso with the new weight is applied to the screened genes to address the high correlations among them. Experimental results on four publicly available gene expression datasets show that the proposed method significantly outperforms three state-of-the-art methods in terms of classification accuracy, G-mean, area under the curve, and stability. In addition, the results demonstrate that the top selected genes are biologically related to the cancer type. Thus, the proposed method can be useful for cancer classification using DNA gene expression data in real clinical practice.
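The two-stage idea described in the abstract can be illustrated with a minimal sketch, given here only as an aid to the reader and not as the authors' implementation. Several parts are assumptions: the abstract does not define the new adaptive lasso weight, so the sketch substitutes the standard weight 1/|b|^gamma taken from an initial ridge-penalized fit; screening ranks genes by absolute marginal correlation with the class label; and the function names (sis_screen, fit_penalized_logistic, two_stage_select), the proximal gradient solver, the synthetic data, and the values of d and lam are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Return genes as columns, each centered and scaled to unit variance."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
        cols.append([(v - mu) / sd for v in col])
    return cols


def sis_screen(cols, y, d):
    """Stage 1 (assumed form): keep the d genes with the largest
    absolute marginal correlation with the class label."""
    n = len(y)
    ybar = sum(y) / n
    scores = [abs(sum(c[i] * (y[i] - ybar) for i in range(n))) for c in cols]
    return sorted(range(len(cols)), key=lambda j: -scores[j])[:d]


def fit_penalized_logistic(cols, y, lam, weights, l2=0.0, iters=300):
    """Proximal gradient descent for logistic loss with a weighted L1
    penalty (lam * weights[j] * |beta_j|) and an optional ridge term."""
    n, d = len(y), len(cols)
    b0, beta = 0.0, [0.0] * d
    # Step 1/L, where L bounds the curvature for standardized columns.
    step = 1.0 / ((d + 1) / 4.0 + l2)
    for _ in range(iters):
        resid = [
            sigmoid(b0 + sum(beta[j] * cols[j][i] for j in range(d))) - y[i]
            for i in range(n)
        ]
        b0 -= step * sum(resid) / n
        for j in range(d):
            grad = sum(cols[j][i] * resid[i] for i in range(n)) / n + l2 * beta[j]
            v = beta[j] - step * grad
            thresh = step * lam * weights[j]
            beta[j] = math.copysign(max(abs(v) - thresh, 0.0), v)  # soft-threshold
    return b0, beta


def two_stage_select(X, y, d, lam, gamma=1.0):
    """Screen to d genes, then fit an adaptive lasso on the screened genes."""
    cols = standardize(X)
    kept = sis_screen(cols, y, d)
    sub = [cols[j] for j in kept]
    # Assumed weight: standard adaptive lasso weight from a ridge fit,
    # NOT the new weight proposed in the paper.
    _, init = fit_penalized_logistic(sub, y, lam=0.0, weights=[0.0] * len(sub), l2=0.1)
    weights = [1.0 / (abs(b) ** gamma + 1e-8) for b in init]
    _, beta = fit_penalized_logistic(sub, y, lam, weights)
    selected = {kept[k]: beta[k] for k in range(len(kept)) if beta[k] != 0.0}
    return kept, selected


# Synthetic example: 80 samples, 100 genes, only genes 0 and 1 carry signal.
random.seed(0)
n, p = 80, 100
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [1 if row[0] + row[1] > 0 else 0 for row in X]
kept, selected = two_stage_select(X, y, d=10, lam=0.05)
print("screened genes:", sorted(kept))
print("selected genes and coefficients:", selected)
```

In this sketch the screening size d and the penalty lam are fixed by hand; in practice they would normally be tuned, for example by cross-validation.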

Item Type: Article
Uncontrolled Keywords: Cancer classification, Gene selection, Lasso, SCAD, Sparse logistic regression
Subjects: Q Science > QA Mathematics
ID Code: 96970
Deposited By: Widya Wahid
Deposited On: 06 Sep 2022 15:24
Last Modified: 06 Sep 2022 15:24
