Universiti Teknologi Malaysia Institutional Repository

Gene selection and classification in autism gene expression data

Al-Jaf, Shilan Sameen Hameed (2017) Gene selection and classification in autism gene expression data. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing.

[img]
Preview
PDF
551kB

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Autism spectrum disorders (ASD) are neurodevelopmental disorders that are currently diagnosed on the basis of abnormal stereotyped behaviour as well as observable deficits in communication and social functioning. Although a variety of candidate genes have been attributed to the disorder, no single gene is applicable to more than 1–2% of the general ASD population. Despite extensive efforts, definitive genes that contribute to autism susceptibility have yet to be identified. The major problems in dealing with the gene expression dataset of autism include the presence of limited number of samples and large noises due to errors of experimental measurements and natural variation. In this study, a systematic combination of three important filters, namely t-test (TT), Wilcoxon Rank Sum (WRS) and Feature Correlation (COR) are applied along with efficient wrapper algorithm based on geometric binary particle swarm optimization-support vector machine (GBPSO-SVM), aiming at selecting and classifying the most attributed genes of autism. A new approach based on the criterion of median ratio, mean ratio and variance deviations is also applied to reduce the initial dataset prior to its involvement. Results showed that the most discriminative genes that were identified in the first and last selection steps concluded the presence of a repetitive gene (CAPS2), which was assigned as the most ASD risk gene. The fused result of genes subset that were selected by the GBPSO-SVM algorithm increased the classification accuracy to about 92.10%, which is higher than those reported in literature for the same autism dataset. Noticeably, the application of ensemble using random forest (RF) showed better performance compared to that of previous studies. However, the ensemble approach based on the employment of SVM as an integrator of the fused genes from the output branches of GBPSO-SVM outperformed the RF integrator. The overall improvement was ascribed to the selection strategies that were taken to reduce the dataset and the utilization of efficient wrapper based GBPSO-SVM algorithm.

Item Type:Thesis (Masters)
Additional Information:Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia, 2019; Supervisor : Dr. Rohayanti Hassan
Subjects:Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions:Computing
ID Code:86999
Deposited By: Fazli Masari
Deposited On:31 Oct 2020 12:16
Last Modified:31 Oct 2020 12:16

Repository Staff Only: item control page