Universiti Teknologi Malaysia Institutional Repository

Ensemble filters with harmonize algorithm for optimal solutions in medical datasets

Tengku Ab. Hamid, Tengku Mazlin (2021) Ensemble filters with harmonize algorithm for optimal solutions in medical datasets. Masters thesis, Universiti Teknologi Malaysia.

PDF (535kB)

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

The explosive growth in the number of features in high-dimensional datasets remains a challenge for data analysis in many research fields. This is especially true in medical diagnosis, where it may affect the treatment patients receive. Besides data dimensionality, classifiers such as the Support Vector Machine (SVM) still lack consistency in achieving optimal performance because of improper kernel parameter settings. Filter algorithms are commonly used to select relevant features because of their simple ranking strategies. However, most standalone filter algorithms do not consider the intercorrelation between features, even though weak dependency between features is a leading reason why some features become irrelevant. Consequently, these filters can produce an unbalanced number of features, which may degrade classification accuracy. This problem can be alleviated with an ensemble feature selection approach that identifies an appropriate number of features by considering feature dependency. In this study, an ensemble-filter feature selection method with a harmonize classification algorithm is proposed. The ensemble combines four filters (Information Gain, Gain Ratio, Chi-squared, and Relief-F) with an occurrence rate evaluation to identify the initial top-ranked features relevant for classification. A harmonize classification method is implemented using Particle Swarm Optimization (PSO) and SVM to simultaneously determine the optimum kernel parameters and the significant features as the optimal solution. The proposed method is evaluated on four medical datasets of different sizes in terms of accuracy, sensitivity, specificity, and Area under the Curve (AUC). Experimental results showed that, with the optimal solution, the proposed method achieved accuracies of 96.15%, 95.41%, 96.62%, and 96.50% on the four datasets. This is a significant improvement over conventional SVM. Under 10-fold cross-validation, the proposed method also showed better classification performance than other existing methods.
Therefore, the proposed method is applicable to high-dimensional medical datasets for accurate disease prediction.
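The abstract names the occurrence rate evaluation but does not define it. The following Python sketch shows one plausible reading, under stated assumptions:

1. Each filter's ranking is truncated to its top k features.
2. A feature's occurrence rate is the fraction of filters that place it in their top k.
3. Features at or above a threshold rate are kept as the initial subset.

The function name occurrence_rate_selection, the parameters top_k and min_rate, and the example rankings are hypothetical. None of them comes from the thesis.

```python
from collections import Counter


def occurrence_rate_selection(rankings, top_k, min_rate):
    """Hypothetical sketch of an occurrence-rate vote across filter rankings.

    rankings: dict mapping a filter name to a list of feature names, best first.
    top_k:    number of top-ranked features taken from each filter (assumed).
    min_rate: minimum fraction of filters (0..1) that must rank a feature
              in their top_k for it to be kept (assumed).
    Returns (feature, rate) pairs sorted by rate descending, then by name.
    """
    if not rankings:
        raise ValueError("rankings must not be empty")
    if top_k <= 0:
        raise ValueError("top_k must be positive")

    counts = Counter()
    for ranked in rankings.values():
        # Use a set so a feature listed twice by one filter counts once.
        counts.update(set(ranked[:top_k]))

    n_filters = len(rankings)
    selected = [
        (feature, count / n_filters)
        for feature, count in counts.items()
        if count / n_filters >= min_rate
    ]
    selected.sort(key=lambda item: (-item[1], item[0]))
    return selected


# Made-up rankings standing in for the outputs of the four filters.
RANKINGS = {
    "information_gain": ["f3", "f1", "f7", "f2", "f5"],
    "gain_ratio": ["f1", "f3", "f2", "f9", "f7"],
    "chi_squared": ["f3", "f7", "f1", "f4", "f2"],
    "relief_f": ["f7", "f3", "f5", "f1", "f8"],
}

selected = occurrence_rate_selection(RANKINGS, top_k=3, min_rate=0.75)
print(selected)  # [('f3', 1.0), ('f1', 0.75), ('f7', 0.75)]
```

Voting on top-k membership, rather than on raw scores, avoids comparing filters whose scores live on different scales (for example, Information Gain versus Relief-F weights). The resulting subset would then serve as the starting point for the PSO and SVM search described above.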

Item Type: Thesis (Masters)
Uncontrolled Keywords: Support Vector Machine (SVM), Particle Swarm Optimization (PSO), Area under the Curve (AUC)
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Computing
ID Code: 102978
Deposited By: Widya Wahid
Deposited On: 12 Oct 2023 08:34
Last Modified: 12 Oct 2023 08:34
