Universiti Teknologi Malaysia Institutional Repository

Machine learning technique for enhancing classification performance in data summarization using rough set and genetic algorithm

Wibowo, M. and Noviyanto, F. (2019) Machine learning technique for enhancing classification performance in data summarization using rough set and genetic algorithm. International Journal of Scientific and Technology Research, 8 (10). pp. 1108-1119. ISSN 2277-8616

Full text not available from this repository.

Official URL: http://www.ijstr.org/final-print/oct2019/Machine-L...


The volume of data grows rapidly and increases significantly every day. The data come from different resources and services, which produce large volumes that must be managed, reused, or analyzed. These heterogeneous sources of information can pose important challenges for model calibration, as the data are often imprecise, uncertain, ambiguous, and incomplete. The data therefore require large storage, and their volume makes analytical, processing, and retrieval operations difficult and highly time-consuming. One solution to these problems is to summarize the data so that they need less storage and far less time to process and retrieve. Data summarization techniques therefore aim to produce summaries of the best quality. In this study, Rough Set (RS) is proposed to obtain an accurate, effective, and appropriate summary result. Although RS can extract decision rules effectively from given datasets, two processes, data discretization and finding reducts, are required in order to generate decision rules based on the values. Both processes are known to be NP-hard problems and are also related to the dimensionality reduction problem. To solve these two problems, a Genetic Algorithm (GA) is applied to search for both the cut points for discretization and the reducts in order to discover the optimal rules. Moreover, the reduction and transformation of the data may shorten the running time, while also allowing the system to obtain more generalized results and improve the predictive accuracy. Therefore, this study proposes a hybrid approach of RS and GA to address the weaknesses of the rough set and ensure a better result. The proposed hybrid RS-GA method is intended to overcome the shortcomings of the data summarization method.
To assess the efficiency of the proposed work, the classification accuracy obtained with several machine learning (ML) methods is compared with the accuracy of the proposed hybrid approach. The ML methods analyzed by prediction accuracy were Rough Set (RS), Naïve Bayes (NB), J48, Random Tree (RT), and Projective Adaptive Resonance Theory (PART). The findings show that the RS-GA approach achieved the highest prediction accuracy, 99.95%, and produced the lowest error on API values from Malaysia and Singapore compared to the other ML methods. These results show that RS-GA performs best and is the most significant method compared to the other methods.
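The two rough set steps named in the abstract, discretization by cut points and the search for reducts, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy readings, the labels, and the cut points are all hypothetical, and an exhaustive search stands in for the Genetic Algorithm because the toy table has only three attributes. A GA would instead evolve candidate cut points and attribute subsets, possibly using a dependency measure like the one below as part of its fitness.

```python
from bisect import bisect_right
from itertools import combinations


def discretize(rows, cuts):
    """Replace each numeric value by the index of the interval it falls in,
    where cuts[j] is the sorted list of cut points for attribute j."""
    return [tuple(bisect_right(cuts[j], v) for j, v in enumerate(row))
            for row in rows]


def dependency(rows, labels, attrs):
    """Rough set dependency degree: the fraction of objects whose
    equivalence class (objects identical on attrs) has a single label."""
    classes = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    consistent = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return consistent / len(rows)


def minimal_reducts(rows, labels):
    """Smallest attribute subsets that preserve the dependency degree of the
    full attribute set. Exhaustive search, so only usable on tiny tables."""
    n = len(rows[0])
    full = dependency(rows, labels, range(n))
    for size in range(1, n + 1):
        found = [subset for subset in combinations(range(n), size)
                 if dependency(rows, labels, subset) == full]
        if found:
            return found
    return []


# Hypothetical toy table: three numeric attributes per object (invented values).
rows = [
    (12.0, 0.3, 5.0),
    (18.0, 0.7, 5.5),
    (35.0, 0.2, 4.0),
    (40.0, 0.9, 6.0),
    (55.0, 0.4, 5.2),
    (60.0, 0.8, 4.8),
]
labels = ["good", "good", "moderate", "moderate", "unhealthy", "unhealthy"]

# Hypothetical cut points, one list per attribute.
cuts = [[25.0, 50.0], [0.5], [5.1]]

coded = discretize(rows, cuts)
reducts = minimal_reducts(coded, labels)
print(coded)
print(reducts)
```

In this toy table the first attribute alone separates the three labels after discretization, so it forms the only minimal reduct. Because the number of attribute subsets grows exponentially with the number of attributes, exhaustive search stops being practical on real data, which is the motivation for a heuristic such as a GA.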

Item Type: Article
Uncontrolled Keywords: machine learning, prediction, data summarization
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
ID Code: 90847
Deposited By: Narimah Nawil
Deposited On: 31 May 2021 21:20
Last Modified: 31 May 2021 21:20
