Mohamad Noor, Nurul Fathia and Sipail, Herold Sylvestro and Ahmad, Norulhusna and Annanurov, Bayram and Mohd. Noor, Norliza (2023) COVID-19: symptoms clustering and severity classification using machine learning approach. International Journal of Integrated Engineering, 15 (3). pp. 1-14. ISSN 2229-838X
Official URL: http://dx.doi.org/10.30880/ijie.2023.15.03.001
Abstract
COVID-19 is a highly contagious disease that causes illness ranging from the common cold to more chronic conditions and even death. Because new variants of the virus keep emerging through mutation, identifying the symptoms of COVID-19 is important for containing the infection. Clustering and classification are mainstream machine learning techniques in many areas of research, and in recent years they have been widely used to generate useful knowledge about the COVID-19 outbreak. Many researchers have shared their COVID-19 data in public databases, and many studies have been carried out on them. However, the merit of such a dataset is often unknown, and researchers need to analyse it to check its reliability. The dataset used in this work was sourced from the Kaggle website. The data were obtained through a survey of participants of various genders and ages who had been to at least ten countries. The dataset has four levels of severity based on COVID-19 symptoms, developed in accordance with World Health Organization (WHO) and Indian Ministry of Health and Family Welfare recommendations. This paper presents an investigation of the dataset using supervised and unsupervised machine learning approaches in order to better understand it. For supervised learning, the analysis of the severity group based on COVID-19 symptoms employed seven classifiers: K-NN, Linear SVM, Naive Bayes, Decision Tree (J48), AdaBoost, Bagging, and Stacking. For unsupervised learning, the clustering algorithms used were Simple K-Means and Expectation-Maximization. Both the supervised and the unsupervised techniques yielded relatively poor classification and clustering results. For the dataset analysed in this study, the symptoms do not appear to map correctly onto the assigned severity levels, which raises concerns about the validity and reliability of the dataset.
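The idea of using classifier performance as a check on label quality can be illustrated with a minimal sketch. It is not the authors' pipeline and does not use the Kaggle data: the eight toy symptom vectors, both labellings, and all function names are hypothetical, and a plain K-NN with Hamming distance stands in for the seven classifiers named in the abstract. The point is only that when severity labels follow the symptoms, neighbouring records agree and accuracy is high, whereas labels that do not follow symptom similarity drive accuracy down.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two binary symptom vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training records closest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each record in turn and predict it from the remaining ones."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical records: (fever, cough, breathing difficulty), 1 = present.
X = [
    [0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0],
    [0, 0, 1], [1, 0, 1], [0, 1, 1], [1, 1, 1],
]

# Labelling 1 (assumed): severity is driven by breathing difficulty.
y_consistent = ["severe" if row[2] == 1 else "mild" for row in X]

# Labelling 2 (assumed, deliberately adversarial): the label flips whenever
# a single symptom changes, so similar records never share a label.
y_inconsistent = ["severe" if sum(row) % 2 == 1 else "mild" for row in X]

acc_consistent = leave_one_out_accuracy(X, y_consistent)
acc_inconsistent = leave_one_out_accuracy(X, y_inconsistent)
print("labels follow symptoms:      ", acc_consistent)
print("labels ignore symptom likeness:", acc_inconsistent)
```

On real survey data the gap would be far less clean than in this toy case, and poor accuracy alone does not prove that labels are wrong; it is one signal that, as the abstract argues, warrants a closer look at how the severity levels were assigned.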
Item Type: | Article |
---|---|
Uncontrolled Keywords: | classification; COVID-19 symptom; machine learning. |
Subjects: | Q Science > Q Science (General) |
Divisions: | Razak School of Engineering and Advanced Technology |
ID Code: | 105718 |
Deposited By: | Muhamad Idham Sulong |
Deposited On: | 13 May 2024 07:17 |
Last Modified: | 13 May 2024 07:17 |