Universiti Teknologi Malaysia Institutional Repository

Improving hate speech detection using machine and deep learning techniques: A preliminary study

Ahmed Siddiqui, Jawaid and Yuhaniz, Siti Sophiayati and Memon, Zulfiqar Ali and Amin, Yumna (2021) Improving hate speech detection using machine and deep learning techniques: A preliminary study. Open International Journal of Informatics (OIJI), 9 (2). pp. 21-34. ISSN 2289-2370

Official URL: https://oiji.utm.my/index.php/oiji/article/view/14...

Abstract

The growing use of social media for information sharing has brought major benefits to humanity. However, it has also given rise to a variety of challenges, including the spreading and sharing of hate speech messages. To address this emerging issue, recent studies have employed a variety of feature engineering techniques together with machine learning or deep learning algorithms to automatically detect hate speech messages on different datasets. Most of these studies, however, classify hate speech messages using existing feature engineering approaches and suffer from low classification results, because those approaches suffer from the word order problem and the word context problem. This research studies the identification of hateful content in recent tweets from Twitter and its classification into several categories. The categories identified are Ethnicity, Nationality, Religion, Gender, Sexual Orientation, Disability and Other. These categories are further classified to identify the targets of hate speech: for example, Black, White and Asian belong to the Ethnicity category, while Muslims, Jews and Christians belong to the Religion category. The hateful content identified will be used to evaluate a deep learning model, LSTM, against traditional machine learning models, namely Linear SVC, Logistic Regression, Random Forest and Multinomial Naïve Bayes. The models will be compared on their accuracy and precision, using tweets extracted live from Twitter as the test dataset.
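
The word order problem mentioned in the abstract can be illustrated with a minimal sketch. It assumes a plain bag-of-words representation as the "existing feature engineering approach"; the abstract does not say which representations the paper examines, and the helper names and example sentences below are hypothetical. Two sentences with opposite meanings receive identical bag-of-words features, while an order-aware feature such as word bigrams tells them apart.

```python
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens; token order is discarded."""
    return Counter(text.lower().split())


def bigrams(text):
    """Return adjacent token pairs, which preserve local word order."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


# Hypothetical example sentences (not taken from the paper's dataset).
sentence_a = "dog bites man"
sentence_b = "man bites dog"

# Same words, different order: the bag-of-words features are identical.
same_bow = bag_of_words(sentence_a) == bag_of_words(sentence_b)

# The bigram features differ because they keep track of which word follows which.
same_bigrams = bigrams(sentence_a) == bigrams(sentence_b)

print("identical bag-of-words:", same_bow)
print("identical bigrams:", same_bigrams)
```

A classifier that sees only the bag-of-words counts cannot separate the two sentences, which is one plausible reading of why order-insensitive features limit classification results and why a sequence model such as LSTM is included in the comparison.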

Item Type: Article
Uncontrolled Keywords: Hate speech, machine learning, Classification, Categorization, Random Forest, Logistic Regression and Multinomial Naïve Bayes
Subjects: T Technology > T Technology (General)
Divisions: Razak School of Engineering and Advanced Technology
ID Code: 98421
Deposited By: Widya Wahid
Deposited On: 08 Jan 2023 01:58
Last Modified: 08 Jan 2023 01:58
