Universiti Teknologi Malaysia Institutional Repository

Automatic construction of generic stop words list for Hausa text

Bichi, Abdulkadir Abubakar and Samsudin, Ruhaidah and Hassan, Rohayanti (2022) Automatic construction of generic stop words list for Hausa text. Indonesian Journal of Electrical Engineering and Computer Science, 25 (3). pp. 1501-1507. ISSN 2502-4752

Official URL: http://dx.doi.org/10.11591/ijeecs.v25.i3.pp1501-15...

Abstract

Stop words are the words that occur most frequently in a document while carrying little significant information. They are characterized by having common relations within a cluster. They act as noise in the text and are evenly distributed over a document. Removal of stop words improves the performance and accuracy of information retrieval algorithms and of machine learning at large. It saves storage space by reducing the dimension of the vector space, and helps in effective document indexing. This research generated a list of Hausa stop words automatically using an aggregated method that combines frequency and statistical methods. The experiments were conducted on a Hausa corpus collected as primary data, consisting of 841 news articles totalling 646,862 words, and a final list of 81 distinct Hausa stop words was generated.
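
The abstract does not say which frequency and statistical measures were aggregated, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes raw corpus frequency as the frequency criterion and the entropy of a word's distribution across documents as the statistical criterion, and it keeps the words that rank in the top k under both. The function name candidate_stop_words, the parameter top_k, and the toy token sequences are hypothetical.

```python
import math
from collections import Counter


def candidate_stop_words(documents, top_k=3):
    """Return words ranked in the top_k by BOTH corpus frequency and entropy.

    Assumption: the two criteria and the intersection rule are illustrative
    choices; the source only states that frequency and statistical methods
    were combined.
    """
    tokenized = [doc.lower().split() for doc in documents]

    # Frequency criterion: total occurrences of each word in the corpus.
    term_freq = Counter(word for doc in tokenized for word in doc)

    # Statistical criterion (assumed): entropy of the word's distribution
    # over documents. Evenly spread words get high entropy.
    entropy = {}
    for word, total in term_freq.items():
        h = 0.0
        for doc in tokenized:
            count = doc.count(word)
            if count:
                p = count / total
                h -= p * math.log(p)
        entropy[word] = h

    top_by_freq = {w for w, _ in term_freq.most_common(top_k)}
    top_by_entropy = set(sorted(entropy, key=entropy.get, reverse=True)[:top_k])

    # Aggregation step: keep only words selected by both criteria.
    return sorted(top_by_freq & top_by_entropy)


# Toy token sequences (not real sentences), used only to exercise the code.
toy_docs = [
    "da a kasuwa da gari",
    "da a makaranta",
    "da a gida da",
]
print(candidate_stop_words(toy_docs, top_k=2))  # ['a', 'da']
```

Intersecting the two rankings is one simple way to aggregate: a word must be both frequent and evenly distributed to be selected, which matches the abstract's description of stop words as high-frequency noise spread across a document.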

Item Type: Article
Uncontrolled Keywords: document pre-processing, Hausa stop words, natural language processing
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 98676
Deposited By: Narimah Nawil
Deposited On: 30 Jan 2023 04:50
Last Modified: 30 Jan 2023 04:50
