Universiti Teknologi Malaysia Institutional Repository

Topic modelling used to improve Arabic web pages clustering

Alghamdi, H. and Selamat, A. (2015) Topic modelling used to improve Arabic web pages clustering. In: 1st International Conference on Cloud Computing, ICCC 2015, 26 - 29 April 2015, Riyadh, Saudi Arabia.

Full text not available from this repository.

Official URL: http://dx.doi.org/10.1109/CLOUDCOMP.2015.7149662

Abstract

The main purpose of topic modelling is to provide machine-understandable, semantic annotation of textual Web content. It aims to extract knowledge rather than unrelated pieces of information. In this paper, we evaluate the impact of using a topic model, which represents each document as a mixture of topics where each topic is in turn a distribution over words, on document clustering results. We compared clustering results obtained with PLSA and with LSA. The experiments were performed on pages from a set of popular newspaper websites, which yield high-dimensional data, and we used Purity, Mean Intra-Cluster Distance (MICD), and the Davies-Bouldin Index (DBI) to evaluate the clustering. We obtained favorable clustering results, particularly for Arabic-language content: PLSA was effective in reducing MICD, increasing purity, and lowering DBI.
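The abstract names its three evaluation measures without defining them. The sketch below uses common textbook definitions, which are assumptions rather than the paper's stated formulas: purity is the share of documents that belong to the majority reference class of their cluster; MICD is taken here as the average distance from documents to their cluster centroid, averaged over clusters (other variants average over documents instead); and DBI is the mean, over clusters, of the worst-case ratio of summed within-cluster scatter to between-centroid distance. Euclidean distance is also an assumption, and the input vectors stand in for whatever document representation (for example LSA or PLSA topic vectors) a clustering run produces.

```python
import math
from collections import Counter, defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def group_by_cluster(X, labels):
    """Map each cluster label to the list of vectors assigned to it."""
    groups = defaultdict(list)
    for x, c in zip(X, labels):
        groups[c].append(x)
    return groups


def purity(clusters, classes):
    """Fraction of documents belonging to the majority reference class of their cluster."""
    members = defaultdict(list)
    for c, t in zip(clusters, classes):
        members[c].append(t)
    majority_total = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority_total / len(classes)


def scatter(points, center):
    """Average distance from the points of one cluster to its centroid."""
    return sum(euclidean(p, center) for p in points) / len(points)


def micd(X, labels):
    """Mean intra-cluster distance: average over clusters of each cluster's scatter."""
    groups = group_by_cluster(X, labels)
    return sum(scatter(pts, centroid(pts)) for pts in groups.values()) / len(groups)


def davies_bouldin(X, labels):
    """Davies-Bouldin index (lower is better); requires at least two clusters."""
    groups = group_by_cluster(X, labels)
    centers = {c: centroid(pts) for c, pts in groups.items()}
    s = {c: scatter(pts, centers[c]) for c, pts in groups.items()}
    keys = list(groups)
    total = 0.0
    for i in keys:
        # Worst (largest) similarity ratio between cluster i and any other cluster.
        total += max(
            (s[i] + s[j]) / euclidean(centers[i], centers[j])
            for j in keys if j != i
        )
    return total / len(keys)


if __name__ == "__main__":
    # Toy 2-D vectors and hypothetical reference categories, for illustration only.
    X = [[0, 0], [0, 2], [10, 0], [10, 2]]
    labels = [0, 0, 1, 1]
    classes = ["sport", "sport", "economy", "economy"]
    print(purity(labels, classes))   # 1.0
    print(micd(X, labels))           # 1.0
    print(davies_bouldin(X, labels)) # 0.2
```

Under these definitions, higher purity and lower MICD and DBI indicate tighter, better-separated clusters, which matches the direction of improvement the abstract reports for PLSA.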

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Arabic Text, k-means
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 59525
Deposited By: Haliza Zainal
Deposited On: 18 Jan 2017 01:50
Last Modified: 26 Sep 2021 15:26