Universiti Teknologi Malaysia Institutional Repository

Hybrid differential evolution based automatic single document text summarization

Mohammed Ali Abuobieda, Albaraa Abuobieda (2013) Hybrid differential evolution based automatic single document text summarization. PhD thesis, Universiti Teknologi Malaysia, Faculty of Computing.

Abstract

Automatic single document text summarization is the process of condensing an input text document. An extractive approach summarizes a document by extracting its most informative sentences. To select such sentences, a sentence scoring approach assigns a score to each input sentence and ranks the sentences accordingly. Based on a user-defined summary ratio, only the top-ranked sentences are selected to be part of the summary. Selecting the most informative sentences remains a challenge for researchers in extractive automatic text summarization. This research therefore proposed extraction-based automatic single document text summarization methods that investigate a single meta-heuristic evolutionary algorithm, Differential Evolution (DE), to generate high-quality summaries. The DE algorithm is used (i) to find the best feature weight scores to discriminate between important and non-important features, (ii) to act as a clustering-based machine learning method, using Normalized Google Distance and Jaccard similarity measures, to generate a highly diverse summary, (iii) to employ an opposition-based learning (OBL) approach to improve the performance of the DE algorithm, and (iv) to develop a hybrid model used to investigate the advantages of combining the feature weighting, diversity and OBL approaches. The proposed methods were evaluated on the standard dataset from the Document Understanding Conference (DUC) 2002, with the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as the standard evaluation toolkit. Experimental results showed that the hybrid models, as well as all the proposed individual methods, performed well for text summarization compared with four benchmark methods (Microsoft Word, Copernic, and the best and the worst DUC 2002 summarizers) and with a human-against-human summarizer comparison. In addition, the proposed DE-based methods outperformed other evolutionary algorithm based methods, namely Genetic Algorithm and fuzzy swarm diversity based methods. The experimental results show that the proposed hybrid models generate better quality text summaries.
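As a rough illustration of the building blocks named in the abstract, the sketch below shows weighted-feature sentence scoring with ratio-based selection, Jaccard similarity between token sets, and the opposite point used in opposition-based learning. It is not the thesis implementation: the function names, the feature values, and the weights are hypothetical, and in the thesis the weights would be found by the DE algorithm rather than set by hand.

```python
import math


def jaccard(tokens_a, tokens_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(tokens_a), set(tokens_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def opposite(x, low, high):
    """Opposite point of x within [low, high], as used in opposition-based learning."""
    return low + high - x


def summarize(sentences, feature_rows, weights, ratio):
    """Score each sentence as a weighted sum of its features and keep the
    top-ranked ones, returned in their original document order.

    feature_rows[i] holds the (hypothetical) feature values of sentences[i];
    weights are assumed to be given, e.g. by an optimizer such as DE.
    """
    scores = [sum(w * f for w, f in zip(weights, row)) for row in feature_rows]
    # Number of sentences to keep, based on the user-defined summary ratio.
    k = max(1, math.ceil(ratio * len(sentences)))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

For example, calling summarize on four sentences with a ratio of 0.5 keeps the two highest-scoring sentences, and jaccard could then be used to avoid selecting two sentences that share most of their tokens.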

Item Type: Thesis (PhD)
Additional Information: Thesis (Ph.D (Sains Komputer)) - Universiti Teknologi Malaysia, 2013; Supervisor: Prof. Dr. Naomie Salim
Uncontrolled Keywords: text processing (computer science), computational linguistics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 38967
Deposited By: Fazli Masari
Deposited On: 23 Jun 2014 08:31
Last Modified: 22 Jun 2017 02:47
