Universiti Teknologi Malaysia Institutional Repository

Online forum thread retrieval using data fusion

Abdullah Albahem, Ameer Tawfik (2013) Online forum thread retrieval using data fusion. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing.

PDF (1MB)

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

Online forums empower people to seek and share information via discussion threads. However, information overload makes finding threads that satisfy a user's information need a daunting task. In addition, traditional retrieval techniques do not suit the unique structure of threads: they return individual text messages, whereas thread retrieval must return whole threads. A few thread representations have been proposed to address this problem, and in some of them aggregating evidence of query relevance is an essential step. This thesis proposes several data fusion techniques to aggregate evidence of relevance within and across thread representations, and it makes three contributions. Firstly, this work adapts the Voting Model from the expert finding task to thread retrieval. The adapted Voting Model treats thread retrieval as a voting process: it ranks a list of messages, groups the ranked messages by their parent threads, and treats each ranked message as a vote supporting the relevance of its parent thread. To rank the parent threads, a data fusion technique aggregates the evidence from each thread's ranked messages. Secondly, this study proposes two extensions of the Voting Model: the Top K and Balanced Top K voting models. The Top K model aggregates evidence from only the top K ranked messages of each thread. The Balanced Top K model additionally pads any thread that has fewer than K ranked messages with artificial ranked messages to make up the difference (a padding step). Experiments with these voting models and thirteen data fusion methods reveal that summing the relevance scores of the top K ranked messages of each thread, with the padding step, outperforms the state of the art on all measures on two datasets. The third contribution of this thesis is multi-representation thread retrieval using data fusion techniques.
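The voting models above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation: the ranked message list is a hypothetical list of `(message_id, thread_id, score)` tuples, summation is the only fusion method used (the abstract singles it out, but the thesis compares thirteen methods), and the score given to artificial padding messages (`pad_score`, here the lowest retrieved message score) is a hypothetical choice, since the abstract does not specify it.

```python
from collections import defaultdict


def vote_threads(ranked_messages, k=None, pad_score=None):
    """Rank threads by summing the scores of their ranked messages (votes).

    ranked_messages: list of (message_id, thread_id, score) tuples.
    k: if set, only each thread's top-k message scores are summed (Top K).
    pad_score: if set together with k, a thread with fewer than k votes is
        padded with artificial messages of this score (Balanced Top K).
        The padding score is a hypothetical choice, not taken from the thesis.
    """
    votes = defaultdict(list)
    for _message_id, thread_id, score in ranked_messages:
        votes[thread_id].append(score)  # each ranked message votes for its thread

    thread_scores = {}
    for thread_id, scores in votes.items():
        scores = sorted(scores, reverse=True)
        if k is not None:
            scores = scores[:k]  # keep only the top-k votes
            if pad_score is not None and len(scores) < k:
                scores += [pad_score] * (k - len(scores))  # padding step
        thread_scores[thread_id] = sum(scores)  # summation as the fusion method
    return sorted(thread_scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical ranked message list: (message_id, thread_id, score).
messages = [
    ("m1", "t3", 6.0),
    ("m2", "t2", 4.0),
    ("m3", "t2", 3.5),
    ("m4", "t1", 3.0),
    ("m5", "t1", 2.0),
    ("m6", "t1", 2.0),
    ("m7", "t1", 2.0),
]

if __name__ == "__main__":
    lowest = min(score for _, _, score in messages)
    print(vote_threads(messages))                         # [('t1', 9.0), ('t2', 7.5), ('t3', 6.0)]
    print(vote_threads(messages, k=2))                    # [('t2', 7.5), ('t3', 6.0), ('t1', 5.0)]
    print(vote_threads(messages, k=2, pad_score=lowest))  # [('t3', 8.0), ('t2', 7.5), ('t1', 5.0)]
```

Trimming to the top two votes stops `t1` from winning on the strength of many weak messages, and padding then lifts the single-message thread `t3`. With summation, padding with a score of zero would change nothing, so the step only matters when the artificial messages carry a non-zero score.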
In contrast to the Voting Model, this third contribution uses data fusion methods to fuse several ranked lists of threads rather than a single ranked list of messages. The thread lists were generated by five retrieval methods based on various thread representations, one of which is the Voting Model. The first three methods use the message as the unit of indexing, while the last two use the thread title and the concatenation of the thread's message texts, respectively. A thorough evaluation of data fusion techniques for fusing various combinations of thread representations was conducted. The experimental results show that building multi-representation thread retrieval with either the sum of relevance scores or the sum of relevance scores multiplied by the number of retrieving methods improves performance and outperforms every individual representation.
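The two fusion rules in the last sentence correspond to the methods commonly known as CombSUM and CombMNZ. The sketch below, again hypothetical, fuses several per-representation thread runs, each a dict from a thread ID to a score. It assumes min-max normalization of each run before fusion, a common preprocessing step that the abstract does not mention, and the run names, thread IDs and scores are invented for illustration.

```python
from collections import defaultdict


def min_max_normalize(run):
    """Rescale one run's scores to [0, 1]; assumed preprocessing, not from the thesis."""
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved thread as maximally relevant.
        return {thread_id: 1.0 for thread_id in run}
    return {thread_id: (score - lo) / (hi - lo) for thread_id, score in run.items()}


def fuse(runs, method="combsum"):
    """Fuse ranked thread lists with CombSUM or CombMNZ.

    runs: list of dicts mapping thread_id -> retrieval score, one per representation.
    CombSUM: sum of normalized scores.
    CombMNZ: CombSUM multiplied by the number of runs that retrieved the thread.
    """
    totals = defaultdict(float)
    hits = defaultdict(int)
    for run in runs:
        for thread_id, score in min_max_normalize(run).items():
            totals[thread_id] += score
            hits[thread_id] += 1
    if method == "combsum":
        fused = dict(totals)
    elif method == "combmnz":
        fused = {thread_id: totals[thread_id] * hits[thread_id] for thread_id in totals}
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical runs from three representations (scores invented for illustration).
voting_run = {"t1": 8.0, "t2": 4.0, "t3": 0.0}
title_run = {"t1": 6.0, "t2": 3.0, "t4": 0.0}
concat_run = {"t3": 2.0, "t2": 1.0, "t5": 0.0}
runs = [voting_run, title_run, concat_run]

if __name__ == "__main__":
    print(fuse(runs, "combsum")[:3])  # [('t1', 2.0), ('t2', 1.5), ('t3', 1.0)]
    print(fuse(runs, "combmnz")[:3])  # [('t2', 4.5), ('t1', 4.0), ('t3', 2.0)]
```

In this toy example, `t2` is retrieved by all three runs with middling scores, so the CombMNZ multiplier moves it above `t1`, which CombSUM ranks first.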

Item Type: Thesis (Masters)
Additional Information: Thesis (Master of Science (Computer Science)), Universiti Teknologi Malaysia, 2013; Supervisor: Prof. Dr. Naomie Salim
Uncontrolled Keywords: Internet in education; Education, computer network resources
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computing
ID Code: 37016
Deposited By: Fazli Masari
Deposited On: 09 Mar 2014 08:39
Last Modified: 22 Jun 2017 01:47
