Universiti Teknologi Malaysia Institutional Repository

Comparison and fusion of retrieval schemes based on different structures, similarity measures and weighting schemes

Wahlan, Mohammed Salem Farag (2006) Comparison and fusion of retrieval schemes based on different structures, similarity measures and weighting schemes. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computer Science and Information System.

Abstract

Many retrieval models and techniques can be applied to retrieve the theses most relevant to a given query or concept. Different retrieval methods have been found to retrieve different sets of relevant documents, so any one method can be expected to retrieve some relevant theses that the others miss. This study therefore retrieves theses using methods that differ in thesis structure, similarity measure and weighting scheme. The theses were collected from the FSKSM postgraduate library and processed by digitizing, stop-word removal, stemming and index building, with the results stored in a database. In total, 85 theses and 30 queries were used. Queries were compared with theses using five similarity measures and seven weighting schemes across different thesis structures.

The results show that using the bibliography gives poorer results than using the title and abstract alone. For combinations of weighting schemes, the weighting schemes used with the Cosine and Tanimoto similarity measures performed well individually but poorly in combination, whereas those used with the Forbes and Russell similarity measures performed poorly individually but well in combination. For combinations of similarity measures, the best combination on the title structure was Cosine with the LTU weighting scheme together with Russell with the LOGG weighting scheme. On the abstract structure, the best combination was Cosine with the TFIDF weighting scheme together with Forbes with the ATFA weighting scheme, although it performed worse than the best title-structure combination.

Overall, the best thesis structure is the title, and the best similarity measure is Cosine with the LTU weighting scheme.
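To make the kind of comparison and fusion described in the abstract concrete, the following is a minimal sketch, not the thesis's implementation. It assumes textbook forms of the measures: Cosine and Tanimoto over weighted term vectors, and Russell-Rao and Forbes over binary term sets. The TF-IDF function uses the common tf * log(N/df) form; the LTU, LOGG and ATFA schemes are not defined in this record and are not reproduced here. The fusion step uses CombSUM with min-max normalization, which is an assumed stand-in, since the abstract does not state how scores were combined. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def tfidf(tokens, df, n_docs):
    """Textbook tf * log(N / df) weights; terms unseen in the collection are skipped."""
    return {t: tf * math.log(n_docs / df[t])
            for t, tf in Counter(tokens).items() if df.get(t)}

def dot(u, v):
    return sum(w * v[t] for t, w in u.items() if t in v)

def cosine(u, v):
    denom = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / denom if denom else 0.0

def tanimoto(u, v):
    uv = dot(u, v)
    denom = dot(u, u) + dot(v, v) - uv
    return uv / denom if denom else 0.0

def russell_rao(q_terms, d_terms, vocab_size):
    """Binary measure: shared terms divided by vocabulary size."""
    return len(q_terms & d_terms) / vocab_size if vocab_size else 0.0

def forbes(q_terms, d_terms, vocab_size):
    """Binary measure: a * n / ((a + b) * (a + c))."""
    denom = len(q_terms) * len(d_terms)
    return len(q_terms & d_terms) * vocab_size / denom if denom else 0.0

def min_max(scores):
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def comb_sum(*runs):
    """Assumed fusion rule: sum min-max normalized scores, rank descending."""
    fused = Counter()
    for run in runs:
        for doc_id, score in min_max(run).items():
            fused[doc_id] += score
    return sorted(fused, key=fused.get, reverse=True)

# Toy collection standing in for thesis titles (hypothetical data).
docs = {
    "t1": "fusion of retrieval schemes".split(),
    "t2": "neural network training".split(),
    "t3": "retrieval similarity measures".split(),
}
query = "retrieval fusion".split()
vocab = set().union(*docs.values())
df = Counter(t for tokens in docs.values() for t in set(tokens))

# Raw term-frequency vectors keep the example easy to check by hand.
q_vec = Counter(query)
cosine_run = {d: cosine(q_vec, Counter(toks)) for d, toks in docs.items()}
russell_run = {d: russell_rao(set(query), set(toks), len(vocab))
               for d, toks in docs.items()}
ranking = comb_sum(cosine_run, russell_run)
print(ranking)
```

Swapping `Counter(toks)` for `tfidf(toks, df, len(docs))` changes the weighting scheme without touching the similarity measures, which mirrors how the study varies the two independently.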

Item Type: Thesis (Masters)
Additional Information: Thesis (Master of Science (Computer Science)) - Universiti Teknologi Malaysia, 2006; Supervisor: Assoc. Prof. Dr. Naomie Salim
Uncontrolled Keywords: Information retrieval system; term weighting systems; weighting schemes; similarity measures
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 4067
Deposited By: Ms Zalinda Shuratman
Deposited On: 25 Jul 2007 01:33
Last Modified: 11 Jul 2012 03:59
