Universiti Teknologi Malaysia Institutional Repository

The development of semantic meta-database: an ontology based semantic integration of biological databases

Samsudin, Ruhaidah and Deris, Safaai and Othman, Muhammad Razib and Md. Illias, Rosli (2007) The development of semantic meta-database: an ontology based semantic integration of biological databases. Project Report. Faculty of Computer Science and Information System, Skudai, Johor. (Unpublished)

Abstract

Protein sequence annotation is important for preserving and reusing knowledge, for content-based queries, and for understanding protein function. Traditional wet-lab methods are labor intensive and prone to human error. Existing computational tools, meanwhile, are time intensive and require high investment in computing facilities when used offline, and depend heavily on internet stability and speed when used online. Therefore, a simple and practical computational method that is more accurate, faster, easy to configure and use, and low in computing cost is needed, particularly for offline usage. In this study, a Gene Ontology (GO) based protein sequence annotation tool named extended UTMGO is developed to provide these features. The GO is selected because it provides dynamic, precisely defined, structured, and controlled terms that describe genes and their functions and products in any organism. Furthermore, the GO terms are linked with gene products and their protein sequences from various species, as provided by Gene Ontology Annotation (GOA). Thus, highly correlated GO terms of annotated protein sequences can be assigned to partially annotated or newly discovered protein sequences. The tool comprises two intelligent algorithms. The first combines a parallel genetic algorithm with the split-and-merge algorithm. The idea is to cluster the GO terms into k clusters in order to split the monolithic GO RDF/XML file into smaller files, which enables protein sequences and Inferred from Electronic Annotation (IEA) evidence associations to be included in those files. The second combines a parallel genetic algorithm with a semantic similarity measure algorithm. The aim is to search the fragmented GO RDF/XML files for a set of GO terms that are semantically similar to a given query. In addition, the basic version of the tool, a GO browser based on semantic similarity search, is introduced to overcome the weaknesses of the conventional approach, keyword matching.

Item Type: Monograph (Project Report)
Uncontrolled Keywords: Ontology, database integration, semantic database, molecular biological database, bioinformatics
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System (Formerly known)
ID Code: 4141
Deposited By: Noor Aklima Harun
Deposited On: 18 Feb 2008 08:40
Last Modified: 01 Jun 2010 03:15
