Universiti Teknologi Malaysia Institutional Repository

Shape-based two dimensional descriptor for searching molecular database

Hamza, Hentabli (2014) Shape-based two dimensional descriptor for searching molecular database. Masters thesis, Universiti Teknologi Malaysia, Faculty of Computing.

Official URL: http://dms.library.utm.my:8080/vital/access/manage...

Abstract

The biological functions of compounds can be predicted from the similarity of their chemical structures, which helps in discovering new compounds for drug development. Molecular similarity can also be used to infer unknown functions and side effects of existing drugs. Many molecular similarity methods, based on different molecular representations, have been used to perform virtual screening. Molecules are transformed into descriptors to create a chemical database that allows mathematical manipulation and searching of the chemical information they contain. In this research, a new Shape based Descriptor of Molecule (SBDM) was developed based on the 2-dimensional shape of a chemical compound. The outline shape of a molecule is split into parts that are related by graph connectivity. The first atom in the molecule is determined using the Morgan algorithm. Molecular features, such as atom name, bond type, angle and rings, are represented by specific symbols according to a set of specification rules. Subsequent atoms are scanned in a clockwise direction with respect to the first atom, and the scan is repeated until the first atom is reached again. Two similarity measures, the Basic Local Alignment Search Tool (BLAST) and the Tanimoto coefficient, were used to evaluate the performance of the molecular descriptors. The performance of the SBDM was compared with that of six standard molecular descriptors. Simulated virtual screening experiments on the MDL Drug Data Report database show that the shape-based descriptor outperforms the six standard descriptors, with average recall rates of 19.32% and 34.13% for the top 1% and 5% of retrieved molecules, respectively.
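
The Tanimoto coefficient and the top-k% recall measure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: it assumes each descriptor has been reduced to a set of features, and the names `tanimoto`, `recall_at_top` and the toy data are hypothetical, chosen only for illustration.

```python
from math import ceil


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient between two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty descriptors
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def recall_at_top(query: set, database: dict, actives: set, fraction: float) -> float:
    """Share of known actives that appear in the top `fraction` of the ranking."""
    # Rank database molecules by decreasing similarity to the query.
    ranked = sorted(
        database,
        key=lambda mol_id: tanimoto(query, database[mol_id]),
        reverse=True,
    )
    cutoff = max(1, ceil(fraction * len(ranked)))
    retrieved = set(ranked[:cutoff])
    return len(retrieved & actives) / len(actives)


# Toy data (hypothetical): molecule id -> set of descriptor features.
toy_db = {"m1": {1, 2, 3}, "m2": {1, 2, 4}, "m3": {7, 8}, "m4": {1, 9}}
toy_query = {1, 2, 3}
toy_actives = {"m1", "m2"}

print(tanimoto({1, 2, 3}, {2, 3, 4}))
print(recall_at_top(toy_query, toy_db, toy_actives, 0.5))
```

In a screening experiment of the kind described, the recall would be averaged over many query molecules and activity classes, with `fraction` set to 0.01 and 0.05 for the top 1% and 5% cut-offs.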

Item Type: Thesis (Masters)
Additional Information: Thesis (Sarjana Sains (Sains Komputer)) - Universiti Teknologi Malaysia, 2014
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: Computing
ID Code: 48607
Deposited By: Haliza Zainal
Deposited On: 15 Oct 2015 01:09
Last Modified: 02 Mar 2020 02:34
