Universiti Teknologi Malaysia Institutional Repository

A soft hierarchical algorithm for the clustering of multiple bioactive chemical compounds

Salim, Naomie and Shah, J. Z. (2007) A soft hierarchical algorithm for the clustering of multiple bioactive chemical compounds. In: Bioinformatics Research and Development. Springer Berlin / Heidelberg, pp. 140-153. ISBN 978-3-540-71232-9

Official URL: http://dx.doi.org/10.1007/978-3-540-71233-6_12

Abstract

Most clustering methods applied to chemical structures, such as Ward's, group average, k-means and Jarvis-Patrick, are known as hard or crisp methods because they partition a dataset into strictly disjoint subsets; they are therefore not suitable for clustering chemical structures that exhibit more than one activity. Although fuzzy clustering algorithms such as fuzzy c-means provide an inherent mechanism for clustering overlapping structures (objects), this potential, which comes from their fuzzy membership functions, has not been utilized effectively. In this work, a fuzzy hierarchical algorithm is developed that benefits not only from the fuzzy clustering process but also from the multiple-membership function of fuzzy clustering. The algorithm divides every cluster whose size exceeds a pre-determined threshold into two sub-clusters based on the membership values of each structure. A structure is assigned to one cluster if its membership value for that cluster is very high, or to both clusters if its two membership values are very similar. The performance of the algorithm is evaluated on two benchmark datasets and on a large dataset of compound structures derived from the MDL MDDR database. The results show a significant improvement in comparison to a similar implementation of the hard c-means algorithm.
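
The splitting rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain fuzzy c-means with two clusters, Euclidean distance on toy numeric points (rather than chemical fingerprints), a fuzzifier of m = 2, and a deterministic seeding. The names `fuzzy_two_means`, `soft_bisect`, `max_size` and `overlap_tol` are hypothetical, and the rule "memberships differ by at most `overlap_tol`" is only one possible reading of "very similar".

```python
import math


def fuzzy_two_means(points, m=2.0, iters=50):
    """Fuzzy c-means with c = 2; returns each point's membership in cluster 0."""
    dim = len(points[0])
    # Deterministic seeding (an assumption): the first point and the point farthest from it.
    c0 = points[0]
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centers = [list(c0), list(c1)]
    u0 = [0.5] * len(points)
    for _ in range(iters):
        # Membership update: weight of a cluster is d^(-2/(m-1)), then normalise.
        for i, p in enumerate(points):
            d0 = math.dist(p, centers[0])
            d1 = math.dist(p, centers[1])
            if d0 == 0.0:
                u0[i] = 1.0
            elif d1 == 0.0:
                u0[i] = 0.0
            else:
                w0 = d0 ** (-2.0 / (m - 1.0))
                w1 = d1 ** (-2.0 / (m - 1.0))
                u0[i] = w0 / (w0 + w1)
        # Centre update: mean of the points weighted by membership^m.
        for k in range(2):
            w = [(u if k == 0 else 1.0 - u) ** m for u in u0]
            total = sum(w)
            if total > 0.0:
                centers[k] = [
                    sum(wi * p[j] for wi, p in zip(w, points)) / total
                    for j in range(dim)
                ]
    return u0


def soft_bisect(points, ids, max_size=4, overlap_tol=0.2):
    """Recursively split clusters larger than max_size; returns lists of point ids."""
    if len(ids) <= max_size:
        return [sorted(ids)]
    u0 = fuzzy_two_means([points[i] for i in ids])
    left, right = [], []
    for i, u in zip(ids, u0):
        if abs(u - (1.0 - u)) <= overlap_tol:
            # Memberships are very similar: keep the point in both sub-clusters.
            left.append(i)
            right.append(i)
        elif u > 0.5:
            left.append(i)
        else:
            right.append(i)
    # Stop when a split makes no progress, so the recursion always terminates.
    if not left or not right or len(left) == len(ids) or len(right) == len(ids):
        return [sorted(ids)]
    return (soft_bisect(points, left, max_size, overlap_tol)
            + soft_bisect(points, right, max_size, overlap_tol))


# Toy data: two tight groups and one point halfway between them.
points = [(0.0,), (1.0,), (2.0,), (5.0,), (8.0,), (9.0,), (10.0,)]
clusters = soft_bisect(points, list(range(len(points))), max_size=4, overlap_tol=0.2)
print(clusters)
```

In this toy run the midpoint (id 3) has nearly equal membership in both sub-clusters and is therefore kept in both, whereas a hard c-means split would force it into exactly one.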

Item Type: Book Section
Uncontrolled Keywords: cluster analysis, chemoinformatics, fuzzy c-means, bioinformatics, chemical information systems
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System
ID Code: 9630
Deposited By: Salasiah M Said
Deposited On: 14 Jan 2010 01:01
Last Modified: 03 Sep 2017 10:00
