Universiti Teknologi Malaysia Institutional Repository

On the significance of topological-indices based non-binary molecular similarity measures

N., Salim and Holliday, John and Willett, Peter (2004) On the significance of topological-indices based non-binary molecular similarity measures. Sains Malaysiana, 33 (2). ISSN 0126-6039

Official URL: http://www.ukm.my/jsm/english_journals/vol33num2_2...

Abstract

This paper describes experiments to study how well the whole range of non-binary similarity values based on topological indices represents the physicochemical similarities between compounds. Measured log P values were compared with log P values predicted from compounds lying in different similarity ranges, where similarities were calculated from various topological indices of the compounds. Analysis shows that the non-binary Cosine, Simpson and Pearson coefficients might give misleading results when certain compounds are compared. Similarity values involving the 1% most similar compounds based on the non-binary Tanimoto or Euclidean coefficients were found to represent the physicochemical similarities between the molecules compared. Therefore, for searches requiring around the 1% most similar compounds, rational selection methods based on the non-binary Tanimoto or Euclidean coefficients are likely to produce better results than random selection. Similarity values involving the 5% most dissimilar compounds based on the non-binary Tanimoto coefficient were also found to represent physicochemical dissimilarities between the molecules compared. Therefore, for diverse selection requiring fewer than the 5% most dissimilar compounds, rational selection methods based on the non-binary Tanimoto coefficient are likely to produce better results than random selection. However, in both focused and diverse selection using these coefficients, the more compounds are selected, the more the selection resembles random selection in terms of physicochemical similarity and dissimilarity.

This paper examines the extent to which non-binary similarity values, obtained by comparing the topological indices of compounds, can represent the differences or similarities in the physical and chemical properties of the compounds compared. In this study, log P values obtained from laboratory experiments were compared with expected log P values, taken as the mean log P of the compounds falling in various ranges of highest similarity, based on comparing the topological indices of every compound in the database with the compound concerned. Analysis shows that non-binary similarity calculations using the Cosine, Simpson and Pearson coefficients can give misleading similarity values when certain types of compounds are compared. Similarity values involving the 1% most similar compounds based on the Tanimoto or Euclidean coefficients were found to reflect the similarity in physical and chemical properties of the compounds compared. Hence, focused searches or selections that retrieve the 1% of compounds most similar to a given compound, using the Tanimoto and Euclidean coefficients on non-binary representations, are expected to give more satisfactory results than random selection. Similarity values involving the 5% most dissimilar compounds based on the Tanimoto coefficient were also found to reflect differences in the physical and chemical properties of the molecules compared. This indicates that rational selection based on the Tanimoto coefficient, used to choose a subset of the 5% most diverse molecules from a database of non-binary molecular representations, tends to give better results than random selection. However, in both focused and diverse selection using these coefficients, the more compounds are selected, the more the results resemble random selection in terms of the similarity or diversity of physical and chemical properties.
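The non-binary coefficients named in the abstract operate on real-valued descriptor vectors, for example one vector of topological indices per compound. The sketch below uses commonly cited continuous forms of each coefficient. The exact formulas, any standardisation of the descriptors, and the conversion of Euclidean distance into a similarity as 1 / (1 + d) are assumptions, not details taken from the paper, and the example vectors are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def tanimoto(a: Vector, b: Vector) -> float:
    """Continuous Tanimoto: ab / (aa + bb - ab)."""
    ab = _dot(a, b)
    return ab / (_dot(a, a) + _dot(b, b) - ab)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine of the angle between the vectors; ignores overall scale."""
    return _dot(a, b) / math.sqrt(_dot(a, a) * _dot(b, b))


def simpson(a: Vector, b: Vector) -> float:
    """One continuous Simpson form: ab / min(aa, bb); it can exceed 1."""
    return _dot(a, b) / min(_dot(a, a), _dot(b, b))


def pearson(a: Vector, b: Vector) -> float:
    """Pearson correlation between the two descriptor profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    return _dot(da, db) / math.sqrt(_dot(da, da) * _dot(db, db))


def euclidean_similarity(a: Vector, b: Vector) -> float:
    """Euclidean distance mapped to (0, 1] via 1 / (1 + d) (assumed mapping)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)


if __name__ == "__main__":
    # Hypothetical topological-index vectors; b is exactly twice a.
    a = [1.0, 2.0, 3.0]
    b = [2.0, 4.0, 6.0]
    for name, f in [("Tanimoto", tanimoto), ("Cosine", cosine), ("Simpson", simpson),
                    ("Pearson", pearson), ("Euclidean", euclidean_similarity)]:
        print(f"{name:9s} {f(a, b):.3f}")
```

For a vector and its exact double, Cosine and Pearson both report perfect similarity and this Simpson form returns 2, whereas Tanimoto (2/3 here) and the Euclidean form still register the difference in magnitude. This is one plausible way such coefficients can mislead for certain compound pairs. It is offered as an illustration, not as the paper's own explanation.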

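The evaluation described in the abstract compares each compound's measured log P with a prediction taken as the mean log P of the compounds in a given similarity-rank band, and contrasts this with random selection. A minimal sketch of that comparison follows. It assumes leave-one-out over a small in-memory dataset, rank bands given as fractions of the database, and mean absolute error as the agreement measure; the function names and the toy data are hypothetical, and the paper's actual statistics may differ.

```python
import random
from typing import Callable, List, Sequence

Vector = Sequence[float]
Similarity = Callable[[Vector, Vector], float]


def tanimoto(a: Vector, b: Vector) -> float:
    """Continuous Tanimoto coefficient on real-valued descriptors."""
    ab = sum(x * y for x, y in zip(a, b))
    return ab / (sum(x * x for x in a) + sum(y * y for y in b) - ab)


def band_mean_logp(q: int, desc: List[Vector], logp: List[float],
                   sim: Similarity, start: float, stop: float) -> float:
    """Mean log P of q's neighbours whose similarity rank falls in [start, stop).

    Ranks are fractions of the rest of the database: 0.0 is the most similar
    compound, 1.0 the least similar. The query itself is left out.
    """
    others = [j for j in range(len(desc)) if j != q]
    others.sort(key=lambda j: sim(desc[q], desc[j]), reverse=True)
    n = len(others)
    lo = min(int(start * n), n - 1)
    hi = min(n, max(lo + 1, int(stop * n)))  # always keep at least one compound
    band = others[lo:hi]
    return sum(logp[j] for j in band) / len(band)


def band_error(desc: List[Vector], logp: List[float], sim: Similarity,
               start: float, stop: float) -> float:
    """Mean |measured - predicted| log P over every query compound."""
    errs = [abs(logp[q] - band_mean_logp(q, desc, logp, sim, start, stop))
            for q in range(len(desc))]
    return sum(errs) / len(errs)


def random_error(logp: List[float], frac: float, rng: random.Random) -> float:
    """Same statistic when the neighbours are a random subset of equal size."""
    n = len(logp)
    k = max(1, int(frac * (n - 1)))
    errs = []
    for q in range(n):
        sample = rng.sample([j for j in range(n) if j != q], k)
        errs.append(abs(logp[q] - sum(logp[j] for j in sample) / k))
    return sum(errs) / len(errs)


if __name__ == "__main__":
    # Hypothetical toy data: two pairs of compounds with similar descriptors and log P.
    desc = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    logp = [1.0, 1.1, 4.0, 4.2]
    print("top 1%   :", band_error(desc, logp, tanimoto, 0.0, 0.01))
    print("bottom 5%:", band_error(desc, logp, tanimoto, 0.95, 1.0))
    print("random   :", random_error(logp, 0.01, random.Random(0)))
```

A small error for the most similar band and a large error for the most dissimilar band, each judged against the random baseline, corresponds to the pattern the abstract reports for the Tanimoto coefficient. Widening a band (for example `stop=0.2`) lets the same code probe the reported drift toward random-selection behaviour on a real dataset.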
Item Type: Article
Uncontrolled Keywords: physicochemical properties, molecular
Subjects: T Technology > T Technology (General)
Divisions: Computer Science and Information System (Formerly known)
ID Code: 28187
Deposited By: Nurul Asilah Mahmood
Deposited On: 18 Sep 2012 06:14
Last Modified: 18 Sep 2012 06:14