Universiti Teknologi Malaysia Institutional Repository

Deep learning based methods for molecular similarity searching: a systematic review

Nasser, Maged and Yusof, Umi Kalsom and Salim, Naomie (2023) Deep learning based methods for molecular similarity searching: a systematic review. Processes, 11 (5). pp. 1-27. ISSN 2227-9717

Official URL: http://dx.doi.org/10.3390/pr11051340

Abstract

In rational drug design, molecular similarity searching is frequently used to identify molecules with similar functionalities by looking up structurally related molecules in chemical databases. Different methods have been developed to measure the similarity of molecules to a target query. Although these approaches perform effectively, particularly when dealing with molecules with homogeneous active structures, they fall short when dealing with compounds that have heterogeneous structures. In recent times, deep learning methods have been exploited to improve the performance of molecule searching because of their feature extraction power and generalization capabilities. However, despite numerous research studies on deep-learning-based molecular similarity searching, relatively little secondary research has been carried out in the area. This research aims to provide a systematic literature review (SLR) on deep-learning-based molecular similarity searching to enable researchers and practitioners to better understand the current trends and issues in the field. The study accessed 875 distinct papers from the selected journals and conferences, published over the last thirteen years (2010–2023). After careful screening of the abstracts and a full-text eligibility analysis, 65 studies were selected for our SLR. The review's findings show that multilayer perceptrons (MLPs) and autoencoders (AEs) are the most frequently used deep learning models for molecular similarity searching, followed by models based on convolutional neural network (CNN) techniques. The results also show that the ChEMBL dataset and the DrugBank standard dataset are the two datasets most frequently used to evaluate deep learning methods for molecular similarity searching. In addition, the most popular methods for optimizing the performance of molecular similarity searching are new representation approaches and feature reweighting techniques, and the most widely used metrics for evaluating the efficiency of deep-learning-based molecular similarity searching are the area under the curve (AUC) and precision measures.

Item Type: Article
Uncontrolled Keywords: deep learning, drug design, drug discovery, molecular similarity searching, virtual screening
Subjects: Q Science > Q Science (General)
Divisions: Science
ID Code: 106538
Deposited By: Yanti Mohd Shah
Deposited On: 09 Jul 2024 06:48
Last Modified: 09 Jul 2024 06:48
