Universiti Teknologi Malaysia Institutional Repository

A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data

Wang, W. and Cheng, K. K. and Deng, L. and Xu, J. and Shen, G. and Griffin, J. L. and Dong, J. (2017) A clustering-based preprocessing method for the elimination of unwanted residuals in metabolomic data. Metabolomics, 13 (1). ISSN 1573-3882

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....


Introduction: The metabolome of a biological system is affected by multiple factors, including the factor of interest (e.g. metabolic perturbation due to disease) and unwanted factors that are not the primary focus of the study (e.g. batch effect, gender, and level of physical activity). Removing these unwanted variations is advantageous, as they may complicate biological interpretation of the data.

Objectives: We aim to develop a new unwanted variation elimination (UVE) method, called clustering-based unwanted residuals elimination (CURE), to reduce metabolic variation caused by unwanted or hidden factors in metabolomic data.

Methods: A mean-centered metabolomic dataset can be viewed as a combination of a studied factor matrix and a residual matrix. The CURE method assumes that the residual should be normally distributed if it contains only inter-individual variation. However, if the residual forms multiple clusters in the feature subspace of principal component analysis or partial least squares discriminant analysis, the residual may contain variation due to unwanted factors. This unwanted variation is removed by applying K-means clustering to the residual and subtracting each cluster's mean from the residuals in that cluster. The process is iterated until the residual no longer forms multiple clusters in the feature subspace.

Results: Three simulated datasets and a human metabolomic dataset were used to demonstrate the performance of the proposed CURE method. CURE removed most of the variation caused by unwanted factors while preserving inter-individual variation between samples.

Conclusion: The CURE method can effectively remove unwanted data variation and can serve as an alternative UVE method for metabolomic data.
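The iteration described under Methods can be sketched roughly as follows. This is an illustrative reading of the abstract, not the authors' implementation. Several details are assumptions that the abstract does not specify: the studied factor matrix is estimated as per-class means for a label vector `y`, the feature subspace is the first two principal components, K is fixed at 2, and "forms multiple clusters" is approximated by a hypothetical rule (the between-cluster share of score variance exceeding `threshold`). All function names are invented for the example.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's K-means; returns one integer label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for it in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)
        if it > 0 and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def between_fraction(Z, labels):
    """Share of total score variance explained by the cluster means."""
    grand = Z.mean(axis=0)
    total = ((Z - grand) ** 2).sum()
    if total == 0:
        return 0.0
    between = sum(
        (labels == j).sum() * ((Z[labels == j].mean(axis=0) - grand) ** 2).sum()
        for j in np.unique(labels)
    )
    return between / total

def cure(X, y, k=2, n_components=2, threshold=0.7, max_iter=10):
    """Sketch of clustering-based residual cleaning (assumptions in the lead-in)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)                      # mean-centered dataset
    factor = np.zeros_like(Xc)                   # assumed: class means as studied factor
    for c in np.unique(y):
        factor[y == c] = Xc[y == c].mean(axis=0)
    residual = Xc - factor
    for _ in range(max_iter):
        Z = pca_scores(residual, n_components)
        labels = kmeans(Z, k)
        # Assumed stopping rule: weak cluster structure means nothing left to remove.
        if between_fraction(Z, labels) < threshold:
            break
        for j in np.unique(labels):              # subtract each cluster's mean
            residual[labels == j] -= residual[labels == j].mean(axis=0)
    return factor + residual                     # cleaned, mean-centered data
```

Calling `cure(X, y)` on a samples-by-features array returns a mean-centered array of the same shape. The threshold of 0.7 is a heuristic chosen for this sketch: splitting a single Gaussian component into two groups already explains roughly 64% of its variance, so a lower value risks removing genuine inter-individual variation.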

Item Type: Article
Uncontrolled Keywords: Data Analysis, Metabolomics
Subjects: T Technology > TP Chemical technology
Divisions: Chemical Engineering
ID Code: 76957
Deposited By: Fazli Masari
Deposited On: 30 Apr 2018 22:27
Last Modified: 30 Apr 2018 22:27
