Universiti Teknologi Malaysia Institutional Repository

Improved cluster partition in principal component analysis guided clustering

Shaharudin, S. M. and Ahmad, Norhaiza and Yusof, Fadhilah (2013) Improved cluster partition in principal component analysis guided clustering. International Journal of Computer Applications, 75 (11). pp. 23-25. ISSN 0975-8887

Full text not available from this repository.

Official URL: http://dx.doi.org/10.5120/13156-0839


Principal component analysis (PCA) guided clustering is widely used on high-dimensional data to improve the efficiency of K-means cluster solutions. Typically, PCA performs an eigen-analysis of the Pearson correlation matrix to obtain the components that account for most of the variation in the data. However, PCA based on Pearson correlation can be sensitive to non-Gaussian data that contain skewed observations such as outlying values. Applying it to such data can therefore distort cluster partitions and produce extremely imbalanced clusters in a high-dimensional space. In this study, Tukey's biweight correlation, based on an M-estimate approach, is used in PCA as an alternative to Pearson correlation. This approach is more resistant to outlying values because it examines each observation and down-weights those that lie far from the center of the data. Two major features are highlighted: (1) fewer components are retained, and extremely imbalanced clusters are avoided at the recommended cumulative percentage of variation threshold; (2) the cluster quality with respect to external, internal and relative criteria, as measured by the Rand, Silhouette and Davies-Bouldin indices, exceeds that of the clusters obtained from PCA based on Pearson correlation.
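The pipeline described in the abstract (robust correlation, then eigen-analysis, then component retention at a cumulative-variance threshold, then K-means on the component scores) can be sketched with NumPy alone. This is a minimal sketch under stated assumptions, not the authors' implementation: it substitutes the biweight midcorrelation (a one-step Tukey biweight estimator with tuning constant c = 9) for the iterative M-estimate used in the paper, uses a hypothetical 80% cumulative-variance threshold, standardizes the data with the median and 1.4826 × MAD before projecting, and runs plain Lloyd's K-means with random restarts. All function names are hypothetical.

```python
import numpy as np


def biweight_midcorrelation(X, c=9.0):
    """Biweight midcorrelation matrix of the columns of X (n samples x p variables).

    Swapped-in estimator (assumption): each value gets weight
    w = (1 - u^2)^2 if |u| < 1 else 0, with u = (x - median) / (c * MAD),
    so observations far from the column median are down-weighted and are
    ignored entirely beyond c MADs. Assumes no column has a MAD of zero.
    """
    X = np.asarray(X, dtype=float)
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    u = (X - med) / (c * mad)
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    Z = (X - med) * w                    # weighted deviations from the median
    Z = Z / np.sqrt((Z**2).sum(axis=0))  # scale each column to unit length
    return Z.T @ Z                       # symmetric, ones on the diagonal


def retained_components(R, threshold=0.8):
    """Leading eigenvectors of R that first reach `threshold` cumulative variance.

    The 0.8 default is a hypothetical choice, not the paper's recommended value.
    """
    eigvals, eigvecs = np.linalg.eigh(R)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum = np.cumsum(eigvals) / eigvals.sum()    # cumulative proportion of variation
    k = min(int(np.searchsorted(cum, threshold)) + 1, len(eigvals))
    return eigvecs[:, :k], cum[:k]


def kmeans(S, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with random restarts; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = S[rng.choice(len(S), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Keep the old center if a cluster becomes empty.
            new = np.array([S[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        inertia = d.min(axis=1).sum()           # matches the centers used for `labels`
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def pca_guided_kmeans(X, k, threshold=0.8):
    """Robust-correlation PCA followed by K-means on the retained scores."""
    X = np.asarray(X, dtype=float)
    V, _ = retained_components(biweight_midcorrelation(X), threshold)
    med = np.median(X, axis=0)
    scale = 1.4826 * np.median(np.abs(X - med), axis=0)  # assumed robust std
    scores = ((X - med) / scale) @ V
    return kmeans(scores, k), V.shape[1]
```

Replacing `biweight_midcorrelation(X)` with `np.corrcoef(X, rowvar=False)` gives a Pearson baseline for comparison. A single gross outlier is often enough to flip the sign of a Pearson coefficient, while the biweight weights drop that point to zero.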

Item Type: Article
Uncontrolled Keywords: Tukey's biweight, k-means, principal component analysis
Subjects: Q Science
ID Code: 40261
Deposited By: Narimah Nawil
Deposited On: 19 Aug 2014 11:38
Last Modified: 17 Mar 2019 12:21
