Universiti Teknologi Malaysia Institutional Repository

The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression

Mohd. Azmi, Nurulhuda Firdaus and Midi, Habshah and Ismail, Noranita Fairus (2006) The performance of clustering approach with robust mm-estimator for multiple outlier detection in linear regression. Jurnal Teknologi C (45C). pp. 15-28. ISSN 0126-9797

Official URL: http://www.penerbit.utm.my/onlinejournal/45/C/JTDI...

Abstract

Identifying outliers is a fundamental step in the regression model building process. Outlying observations should be identified because of their potential effect on the fitted model. To meet this need, numerous outlier measures, such as residuals and the diagonal elements of the hat matrix, have been developed. However, these measures work well only when a regression data set contains a single outlying point, and it is well established that real regression data sets may contain multiple outlying observations that are individually hard to identify with the same measures. In this paper, an alternative approach is proposed: a clustering technique combined with a robust estimator for multiple outlier identification. The proposed robust estimator is the MM-Estimator. The performance of the clustering approach with the proposed estimator is compared with that of two other estimators: the classical Least Squares (LS) estimator and another robust estimator, Least Trimmed Squares (LTS). The estimators are evaluated through analyses of classical multiple outlier data sets from the literature and of simulated multiple outlier data sets. In addition, the Root Mean Square Error (RMSE) and the coverage probabilities of Bootstrap Bias Corrected and Accelerated (BCa) confidence intervals are analysed to identify the best estimator for multiple outlier identification. The analysis shows that the MM-Estimator performed excellently on the classical multiple outlier data sets and on a wide variety of simulated data sets, regardless of the percentage of outliers, the number of regressor variables and the sample size, followed by LTS and LS. The analysis also shows that the RMSE of the proposed estimator is always smaller than that of the other two estimators. Likewise, the coverage probabilities of the BCa confidence intervals indicate that the MM-Estimator meets all the criteria for the best estimator, since it has good coverage probabilities, good equal-tailedness and the shortest average confidence interval length, followed by LTS and LS.
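
The kind of RMSE comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure: it omits the clustering step, LTS and the BCa intervals, and it does not implement a true MM-Estimator (which starts from a high-breakdown S-estimate). As a stand-in, it assumes a robust start from the median of pairwise slopes, a fixed MAD scale, and a Tukey bisquare reweighting step. The function names, the model y = 2 + 3x, the 20% contamination and the outlier shift of 25 are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def wls_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def ls_fit(x, y):
    """Ordinary least squares (all weights equal to one)."""
    return wls_fit(x, y, [1.0] * len(x))


def robust_fit(x, y, c=4.685, iters=20):
    """Stand-in for an MM-type fit (assumption, not a true MM-Estimator)."""
    n = len(x)
    # Step 1: robust starting values from the median of pairwise slopes.
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    b = statistics.median(slopes)
    a = statistics.median(yi - b * xi for xi, yi in zip(x, y))
    # Step 2: robust residual scale (normalised MAD), then held fixed.
    res = [yi - a - b * xi for xi, yi in zip(x, y)]
    med = statistics.median(res)
    scale = 1.4826 * statistics.median(abs(r - med) for r in res)
    # Step 3: iteratively reweighted least squares with bisquare weights.
    for _ in range(iters):
        w = [(1 - (r / (c * scale)) ** 2) ** 2 if abs(r) < c * scale else 0.0
             for r in res]
        a, b = wls_fit(x, y, w)
        res = [yi - a - b * xi for xi, yi in zip(x, y)]
    return a, b


def simulate(reps=50, n=50, n_out=10, shift=25.0, seed=0):
    """Return (rmse_ls, rmse_robust) of the slope over `reps` samples."""
    rng = random.Random(seed)
    true_a, true_b = 2.0, 3.0  # hypothetical true model
    err_ls, err_rob = [], []
    for _ in range(reps):
        x = sorted(rng.uniform(0.0, 10.0) for _ in range(n))
        y = [true_a + true_b * xi + rng.gauss(0.0, 1.0) for xi in x]
        # Plant multiple outliers: shift the responses at the largest x.
        for i in range(n - n_out, n):
            y[i] += shift
        err_ls.append((ls_fit(x, y)[1] - true_b) ** 2)
        err_rob.append((robust_fit(x, y)[1] - true_b) ** 2)
    return (math.sqrt(statistics.fmean(err_ls)),
            math.sqrt(statistics.fmean(err_rob)))


if __name__ == "__main__":
    rmse_ls, rmse_rob = simulate()
    print(f"slope RMSE, LS: {rmse_ls:.3f}  robust stand-in: {rmse_rob:.3f}")
```

Under these assumptions, the group of planted outliers pulls the LS slope away from the true value, while the bisquare step gives the outliers zero weight, so the robust slope RMSE should come out much smaller. This is consistent in direction with the ranking reported in the abstract, but it does not reproduce the paper's results.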

Item Type: Article
Uncontrolled Keywords: Multiple outliers, linear regression, robust estimator, MM-Estimator, Bootstrap Bias Corrected and Accelerated (BCa) confidence interval
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Divisions: Computer Science and Information System
ID Code: 7941
Deposited By: Norhayati Abu Ruddin
Deposited On: 26 Feb 2009 08:44
Last Modified: 09 Nov 2010 09:51
