Comparisons of Some Outlier Detection Methods in the Linear Regression Model
Keywords:
Mahalanobis distance, Cook's distance, masking effect, DFBETAS, DFFITS

Abstract
Empirical evidence suggests that unusual or outlying observations are far more prevalent in data sets than one might expect, and this paper therefore addresses multiple outliers in the linear regression model. Although reliable for a single outlier or a few outliers, standard diagnostic techniques from an Ordinary Least Squares (OLS) fit can fail to identify multiple outliers. The parameter estimates, diagnostic quantities, and model inferences obtained from a contaminated data set can differ significantly from those obtained with clean data. A regression outlier is an observation with an unusual value of the dependent variable Y conditional on its value of the independent variable X. Four procedures for detecting outliers in linear regression were compared: Cook's distance, DFFITS, DFBETAS, and the Mahalanobis distance. DFBETAS is the most efficient at detecting outliers for small samples and a small percentage of outliers, but has low sensitivity when the sample size is large. The Mahalanobis distance has greater power for detecting a small percentage of outliers regardless of sample size.
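For readers who wish to reproduce these diagnostics, the sketch below is a minimal illustration, not the paper's own code. It fits an OLS model to simulated data with one planted outlier and computes all four measures, assuming Python with numpy, scipy, and statsmodels. The cut-offs (4/n for Cook's D, 2*sqrt(k/n) for DFFITS, 2/sqrt(n) for DFBETAS, and a chi-square quantile for the Mahalanobis distance) are the conventional rules of thumb rather than thresholds taken from the paper, and the Mahalanobis distance is computed in the predictor space, so it flags X-space leverage rather than unusual Y values.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=n)
y[0] += 8.0  # plant a single regression outlier in Y

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
infl = fit.get_influence()
k = X.shape[1]  # number of estimated parameters (incl. intercept)

cooks_d, _ = infl.cooks_distance   # Cook's distance per observation
dffits, _ = infl.dffits            # DFFITS per observation
dfbetas = infl.dfbetas             # one column per coefficient

# Mahalanobis distance of each x_i from the predictor centroid
Xp = np.atleast_2d(x).T
diff = Xp - Xp.mean(axis=0)
cov = np.atleast_2d(np.cov(Xp, rowvar=False))
md2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)

# Conventional rule-of-thumb cut-offs (assumed here, not from the paper)
flagged = {
    "Cook's D":    np.where(cooks_d > 4 / n)[0],
    "DFFITS":      np.where(np.abs(dffits) > 2 * np.sqrt(k / n))[0],
    "DFBETAS":     np.where((np.abs(dfbetas) > 2 / np.sqrt(n)).any(axis=1))[0],
    "Mahalanobis": np.where(md2 > chi2.ppf(0.975, df=Xp.shape[1]))[0],
}
for name, idx in flagged.items():
    print(f"{name}: observations {idx.tolist()}")
```

On this simulated example, the deletion diagnostics (Cook's D, DFFITS, DFBETAS) should flag the planted Y-outlier at index 0, while the predictor-space Mahalanobis distance generally will not, since that observation is not unusual in X.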