Comments

Published in Computational Statistics & Data Analysis 32, 119-134.

Abstract

High breakdown estimation (HBE) addresses the problem of getting reliable parameter estimates in the face of outliers that may be numerous and badly placed. In multiple regression, the standard HBEs have been those defined by the least median of squares (LMS) and the least trimmed squares (LTS) criteria. Both criteria lead to a partitioning of the data set's n cases into two “halves”: the covered “half” of cases is accommodated by the fit, while the uncovered “half”, which is intended to include any outliers, is ignored. In LMS, the criterion is the Chebyshev norm (the largest absolute value) of the residuals of the covered cases, while in LTS the criterion is the sum of squared residuals of the covered cases. Neither LMS nor LTS is entirely satisfactory. LMS has a statistical efficiency of zero if the true residuals are normal, and so is unattractive, particularly for large data sets. LTS is preferable on efficiency grounds, but its exact computation turns out to involve an intolerable computational load in any but quite small data sets.

The least trimmed sum of absolute deviations (LTA) fit is found by minimizing the sum of absolute residuals of the covered cases. This criterion is not new, but has not been used as widely as we believe it should be. We show in this article that LTA is an attractive alternative to LMS and LTS, particularly for large data sets. It has a statistical efficiency that is not much below that of LTS for outlier-free normal data and better than that of LTS for more peaked error distributions. As its computational complexity is of a lower order than that of LMS and LTS, it can also be evaluated exactly in much larger samples than either LMS or LTS. Finally, just as its full-sample equivalent, the L1 norm, is robust against outliers on low leverage cases, LTA is able to cover larger subsets than LTS in those data sets where not all outliers are on high leverage cases.
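As an illustration only, the three criteria described above can be evaluated for the residuals of one fixed candidate fit. The sketch below is not the authors' code: the function name `trimmed_criteria`, the coverage parameter `h` (the number of covered cases, roughly half of n) and the example residuals are all assumptions made for this illustration. It relies on the fact that, for a given fit, the covered set minimizing each criterion consists of the `h` cases with the smallest absolute residuals. Finding the fit that minimizes a criterion over all candidate fits is the hard part and is not attempted here.

```python
def trimmed_criteria(residuals, h):
    """Evaluate the LMS, LTS and LTA criteria for one candidate fit.

    residuals : residuals of all n cases under the candidate fit
    h         : number of covered cases (hypothetical name; roughly n/2)
    """
    n = len(residuals)
    if not 1 <= h <= n:
        raise ValueError("h must satisfy 1 <= h <= n")

    # For a fixed fit, the best covered set is the h cases with the
    # smallest absolute residuals; the remaining n - h cases are ignored.
    covered = sorted(abs(r) for r in residuals)[:h]

    return {
        "LMS": covered[-1],                  # Chebyshev norm of covered residuals
        "LTS": sum(a * a for a in covered),  # sum of squared covered residuals
        "LTA": sum(covered),                 # sum of absolute covered residuals
    }


# Made-up residuals: four well-fitted cases and two gross outliers.
example_residuals = [0.5, -1.0, 0.25, 8.0, -0.75, 12.0]
crit = trimmed_criteria(example_residuals, h=4)
print(crit)
```

With `h=4` the two large residuals fall in the uncovered set, so none of the three criteria is affected by them.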

For samples too large for exact evaluation of the LTA, we outline a “feasible solution algorithm”, which provides excellent approximations to the exact LTA solution using quite modest computation.