Seminar

Powerful and accurate multivariate outlier detection with high-breakdown estimators

Andrea Cerioli (University of Parma)

June 1, 2010, 14:00–15:30

Toulouse

Room MF 323

Statistics Seminar

Abstract

An effective method for outlier detection should both identify a large proportion of the outliers when they are present in the data and produce few false alarms when there is no contamination. Basic as they may seem, these two requirements pull in opposite directions; they are the "enemy brothers" of statistically principled outlier detection rules. We describe a compromise between them in the multivariate setting, where location and scatter are estimated by the Reweighted Minimum Covariance Determinant (RMCD) method. To this end, we address two basic issues. First, we describe an approximation to the exact distribution of robust distances from which reliable cut-off values can be obtained even in small samples. Second, we investigate the multiplicity issues that arise when several outliers are present. We describe how a careful choice of the error rate controlled during the outlier detection process can yield the required compromise, once alternatives to strong control of the Family-Wise Error Rate (FWER) are considered.
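The multiplicity issue can be made concrete with a little arithmetic. When each of n observations is tested for outlyingness at level alpha, the chance of at least one false alarm in an uncontaminated sample grows quickly with n. Simultaneous corrections such as Bonferroni or Šidák restore control of the FWER, but at the price of much smaller per-observation levels and therefore less power. The sketch below is illustrative only. It assumes independent tests, which robust distances from the same RMCD fit are not, and it does not reproduce the speaker's procedure. The helper names `per_test_levels` and `fwer_independent` are hypothetical.

```python
# Illustrative sketch (hypothetical helpers, assumes independent tests):
# how per-observation levels and the family-wise error rate interact
# when n observations are each tested for outlyingness.

def per_test_levels(alpha: float, n: int) -> dict[str, float]:
    """Per-observation significance levels under three common choices."""
    return {
        "individual": alpha,                             # no multiplicity adjustment
        "bonferroni": alpha / n,                         # FWER <= alpha in general
        "sidak": 1.0 - (1.0 - alpha) ** (1.0 / n),       # FWER = alpha if tests are independent
    }


def fwer_independent(level: float, n: int) -> float:
    """Probability of at least one false alarm among n independent clean observations."""
    return 1.0 - (1.0 - level) ** n


if __name__ == "__main__":
    alpha, n = 0.01, 200
    for name, level in per_test_levels(alpha, n).items():
        print(
            f"{name:>10}: per-test level {level:.2e}, "
            f"expected false alarms {n * level:.3f}, "
            f"FWER {fwer_independent(level, n):.3f}"
        )
```

With n = 200 and alpha = 0.01, testing each observation at level 0.01 gives a high chance of at least one false alarm in a clean sample, whereas the simultaneous levels hold it near 0.01. The gap between these extremes is the trade-off between power and false alarms that the abstract refers to.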