### On data depth with application to regression models and tests

### Robin Wellmann

ISBN 978-3-8325-1901-8

161 pages, year of publication: 2008

price: 40.50 €

This thesis investigates notions of data depth and how they can be used to derive asymptotic tests for general linear regression. The depth notions studied are the tangent depth and the global depth of Mizera, which generalize the regression depth of Rousseeuw and Hubert; a generalization of the simplicial depth of Liu is also considered. Properties of these depths are derived in as much generality as possible, and some of them hold beyond the regression setting.
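For orientation, the classical simplicial depth of Liu for a point in the plane is the fraction of triangles spanned by three data points that contain the point. The sketch below illustrates only this classical location version, not the regression or harmonized variants developed in the thesis. The function names `in_triangle` and `simplicial_depth` are illustrative choices, and closed triangles are assumed, so boundary points count as inside.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle abc (positive if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_triangle(p, a, b, c):
    """Return True if p lies in the closed triangle with vertices a, b, c."""
    d1, d2, d3 = _orient(a, b, p), _orient(b, c, p), _orient(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside (or on the boundary) when the three signs do not disagree.
    return not (has_neg and has_pos)


def simplicial_depth(p, points):
    """Fraction of triangles formed by three data points that contain p.

    This is a U-statistic of degree 3 whose kernel is the indicator
    'p lies in the triangle spanned by the three observations'.
    """
    triples = list(combinations(points, 3))
    return sum(in_triangle(p, *t) for t in triples) / len(triples)


if __name__ == "__main__":
    square = [(0, 0), (4, 0), (0, 4), (4, 4)]
    print(simplicial_depth((1, 2), square))    # point inside the square
    print(simplicial_depth((10, 10), square))  # point far outside
```

The brute-force enumeration costs on the order of n³ operations, which is acceptable for a small illustration but not for large samples.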

A disadvantage of the simplicial depth of Liu, and also of the simplicial regression depth, is that the depth is much higher on certain hyperplanes of the parameter space than in their neighborhood. A harmonized depth and a harmonized simplicial depth are introduced to overcome this problem. For a rather general model it is shown that global depth, tangent depth and harmonized depth are equal with probability one. The boundaries of the level sets of the tangent depth are characterized in the general case. This characterization helps in calculating the maximum tangent depth or the maximum simplicial depth within subsets of the parameter space, as required in testing problems.

The simplicial depth, which is defined as the U-statistic with the harmonized depth as the kernel function, is degenerate in most cases. Degenerate U-statistics have asymptotically the same distribution as an infinite linear combination of chi-square-distributed random variables, which can be complicated to derive. The asymptotic distribution of the simplicial depth is characterized under general assumptions for extended linear regression. This representation shows that the asymptotic distribution does not depend on the underlying parameter if the distribution of the regressors does not depend on it, so that asymptotic tests can be derived. The characterization is specified for polynomial regression, some cases of multiple regression, and multiple regression through the origin.
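As background, the standard limit theorem for a U-statistic with first-order degeneracy can be written as follows. The notation is an assumption made for this illustration and is not taken from the thesis: $U_n$ is a U-statistic with symmetric kernel $h$ of degree $m$, $\theta = \mathrm{E}\,h(X_1,\dots,X_m)$, the first-order projection $h_1(x) = \mathrm{E}\,h(x, X_2,\dots,X_m)$ has zero variance, and $\lambda_1, \lambda_2, \dots$ are the eigenvalues of the integral operator with kernel $h_2(x,y) - \theta$, where $h_2(x,y) = \mathrm{E}\,h(x, y, X_3,\dots,X_m)$.

$$
n\,(U_n - \theta) \;\xrightarrow{\;d\;}\; \binom{m}{2} \sum_{j=1}^{\infty} \lambda_j \left( Z_j^2 - 1 \right), \qquad Z_1, Z_2, \dots \ \text{i.i.d.}\ \mathcal{N}(0,1).
$$

Each $Z_j^2$ is chi-square distributed with one degree of freedom, which gives the infinite linear combination mentioned above. The difficulty in applications lies in determining the eigenvalues $\lambda_j$ for the kernel at hand.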

Based on the asymptotic distributions, tests are proposed for testing arbitrary hypotheses about the unknown parameter. These parametric tests are distribution-free or impose only a few conditions on the distribution. Algorithms for calculating the test statistics are given for null hypotheses that correspond to an arbitrary affine subspace or polyhedron within the parameter space.