Experiments show that they are very robust with respect to common natural image transformations, such as scaling, rotation, and the introduction of noise and clutter. Based on these features, Chapter 3 presents two strategies for building robust mid-level image representations. First, a novel feature grouping method is introduced. The scheme offers a powerful way to combine the advantages of shape-centered interest points, namely robustness and a tight connection to a unique shape, with those of corner-based interest points, namely strong descriptors. Furthermore, Chapter 3 introduces a novel set of medial feature superpixels, which provide a feed-forward way to divide the image into small, visually homogeneous regions and thus offer a compact and efficient mid-level representation of the image information. Chapter 4 then bridges the gap between computer vision and the human observer by introducing three applications that employ the shape-centered representations from the two previous chapters. First, a multi-class scene labeling scheme is presented that produces dense annotations of images by combining a local prediction step with a global optimization scheme. Then, Section 4.2 introduces a novel image retrieval tool that operates on high-level semantic information. Such semantic annotations could be generated by automatic annotation schemes such as the one described in the preceding section. Finally, the novel idea of predicting the detectability of a pedestrian in a driver assistance context is put forward and investigated. The different modules of this thesis are tightly connected and interdependent within the framework of shape-centered representations. The connections between the modules make it possible to feed information back from higher to lower layers and to optimize the design choices made there. The framework presented in this thesis addresses static phenomena, but the approach could be extended to the analysis of dynamic scenes as well.