Our approach to overcoming this limit is the use of skeletons, also called combinators or templates: polymorphic program schemata with an efficient, possibly parallel, implementation. Skeletons are embedded into a functional source language and receive special treatment when the program is compiled to target code. We distinguish two views of a skeleton: the user's view and the implementer's view. To the user, a skeleton appears in the program as a single function call, specialized with problem-specific customizing functions. We focus on skeletons for the divide-and-conquer (DC) paradigm, because DC algorithms offer a high potential for parallelism. In DC, two of the customizing functions describe how a problem is divided into subproblems and how the partial solutions are combined. Starting from a skeleton that represents a general form of DC, we successively derive special cases that lead to an efficient parallel implementation. The functional definitions of the skeletons and the programs that use them are written in the language Haskell.
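To illustrate the user's view, a general DC skeleton can be sketched in Haskell as follows. This is only a sketch: the names dc, isTrivial, solve, divide and combine are ours, and the actual skeleton signatures in HDC may differ.

```haskell
-- Sketch of a general DC skeleton (names are illustrative, not HDC's API).
-- The user specializes dc with four problem-specific customizing functions.
dc :: (a -> Bool)   -- is the problem trivially solvable?
   -> (a -> b)      -- solve a trivial problem directly
   -> (a -> [a])    -- divide a problem into subproblems
   -> ([b] -> b)    -- combine the partial solutions
   -> a -> b
dc isTrivial solve divide combine = go
  where
    go x | isTrivial x = solve x
         | otherwise   = combine (map go (divide x))

-- Example instance: mergesort as a single call to the skeleton
msort :: [Int] -> [Int]
msort = dc (\xs -> length xs <= 1) id halve merge
  where
    halve xs = let n = length xs `div` 2 in [take n xs, drop n xs]
    merge [xs, ys] = mergeTwo xs ys
    merge xss      = concat xss
    mergeTwo [] ys = ys
    mergeTwo xs [] = xs
    mergeTwo (x:xs) (y:ys)
      | x <= y    = x : mergeTwo xs (y:ys)
      | otherwise = y : mergeTwo (x:xs) ys
```

Note that the two calls of go in the map are independent, which is the source of parallelism the skeleton implementation exploits.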
The implementer views a skeleton as an additional module of the compiler, which generates, for every skeleton application, an appropriate, optimized implementation in the target language. Thus, know-how about a problem domain, e.g., DC, can be added to the compiler. The straightforward way to parallelize DC, by simply spawning independent tasks, can cause a serious load imbalance. Load balance can be established by an appropriate mapping of operations to time steps and processors. For the more specialized skeletons, this mapping can be computed entirely at compile time; otherwise, the space-time mapping can be expressed in terms of run-time information. To apply a space-time mapping to the set of operations, we transform the recursive definition of each of the DC skeletons presented into an index-based form. A transformation consists of a sequence of semantics-preserving rule applications. The power of recursion motivates the introduction of a computational model that is more sophisticated than the polytope model used for nested loops: the new model supports a potentially unbounded dimensionality of the index space.
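To convey what an index-based form looks like, the following sketch (our illustration, not the transformation rules themselves) unfolds a DC computation of fixed division degree 2 into level-by-level passes over the call tree. Each operation is then addressed by a (level, position) index, to which a space-time mapping can assign a time step and a processor.

```haskell
-- Sketch (ours): a degree-2 DC computation in index-based form.
-- The call tree is processed level by level; at level l there are
-- 2^l subproblems, so every operation has an index (l, i) that a
-- space-time mapping can send to (time, processor).
chunksOf2 :: [a] -> [[a]]
chunksOf2 (a:b:rest) = [a, b] : chunksOf2 rest
chunksOf2 rest       = [rest | not (null rest)]

-- d levels of dividing, solving at the leaves, d levels of combining
dcIter :: Int -> (a -> [a]) -> ([b] -> b) -> (a -> b) -> a -> b
dcIter d divide combine solve x0 = head (go d leaves)
  where
    leaves = map solve (iterate (concatMap divide) [x0] !! d)  -- divide phase
    go 0 ys = ys
    go l ys = go (l - 1) (map combine (chunksOf2 ys))          -- combine phase

-- Example instance: mergesort on 2^d elements
halve :: [Int] -> [[Int]]
halve xs = let n = length xs `div` 2 in [take n xs, drop n xs]

merge :: [[Int]] -> [Int]
merge [xs, []] = xs
merge [[], ys] = ys
merge [x:xs, y:ys]
  | x <= y     = x : merge [xs, y:ys]
  | otherwise  = y : merge [x:xs, ys]
merge xss      = concat xss
```

In contrast to the recursive form, the work per level and the data layout are explicit here, which is what makes a static assignment of operations to processors possible.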
A subset of the functional language Haskell, called HDC, is used for experiments with example programs. Unlike Haskell, HDC is strict; this is necessary to guarantee that the space-time mapping made at compile time is respected by the execution. We present compilation techniques that transform polymorphic, higher-order functional programs into functional programs which can easily be compiled to C and linked with the skeleton implementations.
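As a minimal illustration of the kind of transformation involved (our example, not taken from the HDC compiler), a polymorphic, higher-order function can be specialized at its use site into a monomorphic, first-order one that maps directly onto C:

```haskell
-- Sketch (ours) of use-site specialization.

-- Source program: polymorphic and higher-order
twiceEach :: Num a => [a] -> [a]
twiceEach = map (* 2)

-- After specialization: the Int instance of map applied to (* 2),
-- now monomorphic and first-order, i.e., a simple recursion that
-- translates directly into a C loop
twiceEachInt :: [Int] -> [Int]
twiceEachInt []     = []
twiceEachInt (x:xs) = 2 * x : twiceEachInt xs
```

One such first-order instance is generated per use site of the polymorphic function.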
Experimental results are presented for Karatsuba's polynomial multiplication, the n-queens problem, the maximum independent set problem, convex hull computation and sorting. It turns out that significant speedups can be achieved for realistic applications, but only under certain conditions on the algorithmic structure. Some of the skeletons enforce these conditions by construction; for the others, a high potential for optimization remains for future work.