The special architecture of the Cell BE processor has made scientists revisit the problem of sorting. This paper implements and tests a variant of merge sort where a number of 2-to-1 mergers are connected in a pipelined tree.
For large trees there are many more such mergers than processors which means they must be mapped to the processors in some way. Optimized mappings are tested and results show that changing the model used when optimizing might be beneficiary. It is also shown that the small size of the local storages on the co-processors is not limiting the performance.
Source: Linköping University
Author: Hultén, Rikard