Signal processing in advanced radar systems requires high computational performance. Multicore processor architectures are therefore of increasing interest to science and industry as an enabling technology for implementing radar signal processing.
Four versions of multicore processors are studied in this thesis: 1) 8 cores with 1 shared cache, 2) 8 cores with 8 private caches, 3) 32 cores with 1 shared cache, and 4) 32 cores with 32 private caches.
The focus of this study is to evaluate the performance of their memory architectures. The studied multicore architectures have been simulated using the ThreadSpotter tool, with threads used as an abstraction for concurrently executing cores.
To evaluate the cache architectures of the studied processors, we use four benchmarks (Tilted Memory Read, Cubic Interpolation, Bi-cubic Interpolation, and Covariance Matrix Estimation in STAP) based on the HPEC (High Performance Embedded Computing) Challenge Benchmark Suite.
These benchmarks have been chosen to simulate different kinds of memory access patterns in radar signal processing. The original benchmark code has been modified and implemented using OpenMP.
The selected benchmarks have been analysed using the ThreadSpotter tool. Conclusions have been drawn from indicators generated by ThreadSpotter, for instance the Fetch Ratio and the Fetch Utilization.
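To make the two indicators concrete, the following is a minimal sketch of how a fetch ratio (cache line fetches per memory access) and a fetch utilization (fraction of fetched bytes actually touched before eviction) could be computed for an address trace. It is a toy model, not ThreadSpotter's method: the fully associative LRU cache, the function name `simulate`, and all parameter values are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate(addresses, cache_lines=64, line_size=64, word_size=8):
    """Toy fully associative LRU cache (an assumed model, not ThreadSpotter).

    Returns (fetch_ratio, fetch_utilization) for a list of byte addresses.
    """
    if not addresses:
        return 0.0, 0.0

    cache = OrderedDict()  # line number -> set of word offsets touched
    fetches = 0            # number of cache lines fetched from memory
    used_bytes = 0         # bytes touched in lines that have been fetched

    for addr in addresses:
        line, offset = divmod(addr, line_size)
        if line in cache:
            cache.move_to_end(line)  # mark as most recently used
        else:
            fetches += 1
            if len(cache) >= cache_lines:
                # Evict the least recently used line and account for its use.
                _, touched = cache.popitem(last=False)
                used_bytes += len(touched) * word_size
            cache[line] = set()
        cache[line].add(offset - offset % word_size)

    # Lines still resident at the end also count towards utilization.
    for touched in cache.values():
        used_bytes += len(touched) * word_size

    fetch_ratio = fetches / len(addresses)
    fetch_utilization = used_bytes / (fetches * line_size)
    return fetch_ratio, fetch_utilization


# Hypothetical traces: a sequential read and a strided ("tilted") read.
sequential = [i * 8 for i in range(1024)]   # consecutive 8-byte words
strided = [i * 64 for i in range(1024)]     # one word per 64-byte line
```

In this model the sequential trace fetches one line per eight accesses and uses every byte of it, whereas the strided trace fetches a new line on every access and uses only one word of each, which is the kind of contrast these indicators are meant to expose.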
The processor with 8 cores and private caches achieved the best performance, thanks to its private caches, which avoid some race conditions and false-sharing effects. The processor with 32 cores and private caches obtained the worst performance in almost all experiments because its smaller private caches do not have enough capacity to hold useful cache lines.
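The capacity effect described above can be illustrated with a small sketch. It assumes, purely for illustration, a fixed total cache budget divided evenly among the cores and a per-thread working set that is re-read several times; neither assumption is taken from the thesis, and the names `miss_ratio` and `private_cache_miss_ratio` are hypothetical.

```python
from collections import OrderedDict


def miss_ratio(trace, capacity_lines):
    """Miss ratio of a fully associative LRU cache over a trace of line numbers."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # refresh LRU position on a hit
        else:
            misses += 1
            if len(cache) >= capacity_lines:
                cache.popitem(last=False)  # evict least recently used line
            cache[line] = True
    return misses / len(trace)


def private_cache_miss_ratio(total_lines, cores, working_set_lines, passes=4):
    """Miss ratio seen by one core when the total budget is split evenly.

    Assumption: each core repeatedly sweeps its own working set.
    """
    per_core_lines = total_lines // cores
    trace = list(range(working_set_lines)) * passes
    return miss_ratio(trace, per_core_lines)


# Hypothetical numbers: 1024 cache lines in total, 100-line working set.
eight_cores = private_cache_miss_ratio(1024, 8, 100)        # 128 lines per core
thirty_two_cores = private_cache_miss_ratio(1024, 32, 100)  # 32 lines per core
```

Under these assumptions the 8-core configuration misses only on the first pass, while the 32-core configuration misses on every access because the working set no longer fits in a private cache. The real processors and benchmarks are more complex, so this shows only the mechanism, not the reported results.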
Source: Halmstad University
Authors: Wu, Jinfeng | Lv, Gaofei