Prefetching vs. the Memory System: Optimizations for Multi-core Server Platforms
Abstract
This dissertation investigates prefetching schemes for servers with respect to realistic memory systems. A large body of research work has been done on prefetching, even for server workloads that have sparse locality. Yet real systems disable prefetching in server settings, suggesting that there is a fundamental disconnect between research and practice. Our theory, a major point of this thesis, is that this disconnect is due to the use of simplistic memory models -- and our experimental results show that, among other things, using simplistic models can over-predict system performance by up to 65%. Our investigation proceeds as follows:
(In)Accuracy of Simplistic Memory Models. We demonstrate the degree of inaccuracy of memory models commonly used in system design: in particular, simple models are reasonably accurate when applied to simple systems (e.g., uniprocessors), but they become increasingly inaccurate as the complexity of the system grows -- as cores are added, and as prefetching is added.
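To make the distinction concrete, a minimal C++ sketch is given below. It contrasts a fixed-latency memory model with a simple bandwidth-aware model (an M/D/1-style queue on a shared DRAM channel); the latencies, miss rate, and queueing formula are illustrative assumptions, not measurements from this work.

    // Sketch: fixed-latency vs. bandwidth-aware memory model (assumed parameters).
    #include <cstdio>

    int main() {
        const double idle_latency_ns  = 60.0;  // unloaded DRAM access latency (assumed)
        const double service_time_ns  = 10.0;  // channel occupancy per request (assumed)
        const double miss_rate_per_ns = 0.01;  // misses issued per core per ns (assumed)

        for (int cores = 1; cores <= 8; cores *= 2) {
            // Fixed-latency model: every miss is charged the unloaded latency.
            const double fixed_latency = idle_latency_ns;

            // Bandwidth-aware model: all cores share one channel, so requests queue.
            double rho = cores * miss_rate_per_ns * service_time_ns;   // channel utilization
            if (rho >= 1.0) rho = 0.99;                                // clamp at saturation
            const double queueing = rho * service_time_ns / (2.0 * (1.0 - rho));  // M/D/1 wait
            const double loaded_latency = idle_latency_ns + queueing;

            const double optimism = 100.0 * (loaded_latency - fixed_latency) / loaded_latency;
            std::printf("%d core(s): fixed %.1f ns, contended %.1f ns (%.0f%% optimistic)\n",
                        cores, fixed_latency, loaded_latency, optimism);
        }
        return 0;
    }

With one core the two models nearly agree; as cores are added the fixed-latency model grows increasingly optimistic, which is the divergence described above.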
Memory-side prefetching. We then perform a detailed case study of a well-known server-oriented prefetch scheme -- memory-side sequential prefetch -- to develop an understanding of the interaction between the prefetch scheme and the memory system. In particular, we find that the projected performance gains fail to materialize due to the lack of locality in the server benchmarks and the bandwidth pressure introduced by the prefetch requests. We conclude that prefetching studies so far have used the wrong metric to gauge idleness of the memory subsystem and consequently saturate the bus with prefetch requests.
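For context, the sketch below shows the basic shape of a memory-side sequential prefetcher; it is an illustrative reconstruction, not the code studied in this dissertation. The instantaneous bus_idle gate is the kind of idleness metric the case study calls into question, since prefetches admitted on that basis still consume bandwidth that later demand misses need.

    // Sketch: memory-side sequential prefetch into a DRAM-side buffer (assumed sizes).
    #include <cstdint>
    #include <unordered_set>
    #include <vector>

    constexpr uint64_t kLineBytes      = 64;  // cache-line size (assumed)
    constexpr int      kPrefetchDegree = 4;   // lines fetched ahead (assumed)

    struct MemorySideBuffer {
        std::unordered_set<uint64_t> lines;  // line addresses held on the memory side

        bool contains(uint64_t addr) const { return lines.count(addr / kLineBytes) != 0; }
        void insert(uint64_t addr)         { lines.insert(addr / kLineBytes); }
    };

    // Called by the memory controller for every demand read it services.
    // Returns the prefetch requests issued in response.
    std::vector<uint64_t> on_demand_read(uint64_t addr,
                                         bool bus_idle,        // instantaneous idle signal
                                         MemorySideBuffer& buf) {
        std::vector<uint64_t> issued;
        if (!bus_idle) return issued;  // naive gate: only the current moment is checked

        for (int i = 1; i <= kPrefetchDegree; ++i) {
            const uint64_t next = addr + i * kLineBytes;  // sequential next-line candidates
            if (!buf.contains(next)) {
                buf.insert(next);
                issued.push_back(next);                   // still occupies DRAM bandwidth
            }
        }
        return issued;
    }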
Multi-core Server Prefetching. We use our newfound understanding of the interplay between prefetching and the memory system to develop a novel prefetching scheme for server platforms that interacts well with real memory systems. We find that tuning the aggressiveness of prefetching to the average memory latency, which in turn depends on the available bandwidth, performs best in server platforms.
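A minimal sketch of what such latency-directed throttling could look like is given below; the thresholds, EWMA weight, and degree bounds are hypothetical choices for illustration, not the tuned values from this work.

    // Sketch: prefetch degree re-tuned from the observed average memory latency.
    #include <algorithm>

    class LatencyDirectedThrottle {
    public:
        explicit LatencyDirectedThrottle(double unloaded_latency_ns)
            : unloaded_ns_(unloaded_latency_ns), avg_ns_(unloaded_latency_ns) {}

        // Record the measured latency of one completed memory request (EWMA).
        void observe(double latency_ns) {
            avg_ns_ = kAlpha * latency_ns + (1.0 - kAlpha) * avg_ns_;
        }

        // Called at the end of each sampling interval to adjust aggressiveness.
        void retune() {
            const double slowdown = avg_ns_ / unloaded_ns_;  // >1 implies bandwidth contention
            if (slowdown > 2.0)
                degree_ = std::max(degree_ - 1, 0);           // back off: bandwidth is scarce
            else if (slowdown < 1.2)
                degree_ = std::min(degree_ + 1, kMaxDegree);  // ramp up: latency near unloaded
            // otherwise hold the current degree
        }

        int prefetch_degree() const { return degree_; }  // lines to prefetch per demand miss

    private:
        static constexpr double kAlpha     = 0.1;  // EWMA weight (assumed)
        static constexpr int    kMaxDegree = 8;    // cap on aggressiveness (assumed)
        double unloaded_ns_;
        double avg_ns_;
        int    degree_ = 2;
    };

The appeal of this style of control, as the abstract notes, is that average memory latency already folds in bandwidth contention, so a single observed signal is enough to throttle prefetch aggressiveness.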