Abstract:
Technology trends are making communication, both on and off the microprocessor chip, more expensive relative to computation. This dissertation shows that a current-generation microprocessor spends over two-thirds of its time performing no useful work, stalled waiting for memory. For the aggressive modern processors that were measured, over half of the memory stalls result from insufficient memory bandwidth rather than from bank access or data transmission latency. Although bandwidth limitations can be obviated by paying a sufficiently high price, this dissertation instead explores hardware techniques that mitigate bandwidth-related performance losses. Measurements of cache efficiency show that the fraction of useful data in the cache over time is generally under 20%. A theoretical lower bound is placed on the amount of bus traffic a cache must generate, and current caches are shown to generally produce one to two orders of magnitude more traffic than is necessary. Several solutions are proposed that reduce traffic to improve performance. Two techniques are measured that dynamically adapt what is fetched upon a block miss, filtering,