Hardware Support for Control Transfers in Code Cache


Author(s) : James E. Smith, Ho-seop Kim
Publisher : N/A
Publication Date : 2003
ISSN : N/A
Abstract : Many dynamic optimization and/or binary translation systems hold optimized/translated superblocks in a code cache. Conventional code caching systems suffer from overheads when control is transferred from one cached superblock to another, especially via register-indirect jumps. The basic problem is that instruction addresses in the code cache are different from those in the original program binary. Therefore, performance for register-indirect jumps depends on the ability to translate efficiently from source binary PC values to code cache PC values. We analyze several key aspects of superblock chaining and find that a conventional baseline code cache with software jump target prediction results in 14.6% IPC loss versus the original binary. We identify the inability to use a conventional return address stack as the most significant performance limiter in code cache systems. We introduce a modified software prediction technique that reduces the IPC loss to 11.4%. This technique is based on a technique used in threaded code interpreters. A number of hardware mechanisms, including a specialized return address stack and a hardware cache for translated jump target addresses, are studied for efficiently supporting register-indirect jumps. Once all the chaining overheads are removed by these support mechanisms, a superblock-based code cache improves performance due to a better branch prediction rate, improved I-cache locality, and increased chances of straight-line fetches. Simulation results show a 7.7% IPC improvement over a current generation 4-way superscalar processor.
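The address translation problem described in the abstract can be illustrated with a minimal sketch. It assumes a simplified model in which a register-indirect jump first compares its runtime target against a single source PC predicted at translation time (software jump target prediction) and, on a mismatch, falls back to a full source-PC-to-cache-PC table. The class `CodeCache`, the function `indirect_jump`, and all addresses are hypothetical names chosen for illustration; they are not taken from the paper and do not represent its implementation.

```python
# Hypothetical sketch: names and addresses are illustrative, not from the paper.

class CodeCache:
    """Maps source-binary PCs to code-cache PCs for translated superblocks."""

    def __init__(self):
        self.pc_map = {}      # source PC -> code cache PC (full translation table)
        self.map_lookups = 0  # counts slow-path table lookups

    def install(self, source_pc, cache_pc):
        """Record that the superblock starting at source_pc lives at cache_pc."""
        self.pc_map[source_pc] = cache_pc

    def lookup(self, source_pc):
        """Slow path: full table lookup; None means no translation exists yet."""
        self.map_lookups += 1
        return self.pc_map.get(source_pc)


def indirect_jump(cache, target_source_pc, predicted_source_pc, predicted_cache_pc):
    """Resolve a register-indirect jump executed inside the code cache.

    The jump register holds a source-binary PC. If it matches the target
    predicted at translation time, control goes directly to the known
    code-cache address; otherwise the translation table is consulted.
    """
    if target_source_pc == predicted_source_pc:
        return predicted_cache_pc          # fast path: inline compare-and-branch
    return cache.lookup(target_source_pc)  # slow path: table lookup


cache = CodeCache()
cache.install(0x4000, 0x9000)
cache.install(0x4800, 0x9400)

hit = indirect_jump(cache, 0x4000, 0x4000, 0x9000)   # prediction matches
miss = indirect_jump(cache, 0x4800, 0x4000, 0x9000)  # prediction fails, table used
```

In this model only the mispredicted jump pays for a table lookup, which is the cost that the hardware mechanisms named in the abstract (a specialized return address stack and a cache of translated jump targets) are meant to reduce.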