Volume 12, Issue 03

Original 45nm Intel® Core™ Microarchitecture


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1203.03

  • Volume 12
  • Issue 03
  • Published November 7, 2008

  Section 11 of 15  

Improvements in the Intel® Core™2 Processor Family Architecture and Microarchitecture

RENAMED RSB

A Renamed Return Stack Buffer (RRSB) was added to improve performance by increasing return prediction accuracy. The goal was to supplement the existing RSB by providing a recovery mechanism from a common source of RSB corruption.

Background

A single function (or procedure) can be called from multiple places within a program by using a “CALL” instruction. Exiting the function back to the calling program can be done with a “RET” (return) instruction. The CALL instruction is similar to a direct jump that also pushes the RET address onto the stack (in memory). The RET instruction is an indirect jump whose target address is popped from the stack.

The processor's Branch Prediction Unit (BPU) uses its shared bimodal prediction resources to predict the existence of CALL and RET instructions, and its Branch Target Buffer (BTB) to predict the target of a direct CALL. The target of a RET, however, depends on which CALL invoked the function, so a separate structure, the Return Stack Buffer (RSB), is used to predict it.

All P6 microprocessors have implemented the RSB as a simple push/pop stack structure. This “classic” RSB (CRSB) has the following basic behavior:

  1. The BPU uses its Linear Instruction Pointer (LIP) to predict a CALL instruction.
  2. The BPU “pushes” the CALL's Next Linear Instruction Pointer (NLIP) onto the CRSB stack.
  3. The BPU predicts the target of the CALL from the BTB and redirects the instruction flow.
  4. Later, the BPU predicts a RET instruction based on its LIP.
  5. The BPU predicts the target of the RET from the CRSB and redirects the instruction flow.
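
The push/pop behavior in the steps above can be sketched as a toy software model. This is only an illustration, not the hardware design: the class name `ClassicRSB`, the method names, the example addresses, and the convention that the TOS index points at the next free slot are all assumptions made for the sketch.

```python
class ClassicRSB:
    """Toy model of a classic push/pop return stack buffer (CRSB)."""

    def __init__(self, entries=16):
        self.entries = [None] * entries  # predicted return addresses
        self.tos = 0                     # next free slot (assumed convention)

    def on_call(self, nlip):
        # Step 2: push the CALL's next linear instruction pointer (NLIP).
        self.entries[self.tos % len(self.entries)] = nlip
        self.tos += 1

    def on_ret(self):
        # Step 5: pop the most recently pushed NLIP as the predicted target.
        self.tos -= 1
        return self.entries[self.tos % len(self.entries)]


# Two nested calls return in last-in, first-out order.
rsb = ClassicRSB()
rsb.on_call(0x1005)      # outer CALL, hypothetical return address
rsb.on_call(0x2005)      # inner CALL, hypothetical return address
inner = rsb.on_ret()
outer = rsb.on_ret()
```

Here `inner` receives the address pushed by the second CALL and `outer` the address pushed by the first, which is the nesting a real call chain produces.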



Figure 10: CRSB vs. RRSB

CRSB corruption

Useful RET predictions in the CRSB are sometimes overwritten by bogus speculative updates. To keep predictions accurate, these bogus updates should be undone after a branch misprediction. Doing so fully would require saving the entire CRSB state for each potential misprediction and restoring that state during misprediction recovery. In practice, however, we can save only the CRSB Top-Of-Stack (TOS) pointer, which is stored in the Branch Information Table (BIT). When the CRSB TOS is restored from the BIT, the contents of the CRSB may already have been overwritten while the processor was traversing the bogus path. For instance, if the bogus path contains a RET followed by a CALL, the CALL overwrites a valid return address: the TOS pointer is restored correctly, but the CRSB contents are corrupted, and the later RET that needed that address is mispredicted at a performance cost. Figure 10 (top) illustrates this common CRSB corruption scenario.
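
The RET-followed-by-CALL scenario can be traced step by step in a small simulation. This is a sketch under stated assumptions: the function name `simulate_crsb_corruption`, the addresses `0x1005` and `0xBAD5`, and the "TOS points at the next free slot" convention are all hypothetical; only the idea that the pointer is checkpointed while the contents are not comes from the text.

```python
def simulate_crsb_corruption(size=16):
    entries = [None] * size
    tos = 0  # next free slot (assumed convention)

    # Correct path: a CALL pushes a valid return address.
    valid_return = 0x1005
    entries[tos % size] = valid_return
    tos += 1

    # A branch is predicted; only the TOS pointer is checkpointed.
    saved_tos = tos

    # Bogus path: a RET pops, then a CALL pushes into the same slot.
    tos -= 1
    bogus_return = 0xBAD5
    entries[tos % size] = bogus_return
    tos += 1

    # Misprediction recovery restores the pointer, not the contents.
    tos = saved_tos

    # Correct path resumes and reaches the real RET.
    tos -= 1
    predicted = entries[tos % size]
    return predicted, valid_return


predicted, expected = simulate_crsb_corruption()
```

After recovery the pointer has its checkpointed value, yet `predicted` holds the bogus address rather than `expected`, so the RET would be mispredicted.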

Renamed RSB implementation

To address this corruption, we added the “Renamed RSB” (RRSB) to this processor family. The RRSB is similar to the CRSB, but it incorporates an additional pointer (Alloc) and a linked-list structure for updating the TOS. Figure 10 (bottom) shows how the RRSB is able to recover from bogus updates. The pointers are updated as follows:

  • On a CALL, the CALL NLIP is written to the Alloc entry. The TOS pointer is adjusted to point to the Alloc entry, and then the Alloc pointer is incremented (Column 3 in Figure 10). The Alloc pointer never decrements. The TOS linked list is updated so that the new entry records the previous TOS.
  • On a RET, the target is read from the TOS entry, and the linked list is used to move the TOS pointer back to the previous TOS. The Alloc pointer is not updated on RET instructions.

Because each CALL writes to a fresh Alloc entry, a CALL NLIP is not overwritten by later speculative updates (until the Alloc pointer wraps), so the RRSB retains entries that the CRSB may lose.
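
A minimal software sketch of these pointer updates follows. It is not the hardware implementation: the class name `RenamedRSB`, the `prev` array used to model the linked list, and the `checkpoint`/`restore` helpers are hypothetical, and the sketch assumes that misprediction recovery restores only the TOS pointer while Alloc keeps its value.

```python
class RenamedRSB:
    """Toy model of the RRSB: Alloc pointer plus a TOS linked list."""

    def __init__(self, entries=16):
        self.size = entries
        self.addr = [None] * entries  # CALL NLIPs
        self.prev = [None] * entries  # linked list: previous TOS per entry
        self.tos = None               # current top entry (None = empty)
        self.alloc = 0                # next entry to write; never decrements

    def on_call(self, nlip):
        self.addr[self.alloc] = nlip       # write NLIP to the Alloc entry
        self.prev[self.alloc] = self.tos   # retain the previous TOS
        self.tos = self.alloc              # TOS now points at the new entry
        self.alloc = (self.alloc + 1) % self.size

    def on_ret(self):
        if self.tos is None:
            return None                    # nothing to predict from
        target = self.addr[self.tos]
        self.tos = self.prev[self.tos]     # follow the link; Alloc unchanged
        return target

    def checkpoint(self):
        return self.tos                    # assumed: only TOS is saved

    def restore(self, saved_tos):
        self.tos = saved_tos


rrsb = RenamedRSB()
rrsb.on_call(0x1005)        # valid CALL, hypothetical address
saved = rrsb.checkpoint()   # branch predicted here
rrsb.on_ret()               # bogus-path RET
rrsb.on_call(0xBAD5)        # bogus-path CALL goes to a fresh entry
rrsb.restore(saved)         # misprediction recovery
recovered = rrsb.on_ret()   # real RET on the correct path
```

The bogus CALL lands in entry 1 instead of reusing entry 0, so after the TOS is restored the real RET still reads the valid address from entry 0.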

While the RRSB is more accurate on the speculative path, it overflows (wraps) more quickly than the CRSB because Alloc never decrements: every CALL consumes a new entry, even when earlier calls have already returned. Therefore, return prediction falls back to the CRSB when the RRSB detects the wrap condition. We added a 16-entry RRSB that works in conjunction with the 16-entry CRSB, as shown in Figure 11.
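
A rough way to see why the RRSB wraps sooner is to count pointer movement over a run of back-to-back CALL/RET pairs, such as a loop that calls a leaf function. The sketch below is an illustration only: the function name `simulate_call_ret_pairs` is hypothetical, and counting a "wrap" each time Alloc passes the last entry is an assumption, since the text does not describe how the hardware detects the condition.

```python
ENTRIES = 16  # both buffers are 16 entries in the text


def simulate_call_ret_pairs(pairs, entries=ENTRIES):
    """Return (max CRSB depth, RRSB Alloc wraps) for CALL/RET pairs."""
    crsb_depth = 0
    max_crsb_depth = 0
    alloc = 0
    wraps = 0
    for _ in range(pairs):
        # CALL: the CRSB pushes; the RRSB consumes a new Alloc entry.
        crsb_depth += 1
        max_crsb_depth = max(max_crsb_depth, crsb_depth)
        alloc += 1
        if alloc == entries:
            alloc = 0
            wraps += 1
        # RET: the CRSB pops; Alloc does not move.
        crsb_depth -= 1
    return max_crsb_depth, wraps


after_15 = simulate_call_ret_pairs(15)
after_16 = simulate_call_ret_pairs(16)
```

In this model the CRSB never holds more than one entry, while Alloc wraps once the sixteenth CALL has been seen, which is the situation in which prediction would fall back to the CRSB.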



Figure 11: RRSB implementation
