Volume 12, Issue 03

Original 45nm Intel® Core™ Microarchitecture


Intel Technology Journal - Featuring Intel's recent research and development

ISSN 1535-864X DOI 10.1535/itj.1203.03

  • Published November 7, 2008

Improvements in the Intel® Core™2 Processor Family Architecture and Microarchitecture

James Coke, Mobile Microprocessor Group, Intel Corporation

Harikrishna Baliga, Mobile Microprocessor Group, Intel Corporation

Niranjan Cooray, Mobile Microprocessor Group, Intel Corporation

Edward Gamsaragan, Mobile Microprocessor Group, Intel Corporation

Peter Smith, Mobile Microprocessor Group, Intel Corporation

Ki Yoon, Mobile Microprocessor Group, Intel Corporation

James Abel, Software Solutions Group, Intel Corporation

Antonio Valles, Software Solutions Group, Intel Corporation

Index words: SSE4.1, super-shuffle, radix-16, MOVNTDQA, streaming reads, CLI, STI, return stack buffer, SMC detection, inclusion filter

Citations for this paper: Harikrishna Baliga, Niranjan Cooray, Edward Gamsaragan, Peter Smith, Ki Yoon, James Abel, Antonio Valles, "Original 45nm Intel® Core™2 Processor Performance," Intel Technology Journal. http://www.intel.com/technology/itj/2008/v12i3/3-paper/1-abstract.htm (October 2008).

ABSTRACT

Intel Corporation continuously seeks to improve the performance of each Intel Architecture microprocessor generation through architectural initiatives as well as process and circuit improvements. The predecessor to the 45nm Intel® Core™2 processor family, the 65nm Intel Core microarchitecture (codename Merom), led the competition in performance. This paper illustrates the architectural techniques Intel used in the 45nm processor family to maintain this leadership position.

We discuss the new SSE ISA improvements (dubbed SSE4.1) and look at how the 45nm processor family was able to utilize the Merom SSE enhancements both to enable SSE4.1 and to improve legacy instructions. We also examine the instruction set to show how instructions were targeted to improve various superscalar workloads.

The paper explains how the divide instructions in the 45nm processor family were updated from radix-4 to radix-16. To minimize the hardware investment, integer divides are handled as floating-point divides, so we also discuss the conversion techniques between integer and floating point.
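
To make the radix change concrete, the following is a minimal software sketch of digit-recurrence division, not the hardware design described in the paper. It assumes a simple restoring algorithm on unsigned integers; the function name `divide_radix` and its parameters are hypothetical and chosen only for illustration. A radix-4 divider retires 2 quotient bits per iteration and a radix-16 divider retires 4, so the same operand width needs half as many iterations.

```python
def divide_radix(dividend: int, divisor: int, radix_bits: int, width: int = 32):
    """Textbook restoring digit-recurrence division (illustrative model only).

    radix_bits is the number of quotient bits retired per iteration:
    2 models a radix-4 divider, 4 models a radix-16 divider.
    Returns (quotient, remainder, iterations).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")
    if width % radix_bits != 0:
        raise ValueError("width must be a multiple of radix_bits")

    digit_mask = (1 << radix_bits) - 1
    quotient = 0
    remainder = 0
    iterations = 0

    # Walk the dividend from the most significant digit to the least.
    for shift in range(width - radix_bits, -1, -radix_bits):
        # Shift the partial remainder and bring down the next dividend digit.
        remainder = (remainder << radix_bits) | ((dividend >> shift) & digit_mask)

        # Select the quotient digit by repeated subtraction. Real hardware
        # would use a selection table here; this loop is an assumption made
        # to keep the sketch short.
        q_digit = 0
        while remainder >= divisor:
            remainder -= divisor
            q_digit += 1

        quotient = (quotient << radix_bits) | q_digit
        iterations += 1

    return quotient, remainder, iterations


if __name__ == "__main__":
    for bits, name in ((2, "radix-4"), (4, "radix-16")):
        q, r, n = divide_radix(1000, 7, bits)
        print(f"{name}: quotient={q} remainder={r} iterations={n}")
```

For a 32-bit operand this model takes 16 iterations at radix-4 and 8 at radix-16. These counts describe the model only and are not instruction latencies for any processor.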

Many other changes also improve the performance of the 45nm processor family, including improved data forwarding from stores to loads, removal of serialization from the Set Interrupt Flag and Clear Interrupt Flag instructions (STI and CLI), enhanced Self-Modifying Code (SMC) detection, and “renaming” of the Return Stack Buffer.
