Original 45nm Intel® Core™ Microarchitecture
Improvements in the Intel® Core™2 Processor Family Architecture and Microarchitecture
James Coke, Mobile Microprocessor Group, Intel Corporation
Harikrishna Baliga, Mobile Microprocessor Group, Intel Corporation
Niranjan Cooray, Mobile Microprocessor Group, Intel Corporation
Edward Gamsaragan, Mobile Microprocessor Group, Intel Corporation
Peter Smith, Mobile Microprocessor Group, Intel Corporation
Ki Yoon, Mobile Microprocessor Group, Intel Corporation
James Abel, Software Solutions Group, Intel Corporation
Antonio Valles, Software Solutions Group, Intel Corporation
Index words: SSE4.1, super-shuffle, radix-16, MOVNTDQA, streaming reads, CLI, STI, return stack buffer, SMC detection, inclusion filter
Citations for this paper: Harikrishna Baliga, Niranjan Cooray, Edward Gamsaragan, Peter Smith, Ki Yoon, James Abel, Antonio Valles. "Original 45nm Intel® Core™2 Processor Performance." Intel Technology Journal. http://www.intel.com/technology/itj/2008/v12i3/3-paper/1-abstract.htm (October 2008).
ABSTRACT
Intel Corporation continuously seeks to improve the performance of each Intel Architecture microprocessor generation through architectural initiatives as well as process and circuit improvements. The predecessor to the 45nm Intel® Core™2 processor family, the 65nm Intel Core microarchitecture (codename Merom), led the competition in performance. This paper illustrates the architecture techniques Intel used in the 45nm family to maintain this leadership position.
The new SSE ISA improvements (dubbed SSE4.1) are discussed, and we look at how the 45nm Intel® Core™2 processor family builds on the Merom SSE enhancements both to enable SSE4.1 and to improve legacy instructions. The instruction set is also examined to determine how its instructions were targeted to improve various super-scalar workloads.
The paper explains how, in the 45nm Intel® Core™2 processor family, the divide instructions are updated from a Radix-4 to a Radix-16 implementation. To minimize the hardware investment, integer divides are handled as floating-point divides, so conversion techniques between integer and floating point are also discussed.
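To give a feel for why the radix matters, the following is a minimal sketch of a digit-recurrence divider, not the hardware algorithm described in the paper. It assumes a simple restoring scheme on unsigned integers, whereas a real divider would typically select quotient digits with lookup logic rather than repeated subtraction. The function name `radix_divide` and its parameters are hypothetical. The point it illustrates is only that a radix-16 step retires four quotient bits where a radix-4 step retires two, so the same operand width needs half as many iterations.

```python
def radix_divide(dividend: int, divisor: int, bits_per_step: int, width: int = 32):
    """Unsigned restoring division that retires `bits_per_step` quotient bits
    per iteration (2 for radix-4, 4 for radix-16).

    Illustrative only: names and structure are assumptions, not Intel's design.
    Returns (quotient, remainder, iterations).
    """
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be a positive integer")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    radix = 1 << bits_per_step
    iterations = -(-width // bits_per_step)   # ceiling division
    padded_width = iterations * bits_per_step

    quotient = 0
    remainder = 0
    for step in range(iterations):
        # Bring down the next group of dividend bits, most significant first.
        shift = padded_width - (step + 1) * bits_per_step
        chunk = (dividend >> shift) & (radix - 1)
        remainder = remainder * radix + chunk

        # Select the quotient digit. Because remainder < divisor before the
        # shift, the digit is always in the range 0 .. radix - 1.
        digit = 0
        while remainder >= divisor:
            remainder -= divisor
            digit += 1

        quotient = quotient * radix + digit

    return quotient, remainder, iterations


if __name__ == "__main__":
    for bits in (2, 4):
        q, r, n = radix_divide(1000, 7, bits)
        print(f"radix-{1 << bits}: quotient={q} remainder={r} iterations={n}")
```

Both radices produce the same quotient and remainder; only the iteration count differs (16 versus 8 for a 32-bit operand in this sketch). The cost of a higher radix is a more complex digit-selection step per iteration.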
Many other changes also improve the performance of the 45nm Intel® Core™2 processor family, including improved data forwarding from stores to loads, removal of serialization from the Set Interrupt Flag and Clear Interrupt Flag instructions (STI and CLI), enhanced self-modifying code (SMC) detection, and “renaming” of the Return Stack Buffer.
