Intel Haswell and its top secret weapon



For those of you who have been following the latest coverage of Intel Haswell, I have some interesting news to report.  While doing my roundup of all the Haswell data I could find last night, I kept running across mentions of the L3 cache.  I finally got around to doing some digging today and am pleased to report back to you!

 


The old Nehalem

 

The old Nehalem chips introduced an L3 cache that was kept on the die, alongside a smaller, low-latency L2 cache.  The integrated graphics core at the time, however, sat off-die, and with that came latency issues whenever the graphics needed to reach the rest of the chip.  To manage this, Intel designed separate clock domains: a core clock for the CPU cores, an uncore clock that controlled the L3 cache, and another for the off-die integrated graphics.  The idea was that the L3 cache would ramp up in frequency when it was needed, while the bulk of the work would be handled by the L1 and L2 caches.

 

The Sandy Bridge

 

With Sandy Bridge, Intel moved to a single clock domain for the core and uncore, and kept a separate clock for the now on-die graphics core.  The issue here was that the L3 cache frequency was tied to the cores: if you ever needed to ramp up the integrated graphics, the cores had to follow suit, and if the cores sat in a low-frequency state you were left with reduced integrated graphics performance.  Your options were to force the CPU and L3 cache up to a higher frequency, or to stay at a low frequency to avoid waking up the CPU cores.  Obviously there were trade-offs here, and it could be done better.  Intel Haswell fixed this issue, which is why integrated graphics performance is so much better on Haswell than on previous generations.

The mighty Intel Haswell

 

With the Intel Haswell series, all caches reside on-die.  The CPU cores run at one shared frequency, the on-die GPU runs at another, and the L3 cache, together with the ring bus, runs at its own.  Because of this design, the L3 cache has a slightly higher latency than on Sandy Bridge, which may contribute to the slower speeds we are currently seeing with Haswell.
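
If you want to see that latency difference for yourself rather than take the early benchmark charts at face value, a pointer-chasing loop is the usual trick: build a random chain of pointers in a buffer sized for the L3 and time how long each dependent load takes.  Below is a minimal sketch in C; the 8 MB buffer size and the hop count are assumptions you would tune for your particular chip, and it is meant for eyeballing numbers rather than as a rigorous benchmark.

```c
/*
 * Rough pointer-chasing sketch for eyeballing L3 load latency.
 * The 8 MB buffer is an assumption chosen to sit in the last-level
 * cache of a typical quad-core part; adjust it for your chip.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_BYTES   (8u * 1024 * 1024)          /* ~L3-sized working set */
#define NUM_SLOTS   (BUF_BYTES / sizeof(void *))
#define NUM_HOPS    (50u * 1000 * 1000)

int main(void)
{
    void **buf = malloc(BUF_BYTES);
    size_t *order = malloc(NUM_SLOTS * sizeof(size_t));
    size_t i;

    /* Build a random cyclic pointer chain so the hardware prefetcher
     * cannot guess the next address; each load depends on the last. */
    for (i = 0; i < NUM_SLOTS; i++)
        order[i] = i;
    for (i = NUM_SLOTS - 1; i > 0; i--) {       /* Fisher-Yates shuffle */
        size_t j = rand() % (i + 1);
        size_t tmp = order[i]; order[i] = order[j]; order[j] = tmp;
    }
    for (i = 0; i < NUM_SLOTS; i++)
        buf[order[i]] = &buf[order[(i + 1) % NUM_SLOTS]];

    /* Chase the chain and time it; the average hop cost approximates
     * the load-to-use latency of whatever level the buffer fits in. */
    struct timespec t0, t1;
    void **p = &buf[order[0]];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < NUM_HOPS; i++)
        p = (void **)*p;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg latency: %.1f ns per load (chain ended at %p)\n",
           ns / NUM_HOPS, (void *)p);

    free(order);
    free(buf);
    return 0;
}
```

Re-run it with the buffer sized for L2, then L3, then main memory, and the jump in nanoseconds per load shows roughly where each level of the hierarchy ends.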

So why did they do it?

 

If you live anywhere other than the United States, or perhaps run a server packed with high-power-consumption CPUs (I don’t know why you’d do this), then you understand that power costs money.  With the Intel Haswell series, the GPU tells the ring bus to give or get data, and no longer needs to ramp up the CPU frequency to do so.

There were also some changes made to the L3 cache in the Haswell architecture.  Even though L3 cache latency is up, the available bandwidth has been increased, and there are dedicated pipes for both data and non-data traffic to the last-level cache.
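
To put a rough number on that bandwidth claim, you can stream sequential reads through a buffer sized to sit in the last-level cache and see how many bytes per second come back.  The sketch below assumes a 6 MB working set (big enough to spill out of L2 on typical quad-core parts, small enough to stay in L3); adjust it for the chip you are testing and treat the GiB/s figure as a ballpark, not a proper benchmark.

```c
/*
 * Rough sequential-read sketch for eyeballing last-level cache bandwidth.
 * The 6 MB working set is an assumption meant to fit in L3 but not L2.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

#define BUF_BYTES (6u * 1024 * 1024)
#define NUM_ELEMS (BUF_BYTES / sizeof(uint64_t))
#define PASSES    2000

int main(void)
{
    uint64_t *buf = malloc(BUF_BYTES);
    for (size_t i = 0; i < NUM_ELEMS; i++)
        buf[i] = i;                    /* warm the buffer into cache */

    struct timespec t0, t1;
    uint64_t sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < NUM_ELEMS; i++)
            sum += buf[i];             /* sequential, prefetch-friendly reads */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gib  = (double)BUF_BYTES * PASSES / (1024.0 * 1024.0 * 1024.0);
    printf("read %.1f GiB in %.2f s -> %.2f GiB/s (checksum %llu)\n",
           gib, secs, gib / secs, (unsigned long long)sum);

    free(buf);
    return 0;
}
```

Compile it with optimizations turned on (for example gcc -O2), otherwise the read loop measures the compiler more than the cache.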

The secret weapon.

 

So there are some interesting possibilities here with Haswell and DRAM.  Haswell has an improved memory controller that allows better write throughput to DRAM.  Word on the street is that Intel has been telling memory makers to push for higher DRAM frequencies in preparation for Haswell.  Is this perhaps why we saw Corsair release DRAM clocked at 3000MHz?
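
If you want to sanity-check that write-throughput claim on your own machine, the simplest approach is to stream writes through a buffer far larger than any cache so the stores have to land in DRAM.  The sketch below does exactly that; the 512 MB buffer and pass count are assumptions, and memset stands in for a pure write stream, so take the GiB/s figure as a rough indication rather than a formal measurement.

```c
/*
 * Rough streaming-write sketch for eyeballing DRAM write throughput.
 * The 512 MB buffer is an assumption: large enough to spill past every
 * cache level so the writes end up hitting memory.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES (512ull * 1024 * 1024)
#define PASSES    8

int main(void)
{
    unsigned char *buf = malloc(BUF_BYTES);
    if (!buf) { perror("malloc"); return 1; }

    memset(buf, 1, BUF_BYTES);   /* touch pages so timing excludes page faults */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int pass = 0; pass < PASSES; pass++)
        memset(buf, pass, BUF_BYTES);   /* pure write stream to DRAM */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gib  = (double)BUF_BYTES * PASSES / (1024.0 * 1024.0 * 1024.0);
    printf("wrote %.1f GiB in %.2f s -> %.2f GiB/s\n", gib, secs, gib / secs);

    free(buf);
    return 0;
}
```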

 

Source

