Intel Xeon E5-2600 V3 Codename Haswell-EP Launch

It’s that time of year again when Intel release their latest enterprise processor for the dual-processor segment to the eagerly awaiting professional market. In keeping with the traditional early September launch window, the Xeon E5-2600 V3 processor series, codenamed Haswell-EP, has been officially launched, finally allowing us at Bostonlabs to go through all the exciting details of the processors we’ve been secretly testing in our labs for some time.

A Xeon E5-2600 V2 (left) pictured with the new Xeon E5-2600 V3 Series (right)

As a “Tock” in the Intel “Tick/Tock” release cycle, Haswell is an all-new core microarchitecture, meaning that the core has been extensively redesigned and introduces many new features, yet stays on the same 22nm lithography as its predecessor, Ivy Bridge.

Intel®’s Tick / Tock roadmap to date from Nehalem to present

 

The Haswell microarchitecture itself has actually been around for more than 12 months in the guise of the lower-end Xeon E3-1200 V3 series processor, so the processor feature and instruction set is somewhat familiar. However, Haswell-EP takes the changes that its entry-level sibling delivered and expands on them significantly.

A new platform generation, codenamed “Grantley”, was also launched to complement Haswell-EP, replacing “Romley”. This brings in a new processor socket, “Socket R3”, which is similar to the outgoing “Socket R” and uses the same 2011 LGA pins, but these are laid out in a slightly different configuration to avoid confusion and accidental insertion of the older model. As a result, and due to other factors such as the newer memory controller found in Haswell-EP, the two platforms are not compatible and the E5-2600 V3 processor cannot be installed into an older “Romley” platform motherboard.

A Xeon E5-2600 V2 (left) pictured with the new Xeon E5-2600 V3 Series (right)

 

Additionally, “Grantley” brings in a new Platform Controller Hub (PCH), the C612 series, codenamed “Wellsburg”, which delivers welcome new features over the previous C600 series, including up to 6 x USB 3.0 ports and up to a massive 10 x S-ATA 3 (6Gbit/s) ports.

The Intel® C612 Chipset Block Diagram

Returning to the processor itself, there are numerous major enhancements to both the microarchitecture and the feature set. Below is a comparison of the E5-2600 V2 with the V3, detailing the major changes between these two generations:

(Comparison table of the Xeon E5-2600 V2 and V3 series)

As you can see, one of the major enhancements is the increase in maximum core count to 18, a huge leap of 50% over the previous generation. Add DDR4 support at frequencies of up to 2133MHz, an increase of 14% in raw clock speed, alongside the 20% increase in QPI bandwidth, and it’s clear that the V3 series brings significant performance enhancements over the V2 generation.
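The percentages quoted above can be reproduced with some simple arithmetic. The short sketch below does so; the core counts and memory speeds are taken from the SKU table later in this article, while the QPI rates of 8.0 and 9.6 GT/s are an assumption on our part based on Intel's published figures for the two generations. `percent_increase` is just an illustrative helper name.

```python
def percent_increase(old: float, new: float) -> float:
    """Return the relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


# (V2 value, V3 value) for each headline figure.
# Cores and memory speed come from the SKU table in this article;
# the QPI rates are assumed from Intel's published specifications.
comparisons = {
    "Max cores per socket": (12, 18),
    "Max memory speed (MHz)": (1866, 2133),
    "QPI rate (GT/s)": (8.0, 9.6),
}

for name, (v2, v3) in comparisons.items():
    print(f"{name}: {v2} -> {v3} (+{percent_increase(v2, v3):.0f}%)")
```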

Haswell-EP Simplified Core Layout

Moving on to the simplified core layout above, the key differences between the two generations can be seen more clearly by focussing on the blue segments.

The inclusion of an integrated voltage regulator moves some of the componentry usually found on the motherboard onto the processor package itself, freeing up motherboard space. It also gives the CPU finer control over its own power input, enabling more stable and efficient operation. The downside of this integration is that the processor itself requires more power, and as a result there is an increase in TDP across the processor range. This is compensated for, however, by the removal of the voltage regulators from the motherboard, resulting in a negligible effect on overall platform power consumption.

New power management features additionally help to lower the overall power consumption of the processor using clever new techniques:

 

Per Core P-State (PCPS) allows different processor cores to run at different frequencies and voltages to other cores, saving power over the traditional method of running all cores at the same level as the highest requirements.

Energy Efficient Turbo (EET) monitors core throughput and stalling, and only increases core frequency if it is energy efficient to do so.

Uncore Voltage / Frequency Scaling (UFS) enables the core and uncore (processor components not on the core but essential for processor performance) to be treated independently and run at different states. For example, an LLC- or memory-bound application no longer drives the main core frequency high for no reason, wasting energy.

A DDR3 module (top) compared to a DDR4 module (below). Notice the elongated contacts in the centre.

 

DDR4 not only brings an increase in performance through the higher frequency it delivers to the platform, but also a decrease in power consumption through its lower operating voltage of just 1.2V.

This lower voltage can make as much as 4W difference per DIMM when measured at the wall, which is not insignificant: given that almost all systems will ship with 8 DIMMs, that makes a saving of up to 32W per system.
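As a rough illustration of what that saving adds up to, the sketch below multiplies out the per-DIMM figure and extends it to a year of operation. The 4W and 8 DIMM values are the ones quoted above; continuous 24/7 operation is our assumption, and the function names are purely illustrative.

```python
WATTS_SAVED_PER_DIMM = 4.0   # upper figure quoted above, measured at the wall
DIMMS_PER_SYSTEM = 8         # typical configuration quoted above
HOURS_PER_YEAR = 24 * 365    # assumption: the system runs continuously


def system_saving_watts(dimms: int, per_dimm: float = WATTS_SAVED_PER_DIMM) -> float:
    """Power saved per system, in watts."""
    return dimms * per_dimm


def annual_saving_kwh(watts: float, hours: int = HOURS_PER_YEAR) -> float:
    """Energy saved over a year, in kilowatt-hours."""
    return watts * hours / 1000.0


saving = system_saving_watts(DIMMS_PER_SYSTEM)
print(f"Saving per system: {saving:.0f} W")
print(f"Saving per year:   {annual_saving_kwh(saving):.1f} kWh")
```

Across a rack of servers the same calculation simply scales with the number of systems.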

 

Additionally, there is a smaller decrease in speed when using multiple DIMMs per channel, resulting in up to 50% higher performance than DDR3 for large memory deployments:

 

Maximum memory frequency (MHz)

DIMMs / Channel   DDR3 1.5V   DDR3 1.35V   DDR4 RDIMM   DDR4 LRDIMM
1                 1866        1600         2133         2133
2                 1600        1333         1866         2133
3                 1066        800          1600         1600
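To see where the "up to 50%" figure comes from, the sketch below compares the DDR4 RDIMM column against the DDR3 1.5V column for each row of the table above. The data is copied directly from the table; `ddr4_gain_percent` is an illustrative helper name, and comparing those two particular columns is our choice.

```python
# Maximum memory frequency (MHz) by DIMMs per channel, copied from the table above.
SPEEDS = {
    1: {"DDR3 1.5V": 1866, "DDR3 1.35V": 1600, "DDR4 RDIMM": 2133, "DDR4 LRDIMM": 2133},
    2: {"DDR3 1.5V": 1600, "DDR3 1.35V": 1333, "DDR4 RDIMM": 1866, "DDR4 LRDIMM": 2133},
    3: {"DDR3 1.5V": 1066, "DDR3 1.35V": 800, "DDR4 RDIMM": 1600, "DDR4 LRDIMM": 1600},
}


def ddr4_gain_percent(dimms_per_channel: int,
                      ddr3: str = "DDR3 1.5V",
                      ddr4: str = "DDR4 RDIMM") -> float:
    """Frequency advantage of the DDR4 column over the DDR3 column, in percent."""
    row = SPEEDS[dimms_per_channel]
    return (row[ddr4] - row[ddr3]) / row[ddr3] * 100.0


for dpc in sorted(SPEEDS):
    print(f"{dpc} DIMM(s) per channel: +{ddr4_gain_percent(dpc):.1f}%")
```

The advantage grows with the number of DIMMs per channel and reaches roughly 50% at three DIMMs per channel.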

 

 

Last, but by no means least, the inclusion of AVX 2.0 and the Haswell New Instructions (HNI) brings huge performance increases to the platform.

 

Intel sum these up in the following bullet points:

 

~10% higher IPC over Ivy Bridge core (not counting new instructions)

  • Better branch prediction
  • Deeper buffers
  • Larger TLBs
  • More execution units
  • Improved front-end

Core improvements to feed FLOPs

  • 2x L1 & L2 cache BW
  • Better misaligned memory operations (important for vectorization)

Increasing per core performance via power efficient features

  • Virtualization: Lower VMX round trip latency
  • Synchronization (multi-threaded/core scaling): Lower cache lock latency
  • Fused Multiply Add (FMA): 2x FLOPs/core vs. Sandy Bridge/Ivy Bridge
  • AVX-Integer: Extend 256-bit vector operations to include integer
  • Big Number Arithmetic (Crypto): Accelerating RSA and GMP (GNU Multiple Precision Arithmetic)

 

The key enhancement driving the increased performance listed above is the introduction of the AVX 2.0 instruction set and its FMA feature to this processor architecture. This particular feature was actually already launched in the Xeon E3-1200 V3 series, and has been very successful in blazing a trail for new code in the ISV market.

 

As stated above, this delivers double the number of FLOPs per clock compared with the previous generation processor architecture, increasing performance by as much as 100% where these new instructions apply. In testing, for example, Intel’s own optimised Linpack benchmark previously saw high scores of just under 500 GFLOPs for the range-topping processor in the V2 range, whereas the E5-2699 V3 is able to deliver just over 1 TFLOP with current optimisations, a huge leap in performance.

 

There is one caveat to the delivery of AVX2 with this processor generation, however: the AVX2 operational core frequency is actually lower than the standard core frequency, due to the increased thermal requirements of the relevant portions of the core. This is detailed in the diagram below, courtesy of Intel:

 

AVX Frequency Range Example – E5-2699 V3

 

Therefore, calculating theoretical performance is a little more complicated than with previous generations, as the frequency is less predictable.
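To illustrate why the AVX frequency matters, the sketch below estimates theoretical peak double-precision throughput for a dual-socket system as sockets x cores x clock x FLOPs per cycle. The core counts and nominal clocks are those of the E5-2697 V2 and E5-2699 V3 as listed in the SKU table further down. The per-cycle figures (8 for Ivy Bridge, 16 for Haswell with FMA) are the commonly quoted values for 256-bit vectors, and the 1.9GHz AVX base clock is an assumption on our part that should be checked against Intel's specification. `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(sockets: int, cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak double-precision GFLOPs for a whole system."""
    return sockets * cores * ghz * flops_per_cycle


# Double-precision FLOPs per core per cycle with 256-bit vectors (4 doubles each).
IVY_BRIDGE_FLOPS_PER_CYCLE = 8    # one add + one multiply per cycle: 2 * 4
HASWELL_FLOPS_PER_CYCLE = 16      # two FMA units, 2 operations each: 2 * 2 * 4

ASSUMED_AVX_BASE_GHZ = 1.9        # assumption: AVX base clock of the E5-2699 V3

v2_peak = peak_gflops(2, 12, 2.7, IVY_BRIDGE_FLOPS_PER_CYCLE)
v3_nominal = peak_gflops(2, 18, 2.3, HASWELL_FLOPS_PER_CYCLE)
v3_avx = peak_gflops(2, 18, ASSUMED_AVX_BASE_GHZ, HASWELL_FLOPS_PER_CYCLE)

print(f"2 x E5-2697 V2 at 2.7GHz:           {v2_peak:.1f} GFLOPs")
print(f"2 x E5-2699 V3 at 2.3GHz (nominal): {v3_nominal:.1f} GFLOPs")
print(f"2 x E5-2699 V3 at assumed AVX base: {v3_avx:.1f} GFLOPs")
```

Under these assumptions the gap between the nominal and AVX-clock results for the V3 is over 200 GFLOPs, which is why the frequency used in the calculation needs to be stated alongside any theoretical figure.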

 

Moving on to the SKUs Intel launched with, you can see that they’ve stayed with tradition and have a replacement model for most of the existing V2 SKUs, the majority of which have an increased core count or frequency over the outgoing models, which typically enables them to perform better.

 

There are still the same segments of processor to choose from: Basic, Standard, Advanced and Segment Optimised, with each layer adding more features such as Turbo and Hyper-Threading, and an increasing clock speed or number of cores to enhance performance.

 

CPU Model   Cores      Frequency         Max Memory Frequency   L3 Cache       TDP
            v2   v3    v2       v3       v2        v3           v2     v3      v2     v3
E5-2603     4    6     1.8GHz   1.6GHz   1333MHz   1600MHz      10MB   15MB    80W    85W
E5-2609     4    6     2.5GHz   1.9GHz   1333MHz   1600MHz      10MB   15MB    80W    85W
E5-2620     6    6     2.1GHz   2.4GHz   1600MHz   1866MHz      15MB   15MB    80W    85W
E5-2623     x    4     x        3.0GHz   x         2133MHz      x      10MB    x      105W
E5-2630     6    8     2.6GHz   2.4GHz   1600MHz   1866MHz      15MB   20MB    80W    85W
E5-2630L    6    8     2.4GHz   1.8GHz   1600MHz   1866MHz      15MB   20MB    60W    55W
E5-2637     4    4     3.5GHz   3.5GHz   1866MHz   2133MHz      15MB   15MB    130W   135W
E5-2640     8    8     2.0GHz   2.6GHz   1600MHz   1866MHz      20MB   20MB    95W    90W
E5-2643     6    6     3.5GHz   3.4GHz   1866MHz   2133MHz      25MB   20MB    130W   135W
E5-2650     8    10    2.6GHz   2.3GHz   1866MHz   2133MHz      20MB   20MB    95W    105W
E5-2650L    10   12    1.7GHz   1.8GHz   1600MHz   2133MHz      25MB   25MB    70W    65W
E5-2660     10   10    2.2GHz   2.6GHz   1866MHz   2133MHz      25MB   25MB    95W    105W
E5-2667     8    8     3.3GHz   3.2GHz   1866MHz   2133MHz      25MB   20MB    130W   135W
E5-2670     10   12    2.5GHz   2.3GHz   1866MHz   2133MHz      25MB   30MB    115W   120W
E5-2680     10   12    2.8GHz   2.5GHz   1866MHz   2133MHz      25MB   30MB    115W   120W
E5-2683     x    14    x        2.0GHz   x         2133MHz      x      35MB    x      120W
E5-2687W    8    10    3.4GHz   3.1GHz   1866MHz   2133MHz      20MB   25MB    150W   160W
E5-2690     10   12    3.0GHz   2.6GHz   1866MHz   2133MHz      25MB   30MB    130W   135W
E5-2695     12   14    2.4GHz   2.3GHz   1866MHz   2133MHz      30MB   35MB    115W   120W
E5-2697     12   14    2.7GHz   2.6GHz   1866MHz   2133MHz      30MB   35MB    130W   145W
E5-2698     x    16    x        2.3GHz   x         2133MHz      x      40MB    x      135W
E5-2699     x    18    x        2.3GHz   x         2133MHz      x      45MB    x      145W

x = no equivalent model in that generation
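As a quick check on the table above, the sketch below counts how many V3 models gain cores or clock speed over their direct V2 predecessor, and how many carry a higher TDP as a result of the integrated voltage regulator discussed earlier. The data is copied from the table; models with no V2 equivalent are left out, and the variable names are purely illustrative.

```python
# (model, V2 cores, V3 cores, V2 GHz, V3 GHz, V2 TDP in W, V3 TDP in W),
# copied from the table above. Models marked "x" for V2 are omitted.
SKUS = [
    ("E5-2603", 4, 6, 1.8, 1.6, 80, 85),
    ("E5-2609", 4, 6, 2.5, 1.9, 80, 85),
    ("E5-2620", 6, 6, 2.1, 2.4, 80, 85),
    ("E5-2630", 6, 8, 2.6, 2.4, 80, 85),
    ("E5-2630L", 6, 8, 2.4, 1.8, 60, 55),
    ("E5-2637", 4, 4, 3.5, 3.5, 130, 135),
    ("E5-2640", 8, 8, 2.0, 2.6, 95, 90),
    ("E5-2643", 6, 6, 3.5, 3.4, 130, 135),
    ("E5-2650", 8, 10, 2.6, 2.3, 95, 105),
    ("E5-2650L", 10, 12, 1.7, 1.8, 70, 65),
    ("E5-2660", 10, 10, 2.2, 2.6, 95, 105),
    ("E5-2667", 8, 8, 3.3, 3.2, 130, 135),
    ("E5-2670", 10, 12, 2.5, 2.3, 115, 120),
    ("E5-2680", 10, 12, 2.8, 2.5, 115, 120),
    ("E5-2687W", 8, 10, 3.4, 3.1, 150, 160),
    ("E5-2690", 10, 12, 3.0, 2.6, 130, 135),
    ("E5-2695", 12, 14, 2.4, 2.3, 115, 120),
    ("E5-2697", 12, 14, 2.7, 2.6, 130, 145),
]

# Models whose V3 version has more cores or a higher base clock than the V2.
more_cores_or_clock = [
    model for model, c2, c3, f2, f3, t2, t3 in SKUS if c3 > c2 or f3 > f2
]

# Models whose V3 version has a higher TDP than the V2.
higher_tdp = [model for model, c2, c3, f2, f3, t2, t3 in SKUS if t3 > t2]

print(f"{len(more_cores_or_clock)} of {len(SKUS)} models gain cores or clock speed")
print(f"{len(higher_tdp)} of {len(SKUS)} models have a higher TDP")
```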

 

 

Across the range, we generally see the increase in TDP mentioned earlier, due to the integrated voltage regulator. The highest TDP comes from the workstation-only Xeon E5-2687W V3 at a staggering 160W. Cooling such a processor is obviously a concern, so Intel deem it only suitable for workstation use, where there is plenty of room for cooling and larger heat sinks. At Boston, we already have a SKU with liquid cooling ready and validated for the E5-2687W V3, the Boston Venom 2401-12T.

 

This full tower system, designed for creative professionals, also sports a Quadro K6000 graphics card, up to 1TB of DDR4 memory, a Blu-ray rewriter drive and up to 6 x S-ATA HDDs or SSDs.

The Boston Venom 2401-12T and its liquid cooled E5-2687W V3 processors

In traditional tier one designs, a 1U rack mount chassis will be restricted to 135W TDP processors, and a more spacious 2U enclosure to 145W TDP processors. This means that if you want to use two of the range-topping Xeon E5-2699 V3 processors with 18 cores each, you will need to consider a 2U or larger chassis. However, here at Boston we have a new range of server solutions available with data centre optimised cooling, enabling us to use the range-topping 160W TDP models in a 1U or 2U form factor. The clever cooling design places the processors side by side instead of inline. This helps decrease the temperature of the second CPU by up to 10 degrees Celsius, prolonging CPU lifetime, lowering fan speed, lowering noise and increasing turbo performance.

 

Another example solution released today is the Boston Value Series VS360p. This supports 2 x Xeon E5-2600 V3 processors of up to 145W TDP, 1TB of DDR4 Registered memory and 8 x SAS 3 2.5” HDDs or SSDs, and additionally has 2 x 2.5” NVMe SSD bays, all within a space-saving 1U package.

 

 

Right from launch, Boston have a range of solutions available which are Xeon E5-2600 V3 series ready and have been thoroughly tested and validated with the latest operating systems.

 

For more details on the range, please download our exclusive X10 solutions catalogue from our website at the following link:

 

http://download.boston.co.uk/downloads/7/b/4/7b4c48b3-f5e5-4e5f-b6a9-7757db1d0a14/Intel-Haswell-customer-brochure.pdf

 

If you would like some more information or a custom quotation, please call us on +44(0)1727 876 100 or contact us to speak to one of our technical account managers, who will gladly make sure you get the best personalised solution.

Stay tuned to bostonlabs.co.uk for our next article in the coming days where we put the Xeon E5-2600 V3 series through its paces with our standard benchmark suite and see how it compares to the V2.

 

DJ