Better Core in Zen 2

In case you missed it, Ian explained in great detail in our microarchitecture analysis article why AMD claims that its new Zen 2 is a significantly better architecture than Zen 1:

  • a different second-stage branch predictor, known as a TAGE predictor
  • doubling of the micro-op cache
  • doubling of the L3 cache
  • increase in integer resources
  • increase in load/store resources
  • support for two AVX-256 instructions per cycle (instead of having to combine two 128 bit units).
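To put the last bullet into numbers, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that both generations can issue two vector operations per cycle and differ only in native datapath width (128-bit for Zen 1, 256-bit for Zen 2). The function name and parameters are hypothetical, and the figures are not measured results.

```python
# Illustrative arithmetic only, not a benchmark.
# Assumption (not stated in the article): both cores issue two vector
# operations per cycle and differ only in native datapath width.

def vector_throughput(ops_per_cycle: int, datapath_bits: int, element_bits: int = 32) -> dict:
    """Return peak vector bits and elements processed per cycle for one core."""
    bits = ops_per_cycle * datapath_bits
    return {"bits_per_cycle": bits, "elements_per_cycle": bits // element_bits}

# Zen 1: a 256-bit instruction is split across 128-bit units.
zen1 = vector_throughput(ops_per_cycle=2, datapath_bits=128)
# Zen 2: two full 256-bit instructions per cycle.
zen2 = vector_throughput(ops_per_cycle=2, datapath_bits=256)

ratio = zen2["elements_per_cycle"] / zen1["elements_per_cycle"]
print(f"Zen 1: {zen1['elements_per_cycle']} FP32 elements/cycle")
print(f"Zen 2: {zen2['elements_per_cycle']} FP32 elements/cycle")
print(f"Peak ratio: {ratio:.1f}x")
```

Under these assumptions the peak per-cycle vector width doubles. Real workloads will see less, since memory bandwidth and clocks also matter.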

All of these on-paper improvements show that AMD is attacking its key markets in both consumer and enterprise performance. With the extra compute and promised efficiency, we can surmise that AMD has the ambition to take the high-performance market back too. Unlike Xeon, 2nd Gen EPYC does not declare lower clocks when running AVX2. Instead it relies on a power-aware scheduler that supplies as much frequency as possible within the power constraints of the platform.

Users might ask, especially with Intel so embedded in high performance and machine learning, why AMD hasn't gone with an AVX-512 design. As a snap back to the incumbent market leader, AMD has stated that not all 'routines can be parallelized to that degree', as well as sending a very clear signal that 'it is not a good use of our silicon budget'. I do believe that we may require pistols at dawn. Nonetheless, it will be interesting to see how each company approaches vector parallelisation as new generations of hardware come out. But as it stands, AMD is pumping up its FP performance without going full-on AVX-512.

AMD claims an overall 15% IPC increase for Zen 2, and we saw this borne out in our analysis of Zen 2 in the consumer processor line, which was released last month. In that analysis, Andrei checked and found that it is indeed 15-17% faster. Along with the performance improvements, there have also been security hardening updates, improved virtualization support, and new but proprietary instructions for cache and memory bandwidth Quality of Service (QoS). (The QoS features seem very similar to what Intel introduced in Broadwell/Xeon E5 version 4 and Skylake. AMD is catching up in that area.)

Rome Layout: Simple Makes It a Lot Easier

When we analyzed AMD's first generation of EPYC, one of the big disadvantages was the complexity. AMD had built its 32-core Naples processors from four 8-core silicon dies, attaching each one to two memory channels, which resulted in a non-uniform memory architecture (NUMA). Due to this 'quad NUMA' layout, a number of applications saw quite a few NUMA balancing issues. This happened in almost every OS, and in some cases we saw reports that system administrators and others had to do quite a bit of optimization work to get the best performance out of the EPYC 7001 series.

The new 2nd Gen EPYC, Rome, has solved this. The CPU design implements a central I/O hub through which all off-chip communication occurs. The full design uses eight core chiplets, called Core Complex Dies (CCDs), with one central die for I/O, called the I/O Die (IOD). All of the CCDs communicate with this central I/O hub through dedicated high-speed Infinity Fabric (IF) links, and through it the cores can reach the DRAM and PCIe lanes attached to the IOD, or other cores.

The CCDs each consist of two four-core Core CompleXes (1 CCD = 2 CCX). Each CCX consists of four cores and 16 MB of L3 cache, and these are at the heart of Rome. The top 64-core Rome processors have 16 CCXes overall, and those CCXes can only communicate with each other over the central I/O die. There is no direct CCD-to-CCD communication.
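The hierarchy described above can be sketched in a few lines of Python. The per-CCX and per-CCD figures come from the text; the linear core numbering used by `locate` is an assumption made for illustration and may not match how an OS or firmware actually enumerates cores. All names here are hypothetical.

```python
from dataclasses import dataclass

# Figures taken from the article for the top 64-core Rome part.
CORES_PER_CCX = 4
L3_MB_PER_CCX = 16
CCX_PER_CCD = 2
CCDS_PER_SOCKET = 8

TOTAL_CCX = CCDS_PER_SOCKET * CCX_PER_CCD
TOTAL_CORES = TOTAL_CCX * CORES_PER_CCX
TOTAL_L3_MB = TOTAL_CCX * L3_MB_PER_CCX


@dataclass(frozen=True)
class CoreLocation:
    ccd: int   # chiplet index within the socket
    ccx: int   # CCX index within the CCD
    core: int  # core index within the CCX


def locate(core_id: int) -> CoreLocation:
    """Map a flat core index to (CCD, CCX, core).

    Assumption: cores are numbered linearly, CCX by CCX. Real enumeration
    order is platform dependent.
    """
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in [0, {TOTAL_CORES})")
    ccx_global, core = divmod(core_id, CORES_PER_CCX)
    ccd, ccx = divmod(ccx_global, CCX_PER_CCD)
    return CoreLocation(ccd, ccx, core)


def shares_l3(a: int, b: int) -> bool:
    """Two cores share an L3 slice only if they sit in the same CCX."""
    la, lb = locate(a), locate(b)
    return (la.ccd, la.ccx) == (lb.ccd, lb.ccx)


def needs_io_die(a: int, b: int) -> bool:
    """Per the article, traffic between different CCXes goes via the I/O die."""
    return not shares_l3(a, b)


print(TOTAL_CORES, "cores,", TOTAL_CCX, "CCXes,", TOTAL_L3_MB, "MB L3")
print(locate(5), needs_io_die(3, 4))
```

Note that under this model even two CCXes on the same CCD are routed through the I/O die, which matches the article's statement that CCXes only talk to each other over the IOD.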

This is what this diagram shows. On the left we have Naples, first Gen EPYC, which uses four Zeppelin dies, each connected to the others with an IF link. On the right is Rome, with eight CCDs in green around the outside, and a centralized IO die in the middle with the DDR and PCIe interfaces.

As Ian reported, the CCDs are made at TSMC using its latest 7 nm process technology, while the IO die is built on GlobalFoundries' 14nm process. Since I/O circuitry, especially when compared to caching/processing and logic circuitry, is notoriously hard to scale down to smaller process nodes, AMD is being clever here in using a very mature process technology for the IO die, which helps improve time to market and definitely has its advantages.

This topology is clearly visible when you take off the hood. 

AMD Rome chip

The main advantage is that the 2nd Gen 'EPYC 7002' family is much easier to understand and optimize for than Naples, especially from a software point of view. Ultimately each processor has only one memory latency environment, as each core has the same latency to all eight memory channels. This compares to first-generation Naples, which had four NUMA regions per CPU because memory was attached directly to each die.

As seen in the image below, this means that in a dual socket setup, a Rome processor will act like a traditional NUMA environment that most software engineers are familiar with.

Ultimately the only other way to do this is with a large monolithic die, which for smaller process nodes is becoming less palatable when it comes to yields and pricing. In that respect, AMD has a significant advantage in being able to develop small 7nm silicon with high yields and also provide a substantial advantage when it comes to binning for frequency.

How a system sees the new NUMA environment is quite interesting. For the Naples EPYC 7001 CPUs, this was rather complicated in a dual socket setup: 

Here each number shows the 'weighting' given to the delay to access each of the other NUMA domains. Within the same domain, the weighting is light at only 10, but then a NUMA domain on the same chip was given a 16. Jumping off the chip bumped this up to 32.
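As a minimal sketch, the weightings described above can be reproduced as a distance matrix for a dual-socket Naples system. The values 10, 16 and 32 come from the text. The node numbering (nodes 0-3 on the first socket, 4-7 on the second) and the use of a single value for every cross-socket pair are assumptions made for illustration.

```python
# Sketch of the NUMA distance table for a dual-socket Naples (EPYC 7001) system.
# Weightings are taken from the article; node numbering is an assumption.
DIES_PER_SOCKET = 4   # four dies per Naples package, each its own NUMA domain
SOCKETS = 2

LOCAL = 10          # same NUMA domain
SAME_SOCKET = 16    # different domain, same package
REMOTE_SOCKET = 32  # domain on the other package


def numa_distance(a: int, b: int) -> int:
    """Return the weighting for node a accessing memory on node b."""
    if a == b:
        return LOCAL
    if a // DIES_PER_SOCKET == b // DIES_PER_SOCKET:
        return SAME_SOCKET
    return REMOTE_SOCKET


nodes = range(SOCKETS * DIES_PER_SOCKET)
matrix = [[numa_distance(a, b) for b in nodes] for a in nodes]

# Print one row per node, in the style of a NUMA distance listing.
for node, row in zip(nodes, matrix):
    print(f"node {node}: " + " ".join(f"{d:2d}" for d in row))
```

With eight domains and three distinct weightings, a scheduler or a developer pinning threads has many more placement cases to consider than on a system with one domain per socket.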

This changed significantly on Rome EPYC 7002: 

Although there are situations where the EPYC 7001 CPUs communicated faster, the fact that the topology is much simpler from the software point of view is worth a lot. It makes getting good performance out of the chip much easier for everyone who has to use it, which will save a lot of money in the enterprise and also help accelerate adoption.


184 Comments

  • cyberguyz - Thursday, August 8, 2019

    I was also a senior software engineer (retired after 30 years) supporting mostly Fortune 1000 companies. I have to tell you that the vast majority of the ones I have dealt with use a mixed server environment of Windows Server, Linux (RHEL), zLinux, and AIX, with Java as the language of choice and JavaScript as the web interface language. This experience comes from digging through their heap and system dumps, poring through thousands of lines of server source code and building/releasing middleware server development software for those companies. Except for those on zLinux the rest are on multiprocessor x86 systems.
  • Null666666 - Friday, August 9, 2019

    Hardly, but then what do I know, only been tuning corporate large scale databases since 91..

    Linux is for any scale any size.

    Friends don't let friends do windows. Admittedly, it's gotten better. But for high availability you just can't do "the windows solution", power off power on.
  • sleepeeg3 - Friday, August 9, 2019

    Um... is your background in Windows Server? That might skew your bias.
  • eek2121 - Saturday, August 10, 2019

    This is 100% false; even Microsoft itself has stated as much. Linux owns the internet. Windows owns the office.
  • Vatharian - Saturday, August 17, 2019

    Not every server in existence is meant to carry and forward mails from accounting to marketing. Most IT in non-IT-focused enterprises is indeed meant as office backend and will run Windows Server, but virtually every single workhorse besides that will be running Linux. Between hosting, compute and big data, Windows has no place, simply because of too high overhead, no flexibility for low-level optimization, and the extremely high cost of initial driver development. For example, the hardware my company makes (specialized accelerators) has 3x the time to market on the Windows platform. We are now shifting to FPGAs, and we dropped support for Windows because of bugs that our vendor couldn't fix for months. Not to mention that some of our clients run IBM, and therefore Linux.
  • healthymosquito - Wednesday, October 2, 2019

    Being part of a 10 figure company's infrastructure team, I can say that what you are saying is patently false for electronics manufacturing. Sure, Windows has most of the office desktops, but all engineering stations, as well as all heavy lifting servers in my corp, run Linux globally. That isn't counting our 100% Linux AWS and Google Cloud presence. Having worked in hosting recently as a side gig, web presence for Windows is just as dismal. No one is paying money for an IIS server or MSQL to run websites. Windows numbers on the Internet are extremely low.
  • nobodyblog - Thursday, August 8, 2019

    Windows is used in the military..
    Additionally, about Java, I doubt it is as good as .Net even in 2019. And Linux is the norm in big companies OR the embedded market only. Medium/small size ones are all on Windows - FACT. Additionally, there is no real antivirus for Linux, and open source software isn't very reliable..

    Thanks!
  • Arnulf - Thursday, August 8, 2019

    Antivirus? How old are you?

    I work for a small/medium business (8 figures in EUR) and we have the same usage profile as described by Deshi - Linux is running all our key stuff while we have a lone Windows server for AD and related crap.
  • FreckledTrout - Thursday, August 8, 2019

    Say what no Antivirus for Linux? Two I know of in use at corporations right now are ESET and Trendmicro.
  • zmatt - Thursday, August 8, 2019

    Completely baseless claims. I have worked in large scale government and military IT, and Windows servers are the most common by far. There were some Linux servers but they were a minority. Where you see Linux thrive in servers is at cloud providers and in companies that provide primarily web based products. Microsoft even offers their own Linux options through Azure, and everyone knows about AWS and their own totally-not-a-ripoff-of-RHEL distro. But cloud infrastructure doesn't have to be Windows; people don't usually use it for the same thing.

    Linux still doesn't have an equivalent to Active Directory, and that has been in my experience one of the largest infrastructure uses in self hosted environments. Domain controllers and the servers that support them made up and continue to make up the bulk. Until Linux has a competitor to it (and I doubt it will, because most Linux devs refuse to "copy" anything Microsoft does), Windows servers will stick around.
