Performance and Deployments

As part of its briefing, Intel stated that it has integrated BF16 support into the usual array of frameworks and libraries it groups under the ‘Intel DL Boost’ banner, including PyTorch, TensorFlow, oneAPI, OpenVINO, and ONNX. We spoke with Wei Li, who heads up Intel’s AI Software Group, and he confirmed that all of these libraries have already been updated for BF16. For high-level programmers, the libraries will accept FP32 data and perform the conversion to BF16 automatically, although the functions still require an explicit indication to use BF16 rather than INT8 or another format.
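The conversion itself is simple: BF16 keeps FP32's full 8-bit exponent but truncates the 23-bit mantissa down to 7 bits. As a rough illustrative sketch of what such a conversion does (not Intel's actual library code), it can be written with plain bit manipulation:

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Convert an FP32 value to BF16 by keeping the top 16 bits
    (sign, 8-bit exponent, 7-bit mantissa), with round-to-nearest-even
    on the dropped mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even: bias by half the dropped range,
    # plus the parity of the last kept bit.
    rounding = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    """Expand BF16 back to FP32 by zero-filling the low mantissa bits."""
    return struct.unpack("<f", struct.pack("<I", b << 16))[0]
```

Because the exponent is untouched, BF16 covers the same dynamic range as FP32, which is why frameworks can cast FP32 tensors down without the overflow concerns that FP16 brings.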

When speaking with Wei Li, he confirmed that all the major CSPs who have taken delivery of Cooper Lake have been porting workloads to BF16 for quite some time. That isn’t to say that BF16 is suitable for every workload, but it strikes a balance between the accuracy of FP32 and the computational speed of FP16. As noted in the slide above, BF16 implementations at Intel’s various CSP customers are achieving up to ~1.9x speedups over FP32 in both training and inference.

Normally we don’t post too many graphs of first-party performance numbers, but I did want to add this one.

Here we see Intel’s BF16 DL Boost at work for ResNet-50 in both training and inference. ResNet-50 is an old network at this point, but it is still used as a reference point for performance given its limited scope in layers and convolutions. Here Intel is showing a 72% increase in performance when training with Cooper Lake in BF16 mode versus Cooper Lake in FP32 mode.

Inference is a bit different, because inference can take advantage of lower-bit, higher-throughput data formats such as INT8 and INT4. Here we see BF16 still delivering 1.8x the performance of standard FP32 AVX-512, but INT8 has that throughput advantage. It is a balance of speed and accuracy.
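To illustrate that speed/accuracy balance (a hypothetical sketch, not Intel's code path): INT8 inference commonly relies on symmetric linear quantization, which packs four times as many values into a register as FP32 at the cost of a rounding error proportional to the quantization scale:

```python
def quantize_int8(values, scale):
    """Symmetric linear quantization: q = round(x / scale),
    clamped to the signed 8-bit range [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize_int8(quants, scale):
    """Map the 8-bit integers back to approximate real values."""
    return [q * scale for q in quants]

# For in-range values, the round-trip error is bounded by scale / 2 --
# that bounded loss of precision is what INT8 trades for throughput.
```

BF16, by contrast, keeps floating-point semantics and FP32's dynamic range, so it needs no per-tensor scale calibration; INT8 wins on raw throughput, BF16 on ease of deployment and accuracy headroom.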

It should be noted that this graph also includes software optimizations over time, not only raw performance of the same code across multiple platforms.

I would also like to point out the standard FP32 performance generation-on-generation. For AI training, Intel is showing a 1.82/1.64 = 11% gain, while for inference we see a 2.04/1.95 = 4.6% gain. Given that Cooper Lake uses the same cores underneath as Cascade Lake, this is mostly down to core frequency increases as well as bandwidth increases.
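Working through the arithmetic on the slide's relative-throughput figures (as quoted above):

```python
# Relative-throughput figures from Intel's slide: FP32 training is
# 1.64 on Cascade Lake vs 1.82 on Cooper Lake; FP32 inference is
# 1.95 vs 2.04, all against a common baseline.
train_gain = 1.82 / 1.64 - 1
infer_gain = 2.04 / 1.95 - 1

print(f"training: {train_gain:.1%}, inference: {infer_gain:.1%}")
# -> training: 11.0%, inference: 4.6%
```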

Deployments

A number of companies reached out to us in advance of the launch to tell us about their systems.

Lenovo will be announcing the launch of its ThinkSystem SR860 V2 and SR850 V2 servers with Cooper Lake and Optane DCPMM. The SR860 V2 will support up to four double-wide 300W GPUs in a dual socket configuration.

The fact that Lenovo is offering 2P variants of Cooper Lake is quite puzzling, especially as Intel said these were aimed at 4P systems and up. Hopefully we can get one in for testing.

Also, GIGABYTE is announcing its R292-4S0 and R292-4S1 servers, both quad socket.

One of Intel’s partners stated to us that they were not expecting Cooper Lake to launch so soon – even within the next quarter. As a result, they were caught off guard and had to scramble to get materials for this announcement. It would appear that Intel had a need to pull in this announcement to now, perhaps because one of the major CSPs is ready to announce.

Comments

  • JayNor - Thursday, June 18, 2020 - link

    The SSD speed depends on the block sizes and whether the data is restored serially.

    The other issue is that the data may not have been stored to the SSD.
  • schujj07 - Thursday, June 18, 2020 - link

    I have and you are very wrong. SAP itself isn't an in-RAM program, it is a set of different types of programs. SAP can run on multiple different DBs (Sybase, MSSQL, Oracle, MaxDB, DB2, or HANA) and with the exception of S4 HANA you need a separate system for your application server. Of those only HANA is an in-RAM DB. Shutting down SAP doesn't take that long itself; shutting down HANA on the other hand can take a while depending on the storage subsystem you have. A 128GB RAM HANA DB can take up to 20 minutes to shutdown or restart on an 8Gb Fibre Channel SAN with 10k spinning disks. However, moving to a Software Defined Storage (SDS) array with NVMe disks and dual port 25Gb iSCSI interfaces changed that same shutdown & restart to less than 2 minutes. I have started a 1000GB HANA DB on that same SDS array in about 5 minutes. When you are restarting a physical HANA appliance the thing that takes the most time is the RAM check. I've restarted appliances with 2TB RAM and the RAM check itself can take about 10-20 minutes.

    Cramming more cores onto an Intel CPU is very difficult. The 28 core CPU is already near the top of the reticle limit with an estimated size of 698mm2. https://www.anandtech.com/show/11550/the-intel-sky... That right there means that they cannot add more cores to their monolithic die. I can guarantee you that they would if it would fit.
  • Deicidium369 - Thursday, June 18, 2020 - link

    Those systems are running on large multi socket systems... so the individual socket core count is not really that big of a deal. Most ERP is more IO intensive than purely compute intensive.

    I haven't dealt with a large SAP install - last one I was involved with was a SAP R/3 on a Sun Starfire server... and my SAP HANA is well handled by available RAM, and we don't need to worry about scheduling downtime across multiple world time zones.

    1TB is not that large of an install - but larger than what I run... You have much more up to date experience than I do - I left the day to day years ago.
  • Spunjji - Friday, June 19, 2020 - link

    "Those systems are running on large multi socket systems... so the individual socket core count is not really that big of a deal."
    It is if it means they can run the same sized instance on cheaper systems with fewer sockets. :|

    "Most ERP is more IO intensive than purely compute intensive."
    Then having that compute power attached to the fastest *and* widest IO available surely counts for something? Especially if, once again, it means you can get the same IO bandwidth from fewer sockets.

    You're basically saying "AMD is bad for this" with a bunch of faux-authoritative statements based on outdated or inaccurate information, and then when you're called on it, you dissemble with a bunch of reasons which imply that in reality AMD could probably be quite a good fit for some people.
  • eek2121 - Thursday, June 18, 2020 - link

    I wish AMD had a quad socket offering available via DIY for EPYC. I wish both AMD and Intel would consider a dual socket offering for HEDT. I suppose the power/cooling requirements might be too high.
  • Deicidium369 - Thursday, June 18, 2020 - link

    I was an Intel HEDT user - when the time came to replace our engineering workstations I looked into the HEDT socket 2066 offerings, and ultimately decided on going to a dual socket 3647 Xeon Scalable motherboard and CPU. More memory channels and the ability (in our case never used, due to upgrading from a small Pascal based DGX to 2 large Volta DGX-2s) to add a second socket. So the people who have needed the additional power have moved to Xeon already. So HEDT is largely dead - the i9900K/i10900K can handle the lower end parts of the market - and if ISV certifications are required for support (Autodesk/etc) - then Intel/Nvidia is really the only game in town.

    A dual socket AMD or Intel isn't really that power intensive - and active CPU coolers are available for both - so chilled datacenter air would not be required (most servers use a passive heat sink, due to DC air). So if your use case requires dual socket - it's not that hard to accomplish.
  • schujj07 - Thursday, June 18, 2020 - link

    "So HEDT is largely dead - the i9900K/i10900K can handle the lower end parts of the market - and if ISV certifications are required for support (Autodesk/etc) - then the Intel/Nvidia is really the only game in town." The HEDT all depends on what you are doing. If you are running applications that can be done with max 256GB RAM and scale above 10c/20t, then HEDT is still viable. Especially if you need maximum CPU performance. https://www.servethehome.com/amd-ryzen-threadrippe...

    The ISV certification claim you make is total BS. https://www.amd.com/en/support/certified-drivers (that is just Radeon Pro) For CPU that is simple since it is x86-64 and anything that runs x86-64 will work with it just fine.
  • Deicidium369 - Thursday, June 18, 2020 - link

    Sigh

    The vendor we used back then would only offer Support on end to end systems they supplied - so it was Intel and Nvidia - at this point AMD was not in a competitive position. Looking at the dates of the drivers - they were not certified at the time. We used Win 7 at that time.

    I don't know what you want - sorry if my experience is different than what the AMD website has to say. I chose Intel and will continue to choose Intel. I don't care what you choose.

    Sorry that Intel has a dominant position in almost every single segment, also sorry that Nvidia has been destroying AMD in GPUs. Sorry that at the time when I purchased a system for the then new Window 7 that AMD was not an option for CPU or GPU. My businesses run off of Intel and Nvidia. When making the decisions for the now current system I evaluated TR and it came up short, WAY SHORT. I don't expect to eval TR or Epyc for the replacements early next year. Sorry that it somehow affects you.
  • schujj07 - Friday, June 19, 2020 - link

    I don't care what you choose, just don't come in and state things as fact when your information is 10 years old. Remember that when you make false and misleading claims people will call you out. There are a lot of IT Pros who read this website for the new tech that is coming out or because they are system builders as well. We know what we are talking about because our job is to stay on top of the trends.
  • Deicidium369 - Saturday, June 20, 2020 - link

    Yeah and you work for people like me. And I don't care what you think, believe or do.
