Performance and Deployments

As part of the discussion points, Intel stated that it has integrated its BF16 support into the usual array of frameworks and utilities that it groups under ‘Intel DL Boost’. This includes PyTorch, TensorFlow, oneAPI, OpenVINO, and ONNX. We spoke with Wei Li, who heads up Intel’s AI Software Group, and he confirmed to us that all of these libraries have already been updated for use with BF16. For high-level programmers, these libraries will accept FP32 data and convert it to BF16 automatically; however, the functions will still require an indication to use BF16 over INT8 or something similar.
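To make that automatic FP32-to-BF16 conversion a little more concrete, here is a minimal sketch in plain Python. It is not Intel's library code: the function names are hypothetical, the round-to-nearest-even step is an assumption about how a converter might round, and NaN handling is left out. It relies only on the fact that BF16 keeps the sign bit, the 8 exponent bits, and the top 7 mantissa bits of an FP32 value.

```python
import struct


def fp32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for x (hypothetical helper)."""
    # Reinterpret the value as its 32-bit IEEE 754 single-precision pattern.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Assumed rounding mode: round to nearest, ties to even, applied to
    # the 16 low bits that are about to be dropped. NaN inputs are not
    # handled in this sketch.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b: int) -> float:
    """Widen a BF16 pattern back to FP32 by padding with 16 zero bits."""
    (value,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return value


def round_trip(x: float) -> float:
    """Convert to BF16 and back, showing the precision that survives."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


if __name__ == "__main__":
    for sample in (1.0, 3.14159, 1e38):
        print(sample, "->", round_trip(sample))
```

Because the exponent field is untouched, a value as large as 1e38 survives the trip, while only two to three significant decimal digits of the mantissa are kept.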

Wei Li also confirmed that all the major CSPs who have taken delivery of Cooper Lake are already porting workloads onto BF16, and have been for quite some time. That isn’t to say that BF16 is suitable for every workload, but it provides a balance between the accuracy of FP32 and the computational speed of FP16. As noted in the slide above, BF16 implementations are achieving up to ~1.9x speedups over FP32 on both training and inference with Intel’s various CSP customers.

Normally we don’t post too many graphs of first-party performance numbers; however, I did want to add this one.

Here we see Intel’s BF16 DL Boost at work for ResNet-50 in both training and inference. ResNet-50 is an old network at this point, but it is still used as a reference point for performance given its limited scope in layers and convolutions. Here Intel is showing a 72% increase in performance with Cooper Lake in BF16 mode vs Cooper Lake in FP32 mode when training the network.

Inference is a bit different, because inference can take advantage of lower-bit, higher-throughput data types such as INT8 and INT4. Here we see BF16 still giving 1.8x performance over normal FP32 AVX512, but INT8 has the throughput advantage. This is a balance of speed and accuracy.
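As a rough illustration of the accuracy side of that trade-off, the sketch below applies a simple symmetric INT8 quantization to a handful of values and converts them back. This is an assumed, textbook-style scheme rather than anything Intel has described, the function names are hypothetical, and the sample weights are made up for the example.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole list; guard against an all-zero input.
    scale = max_abs / 127.0 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map the integers back to approximate floating-point values."""
    return [q * scale for q in quantized]


# Made-up sample weights, chosen only to show the effect.
weights = [0.02, -0.5, 1.27, -0.003]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

if __name__ == "__main__":
    for original, code, back in zip(weights, q, restored):
        print(f"{original:+.4f} -> {code:+4d} -> {back:+.4f}")
```

With a single scale shared across the list, the smallest value rounds to zero, which is the kind of loss that has to be weighed against INT8's throughput.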

It should be noted that this graph also includes software optimizations over time, not only raw performance of the same code across multiple platforms.

I would like to point out the standard FP32 performance generation-on-generation. For AI training, Intel is showing a ratio of 1.82/1.64 ≈ 1.11, or an 11% gain, while for inference we see 2.04/1.95 ≈ 1.046, or a 4.6% gain. Given that Cooper uses the same cores underneath as Cascade, this is mostly due to core frequency increases as well as bandwidth increases.
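For anyone who wants to check that arithmetic, a short sketch is below. The four inputs are the relative FP32 scores quoted above; the function name is just a label chosen for this example.

```python
def gen_on_gen_gain(new_score: float, old_score: float) -> float:
    """Percentage gain of the newer score over the older one."""
    return (new_score / old_score - 1.0) * 100.0


# Relative FP32 scores quoted in the text (newer part vs older part).
training_gain = gen_on_gen_gain(1.82, 1.64)
inference_gain = gen_on_gen_gain(2.04, 1.95)

if __name__ == "__main__":
    print(f"Training:  {training_gain:.1f}%")
    print(f"Inference: {inference_gain:.1f}%")
```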

Deployments

A number of companies reached out to us in advance of the launch to tell us about their systems.

Lenovo will be announcing the launch of its ThinkSystem SR860 V2 and SR850 V2 servers with Cooper Lake and Optane DCPMM. The SR860 V2 will support up to four double-wide 300W GPUs in a dual socket configuration.

The fact that Lenovo is offering 2P variants of Cooper Lake is quite puzzling, especially as Intel said these were aimed at 4P systems and up. Hopefully we can get one in for testing.

Also, GIGABYTE is announcing its R292-4S0 and R292-4S1 servers, both quad socket.

One of Intel’s partners stated to us that they were not expecting Cooper Lake to launch so soon, or even within the next quarter. As a result, they were caught off guard and had to scramble to get materials for this announcement. It would appear that Intel had a need to pull this announcement forward, perhaps because one of the major CSPs is ready to announce.

99 Comments

  • schujj07 - Saturday, June 20, 2020

    If you don't care what your CTO thinks, believes, or does for your infrastructure then you are a moron of a CEO. At that point why even have a CTO since you know best about everything. If I were to come to you and state I can save you $500K today on your upgrade all while increasing performance plus more savings on power and cooling you would be an idiot not to listen. However, since you don't want to trust you CTO you will just burn money.
  • Deicidium369 - Thursday, June 25, 2020

    Wasn't aware that I hired you as my CTO. I am not the CEO - I am the owner of the business. I hired the CEO and CTO and COO for that matter. What my CTO says matters. What a forum poster on a tech forum says does not matter.
  • Spunjji - Friday, June 19, 2020

    Dude, you've done this twice on the same article - waxed lyrical about "how things are", then when someone challenges you, backed out by saying it's just "how things were" or "how I decided to do things ten years ago". If you're going to make informed claims about the present then you need to be using info from the present; if you're not, then they're not really informed in any meaningful sense.
  • Korguz - Friday, June 19, 2020

    schujj07, Spunjji, you know damn well he wont ever do that. you call him out on any of his bs and fud, and all he does is resort to name calling, condesending remarks, and insults. his whole attitude is : how dare you call me out ? i know what i am talking about, and its fact, ( even though he rarely, if ever posts proof ) so dont argue with me !!! or runs away and hides.
  • Deicidium369 - Saturday, June 20, 2020

    Hey little buddy

    bored? have nothing substantive to say? thinking about me? Living rent free in that low rent run down tenement in your head.

    Now do as you mother has asked - and go clean the basement.
  • Korguz - Saturday, June 20, 2020

    hey, has McDonalds hired you back yet ? or are you still layed off cause of covid 19 ?

    " have nothing substantive to say " oh like you do al the time ? your posts are pure fiction, and BS just like your life. your living rent free at your parents house, so point is ? the way you talk, and ALWAYS resort to insults, name calling and condescending remarks shows your are not what you claim to be. clean the basement ? what for ? when i would just make a brat like you do it.
  • Deicidium369 - Thursday, June 25, 2020

    I am not laid off - I own the businesses. Most of my employees in the largest business I own are idled at the moment - with only about 150 returning to work due to the construction sites in Texas and Florida being reopened. So, yes quite a few of my employees in that business are idled - I don't work for the businesses - I own them. Might be hard for you to understand.

    Parents are both dead - I am 49yo, married, 3 kids - own not only the home I am at right now typing this, but also 4 others. So not paying rent or mortgage.

    I am sorry that your zero useful content posts are responded to condescendingly - but that is all they warrant. Maybe you can put up that screen shot of a post I made, and when I responded quite a few people told you you needed to stop already. Not calling anyone names, little buddy - you do need to clean the basement, you are the brat that has that job.

    Have a wonderful day, I hope that tomorrow brings some accomplishment that allows you to grow as an individual and finally start making posts that are not just reactionary posts to something that I have posted.
  • WaltC - Friday, June 19, 2020

    Interesting...mentioning Intel's "custom" non-SKU versions is fine, but AMD does exactly the same thing...;) Also, it sure looks as if TSMC is 5-7 years ahead of Intel fabrication at the moment. That's an amazing leap forward, imo.
  • sing_electric - Friday, June 19, 2020

    TSMC's definitely ahead but I'm not sure that it's by that much. It's pretty obvious that Intel repeatedly shot itself in the foot with 10nm, but from the last update I know of (March) their 7nm was still on track for 2021, which will put it at ROUGH parity with TSMC's 5nm.

    And past a point, arguing that a comparable process from Intel or TSMC is better than the other is kind of a fool's errand - you can't just say that just because say, EUV is used in X layers its better, or that the denser one wins: One company might decide to dial down density or increase the size of certain gates in libraries because they think it'll ultimately enable the best designs. You've also got to consider frequency scaling, yields and cost, and no one has hard numbers for those from both companies.
  • Deicidium369 - Saturday, June 20, 2020

    Intel has worked out the issues with Cobalt - not just minor features, but entire conductive layers - neither TSMC nor Samsung have even begun.

    Intel 7nm will be FULL EUV - ALL 12-15 layers will be EUV - nothing on TSMC roadmap has them doing full EUV / no DUV.

    But yeah, Intel traditionally had 3 variants of each node - 1 was frequency optimized, 1 was density optimized and 1 was power optimized - wasn't uncommon to get 2 in one product.

    Skylake is most definitely frequency optimized - and the most recent 14nm iteration is significantly denser than the 1st 14nm iteration.

    Agree on the yields - Intel announced couple years ago that it's 2018 10nm (10nm-?) had major yield issues - which somehow follows them to 10nm and 10nm+ - but that's mostly the fanboys. Intel's issue with 10 was that instead of trying for a ~2x density, they tried and failed to skip a generation with a 2.7x. 10nm is 2X and 10nm+ is that 2.7x density increase - boneheaded mistake to be sure - and in the mean time they could not produce 14nm fast enough to meet demand.
