CPU Encoding Tests

One of the interesting elements of modern processors is encoding performance. This includes encryption/decryption, as well as video transcoding from one video format to another. In the encrypt/decrypt scenario, this remains pertinent to on-the-fly encryption of sensitive data, a process that more modern devices are leaning on for software security. Video transcoding as a tool to adjust the quality, file size and resolution of a video file has boomed in recent years, whether to provide the optimum video for devices before consumption or for game streamers who want to upload the output from their video camera in real time. As we move into live 3D video, this task will only get more strenuous, and it turns out that the performance of certain algorithms is a function of the input/output of the content.

All of our benchmark results can also be found in our benchmark engine, Bench.

7-Zip 9.2: link

One of the freeware compression tools that offers good scaling performance between processors is 7-Zip. It runs under an open-source licence, is fast, and is an easy-to-use tool for power users. We run the benchmark mode via the command line for four loops and take the output score.

Encoding: 7-Zip Combined Score

Encoding: 7-Zip Compression

Encoding: 7-Zip Decompression

At the request of a few users, we've gone back through our saved benchmark data and pulled out compression/decompression numbers for 7-Zip. AMD clearly wins decompression by a long way when all of its threads are in use, and the 1800X beats the 1950X in Game Mode due to frequency.

WinRAR 5.40: link

For the 2017 test suite, we move to the latest version of WinRAR in our compression test. WinRAR in some quarters is more user friendly than 7-Zip, hence its inclusion. Rather than use a benchmark mode as we did with 7-Zip, here we take a set of files representative of a generic stack (33 video files in 1.37 GB, 2834 smaller website files in 370 folders in 150 MB) of compressible and incompressible formats. The results shown are the time taken to encode the files. Due to DRAM caching, we run the test 10 times and take the average of the last five runs, when the benchmark is in a steady state.
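The run-averaging step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the harness actually used for the review: the function name `steady_state_time` and the timings are hypothetical, and the numbers are made up to show early runs being slower before caching settles.

```python
from statistics import mean


def steady_state_time(run_times_s, warmup_runs=5):
    """Average the runs that remain after discarding the first `warmup_runs`."""
    if len(run_times_s) <= warmup_runs:
        raise ValueError("need more runs than warm-up runs")
    return mean(run_times_s[warmup_runs:])


# Hypothetical timings in seconds for ten runs (not measured results).
example_runs = [52.0, 47.0, 45.0, 44.0, 43.0, 40.0, 41.0, 40.0, 39.0, 40.0]

# Only the last five runs contribute to the reported figure.
print(steady_state_time(example_runs))
```

Discarding the early runs keeps the one-off cost of pulling the files into DRAM out of the reported time.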

Encoding: WinRAR 5.40

WinRAR encoding is another test that doesn't scale up especially well with thread counts. After only a few threads, most of its MT performance gains have been achieved. The balance here is between memory and frequency, where the 1800X wins. The 1800X takes a sizeable gain over the 1950X in Game Mode too, likely due to far memory latency.

AES Encoding

Algorithms using AES coding have spread far and wide as a ubiquitous tool for encryption. Again, this is another CPU-limited test, and modern CPUs have special AES pathways to accelerate their performance. We often see scaling in both frequency and cores with this benchmark. We use the latest version of TrueCrypt and run its benchmark mode over 1 GB of in-DRAM data. Results shown are the GB/s average of encryption and decryption.

Encoding: AES

HandBrake v1.0.2 H264 and HEVC: link

As mentioned above, video transcoding (both encode and decode) is a hot topic in performance metrics as more and more content is being created. The first consideration is the standard in which the video is encoded, which can be lossless or lossy, trade performance for file size, trade quality for file size, or do all of the above, and can increase encoding rates to help accelerate decoding rates. Alongside Google's favorite codec, VP9, there are two others that are taking hold: H264, the older codec, is practically everywhere and is designed to be optimized for 1080p video, and HEVC (or H265), which aims to provide the same quality as H264 at a lower file size (or better quality for the same size). HEVC is important as 4K is streamed over the air, meaning fewer bits need to be transferred for the same quality content.

HandBrake is a favored tool for transcoding, and so our test regime covers three areas.

Low Quality/Resolution H264: Here we transcode a 640x266 H264 rip of a 2-hour film, and change the encoding from Main profile to High profile, using the very-fast preset.

Encoding: Handbrake H264 (LQ)

High Quality/Resolution H264: A similar test, but this time we take a ten-minute double 4K (3840x4320) file running at 60 Hz and transcode from Main to High, using the very-fast preset.

Encoding: Handbrake H264 (HQ)

HEVC Test: Using the same video as in the HQ test, we change the resolution and codec of the original video from 4K60 in H264 into 4K60 HEVC.

Encoding: Handbrake HEVC (4K)

 


104 Comments

  • silverblue - Friday, August 18, 2017 - link

    I'd like to see what happens when you manually set a 2+2+2+2 core configuration, instead of enabling Game Mode. From what I've read, Game Mode destroys memory bandwidth but yields better latency, however it's not answering whether Zen cores can really benefit from the extra bandwidth that a quad-channel memory interface affords.

    Alternatively, just clock the 1950 and 1920 identically, and see if the 1920's per-core performance is any higher.
  • KAlmquist - Friday, August 18, 2017 - link

    “One of the interesting data points in our test is the Compile. Because this test requires a lot of cross-core communication and DRAM, we get an interesting metric where the 1950X still comes out on top due to the core counts, but because the 1920X has fewer cores per CCX, it actually falls behind the 1950X in Game Mode and the 1800X despite having more cores.”

    Generally speaking, compilers are single threaded, so the parallelism in a software build comes from compiling multiple source files in parallel, meaning the cross-core communication is minimal. I have no idea what MSVC is doing here, can you explain? In any case, while I appreciate you including a software development benchmark, the one you've chosen would seem to provide no useful information to anyone who doesn't use MSVC.
  • peevee - Friday, August 18, 2017 - link

    I use MSVC and it scales pretty well if you are using it right. They are doing something wrong.
  • KAlmquist - Saturday, August 19, 2017 - link

    Thanks. It makes sense that MSVC would scale about as well as any other build environment.

    Ars Technica also benchmarked a Chromium build, which I think uses MSVC, but uses the Google tools GN and Ninja to manage the build. They get:

    Ryzen 1800X (8 cores) - 9.8 build/day
    Threadripper 1920X (12 cores) - 16.7 build/day
    Threadripper 1950X (16 cores) - 18.6 build/day

    Very good speedup with the 1920X over the 1800X, but not so much going from the 1920X to the 1950X. Perhaps the benchmark is dependent on memory bandwidth and L3 cache.
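As a rough check on the scaling described in this comment, the quoted builds/day figures can be normalised against the 1800X. This is a minimal Python sketch: the `scaling` helper is hypothetical, and it compares core counts only, ignoring differences in clock speed, memory channels and cache, so the efficiency values are indicative at best.

```python
# Builds/day figures as quoted in the comment above: name -> (cores, builds/day).
BUILDS_PER_DAY = {
    "Ryzen 1800X": (8, 9.8),
    "Threadripper 1920X": (12, 16.7),
    "Threadripper 1950X": (16, 18.6),
}


def scaling(results, baseline):
    """Return name -> (speedup, efficiency) relative to `baseline`.

    speedup    = builds/day divided by the baseline's builds/day
    efficiency = speedup divided by the ratio of core counts
    """
    base_cores, base_rate = results[baseline]
    table = {}
    for name, (cores, rate) in results.items():
        speedup = rate / base_rate
        ideal = cores / base_cores
        table[name] = (speedup, speedup / ideal)
    return table


for name, (speedup, efficiency) in scaling(BUILDS_PER_DAY, "Ryzen 1800X").items():
    print(f"{name}: {speedup:.2f}x speedup, {efficiency:.2f} of core-count scaling")
```

An efficiency above 1 means the part gained more than its extra cores alone would suggest, and a value below 1 means it gained less.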
  • Timur Born - Friday, August 18, 2017 - link

    Thanks for the tests!

    I would have liked to see a combination of both being tested: Game Mode to switch off the second die and SMT disabled. That way 4 full physical cores with low latency memory access would have run the games.

    Hopefully modern titles don't benefit from this, but some more "legacy" ones might like this setup even more.
  • Timur Born - Friday, August 18, 2017 - link

    Sorry, I meant 8 cores, aka 8/8 cores mode.
  • mat9v - Friday, August 18, 2017 - link

    I wish someone had an inclination to test creative mode but with games pinned to one module. It is essentially NUMA mode but with all cores active.
    Or just enable SMT that is disabled in Gaming Mode - we actually then get a Ryzen 1800X CPU that overclocks well but with possibly higher performance due to all system tasks running on a different module (if we configure the system that way) and unencumbered access to more PCIe lanes.
  • peevee - Friday, August 18, 2017 - link

    Yes, that would be interesting.
    c:\>start /REALTIME /NODE 0 /AFFINITY 5555 your_game_here.exe
  • mat9v - Friday, August 18, 2017 - link

    I think I would start it on node 1 if anything, since system tasks would by default be running on node 0.
    Mask 5555? Wouldn't it be AAAA - for 8 cores (8 threads) and FFFF for 8 cores (16 threads)?
  • peevee - Friday, August 18, 2017 - link

    The mask 5555 assumes that SMT is enabled. Otherwise it should be FF.

    When SMT is enabled, 5555 and AAAA will allocate threads to the same cores, just different logical CPUs.
    Where system threads will be run is system dependent, nothing prevents Windows from running them on NODE 1. /NODE 0 allows it to run whether or not you actually have multiple NUMA nodes.

    With /REALTIME Windows will have a hard time allocating anything on those logical CPUs, but can use the same cores with other logical CPUs, so yes, technically it will affect results. But unless you load it with something, the difference should not be significant - things like cache and memory bus contention are more important anyway and don't care on which cores you run.
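The mask discussion in this thread can be checked with a short Python sketch that decodes a hexadecimal affinity mask into logical CPU numbers. It assumes that, with SMT enabled, logical CPUs 2k and 2k+1 are the two threads of physical core k; that numbering is an assumption about how the system enumerates processors, not something the thread confirms, and the helper names are hypothetical.

```python
def logical_cpus(mask):
    """Return the logical CPU indices whose bits are set in `mask`."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def physical_cores(mask, smt=True):
    """Map the selected logical CPUs to physical core indices.

    Assumes SMT siblings are numbered adjacently (2k and 2k+1 share core k).
    """
    threads_per_core = 2 if smt else 1
    return sorted({cpu // threads_per_core for cpu in logical_cpus(mask)})


for mask, smt in [(0x5555, True), (0xAAAA, True), (0xFFFF, True), (0xFF, False)]:
    print(f"{mask:#06x} smt={smt}: logical {logical_cpus(mask)} "
          f"-> cores {physical_cores(mask, smt)}")
```

Under that assumption, 5555 and AAAA both cover eight cores with one thread each, FFFF covers the same eight cores with both threads, and FF covers eight cores when SMT is off.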
