The Raw Benchmark Numbers

Section By Andrei Frumusanu

Before we go into more detail, let's look at how much this behavior changes benchmark scores. The key is the difference between having Huawei/Honor's benchmark detection mode on and off. We are using our mobile GPU test suite, which includes Futuremark’s 3DMark and Kishonti’s GFXBench.

The analysis is currently limited to the P20s and the new Honor Play, as I don’t yet have newer stock firmware on my Mate 10s. It is likely that the Mate 10 will exhibit similar behaviour; Ian also confirmed that he's seeing cheating behaviour on his Honor 10. This points to most (if not all) Kirin 970 devices released this year being affected.

Without further ado, here are some of the differences identified between running the same benchmarks while being detected by the firmware (Cheating) and the default performance that applies to any non-whitelisted application (True Performance). The non-whitelisted application is a version provided to us by the benchmark developer that the firmware cannot detect; it is not publicly available (otherwise it would be easy to spot).

3DMark Sling Shot 3.1 Extreme Unlimited - Graphics - Peak 

3DMark Sling Shot 3.1 Extreme Unlimited - Physics - Peak 

GFXBench Aztec High Off-screen VK - Peak 

GFXBench Aztec Normal Off-screen VK - Peak 

GFXBench Manhattan 3.1 Off-screen - Peak 

GFXBench T-Rex Off-screen - Peak

We see a stark difference between the resulting scores, with our internal versions of the benchmarks performing significantly worse than the publicly available versions. All three smartphones perform almost identically in the higher power mode, as they all share the same SoC. This contrasts significantly with the real performance of the phones, which is anything but identical, as the three phones have different thermal limits as a result of their different chassis/cooling designs. Consequently, the P20 Pro, being the largest and most expensive, has better thermals in the 'regular' benchmarking mode.

Raising Power and Thermal Limits

What is happening here with Huawei is a bit unusual compared to how we’re used to seeing vendors cheat in benchmarks. In the past we’ve seen vendors actually raise the SoC frequencies, or lock them to their maximum states, raising performance beyond what’s usually available to generic applications.

What Huawei is doing instead is boosting benchmark scores from the other direction: the benchmarking applications are the only use-cases where the SoC actually performs at its advertised speeds. Meanwhile, every other real-world application is throttled to a significant degree below that state due to the thermal limitations of the hardware. What we end up seeing with unthrottled performance is perhaps the 'true' form of an unconstrained SoC, although this is completely academic when compared to what users actually experience.
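The detection behaviour described here can be pictured as a lookup on the application's package name. The sketch below is purely illustrative: the package names, the `ThermalProfile` type and the wattage limits are assumptions made for the example, not anything extracted from Huawei's firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThermalProfile:
    name: str
    power_limit_w: float  # illustrative ceiling, not a measured firmware value


# Hypothetical whitelist: placeholder package names, not real benchmark IDs.
WHITELISTED_PACKAGES = frozenset({
    "com.example.benchmark3d",
    "com.example.gfxtest",
})

DEFAULT_PROFILE = ThermalProfile("default", power_limit_w=4.0)
UNTHROTTLED_PROFILE = ThermalProfile("unthrottled", power_limit_w=8.5)


def select_profile(package_name: str) -> ThermalProfile:
    """Return the relaxed profile only for apps on the whitelist."""
    if package_name in WHITELISTED_PACKAGES:
        return UNTHROTTLED_PROFILE
    return DEFAULT_PROFILE


# A public benchmark build is recognised; a renamed build of the same
# workload is treated like any other application.
print(select_profile("com.example.benchmark3d").name)          # unthrottled
print(select_profile("com.example.benchmark3d.private").name)  # default
```

A scheme of this kind would explain why a renamed, non-public build of the same benchmark lands on the default limits: the workload is identical, only the name differs.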

To demonstrate the power behaviour between the two different throttling modes, I measured the power on the newest Honor Play. Here I’m showcasing total device power at fixed screen brightness; for GFXBench the 3D phase of the benchmark is measured for power, while for 3DMark I’m including the totality of the benchmark run from start to finish (because it has different phases).

Honor Play Device Power - Default vs Cheating

The differences here are astounding, as we see that in the 'true performance' state, the chip is already reaching 3.5-4.4W. These are the kind of power figures you would want a smartphone to limit itself to in 3D workloads. By contrast, using the 'cheating' variants of the benchmarks completely explodes the power budget. We see power figures above 6W, and T-Rex reaching an insane 8.5W. On a 3D battery test, these figures very quickly trigger an 'overheating' notification on the device, showing that the thermal limits must be beyond what the software is expecting.
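To put those figures in proportion, here is a small sketch of the arithmetic. It uses only the numbers quoted above (3.5-4.4W in the default state, 8.5W for T-Rex when detected). The text does not say which default-state figure belongs to T-Rex, so the result is a bracket rather than a single ratio, and the function name is my own.

```python
def power_ratio(boosted_w: float, baseline_w: float) -> float:
    """How many times more power the boosted run draws than the baseline."""
    if baseline_w <= 0:
        raise ValueError("baseline power must be positive")
    return boosted_w / baseline_w


# Figures quoted in the text above.
TRUE_RANGE_W = (3.5, 4.4)   # default ('true performance') device power
TREX_CHEATING_W = 8.5       # T-Rex with benchmark detection active

# The exact default-state figure for T-Rex is not given, so bracket the ratio.
lowest = power_ratio(TREX_CHEATING_W, TRUE_RANGE_W[1])
highest = power_ratio(TREX_CHEATING_W, TRUE_RANGE_W[0])

print(f"T-Rex draws between {lowest:.2f}x and {highest:.2f}x the default power")
```

In other words, the detected run draws roughly twice the power of the default one, or more.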

This means that the 'true performance' figures aren’t actually stable: they strongly depend on the device’s temperature (this being typical for most phones). Huawei/Honor are not actually blocking the GPU from reaching its peak frequency state. Instead, the default behavior puts in place a very harsh thermal throttling mechanism that tries to maintain significantly lower SoC temperatures and overall power consumption.

The net result is that in the phones' normal mode, peak power consumption during these tests can reach the same figures posted by the unthrottled variants, but the numbers then fall back very quickly and drastically. Here the device throttles down to 2.2W in some cases, reducing performance quite a lot.
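The "peak, then rapid fall-back" pattern can be illustrated with a toy model. This is not Huawei's throttling algorithm: the heating, cooling and back-off coefficients below are invented for the sketch, and only the 8.5W peak and 2.2W floor are taken from the figures in the text.

```python
def throttle_trace(peak_w, floor_w, temp_limit_c, start_temp_c, steps,
                   heat_per_watt=0.5, cooling_c=1.0, backoff=0.7):
    """Toy model: cut the power budget whenever temperature exceeds the limit.

    All coefficients are made-up illustration values, not measurements.
    """
    temp_c = start_temp_c
    power_w = peak_w
    trace = []
    for _ in range(steps):
        trace.append(round(power_w, 2))
        # Temperature rises with dissipated power, minus a fixed cooling term.
        temp_c += power_w * heat_per_watt - cooling_c
        if temp_c > temp_limit_c:
            # Back off, but never below the floor.
            power_w = max(floor_w, power_w * backoff)
    return trace


trace = throttle_trace(peak_w=8.5, floor_w=2.2, temp_limit_c=40.0,
                       start_temp_c=30.0, steps=30)
print(trace)
```

With a low temperature limit, the budget holds at its peak for only a few steps before stepping down to the floor, which is the shape of behaviour described for the default mode. Raising `temp_limit_c` keeps the device at peak for longer, which is roughly what the detected benchmarks are allowed to do.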


  • Cicerone - Friday, September 7, 2018

    But sometimes Kirin 970 is on the same level with 2016 Exynos 8890 found on Samsung S7.
  • shogun18 - Tuesday, September 4, 2018

    > I think it's important for users to know that the Kirin 970 has a significantly weaker GPU than the S845

    How so? If some popular game needs 10,000 shader OPS to run at 800x600 at 30 frames/sec what difference does it make if one SoC can pump out 8000 (admittedly synthetic - are you really going to tell me you're going to notice 24FPS vs 30? pahlease), or 15,000 or another 40,000? Ok, so does OPS/Watt actually matter in anybody's evaluation metric? No. Does anyone choose a phone based on this one lets me run X game for 30 minutes before running out of batt but I can get 40 minutes with this other one because in "game mode" the manufacturer took liberties with wattage?
  • cfenton - Tuesday, September 4, 2018

    What modern phone runs at 800x600? Also, faster GPUs can get closer to 60fps, which is definitely a noticeable improvement over 30fps.

    If all you're playing is Candy Crush, then it doesn't matter what GPU you have, but if you're playing Fortnite or the upcoming Elder Scrolls game, then GPU performance is important. If two phones are roughly the same price, but one of them has 3x the GPU power with no downsides, I'm going to go with the faster one every time.
  • shogun18 - Tuesday, September 4, 2018

    The human eye in games like Fortnite etc can only process a very limited frame rate. So anything over 30 is basically pointless. Plus factor in using a 27+ monitor(s) vs a piddly-ass phone screen with lousy (by comparison to "gaming" monitors) refresh characteristics the benchmark is even less useful.

  • cfenton - Tuesday, September 4, 2018

    That article makes it very clear that people can tell the difference between 60fps and 30fps. Its claim is that it's only an improvement in smoothness, not an improvement in our ability to track changes. A higher frame rate won't improve my ability to pick out movement.

    60fps looks better than 30fps. If I can choose between the two, at the same resolution, I'm always going to pick 60fps. Will it make me better at the game? No. Does it make the game look and feel better? Yes.
  • techconc - Monday, September 10, 2018

    @shogun18 - I always find it amusing when people present "evidence" to support their position only to find out the evidence they are producing very clearly refutes their position. The article very clearly states:
    "Certainly 60 Hz is better than 30 Hz, demonstrably better." - Professor Thomas Busey

    From my own perspective, I would suggest to you that games need to have a 30 fps at minimum to be playable and to appear to be somewhat fluid. 60 fps is clearly better, but not "twice as good". You can see the difference though. On my iPad, I can do 120 fps on games like World of Tanks Blitz and can even notice that difference. For some games, reaction time is critical and network performance also plays a role in this. However, higher frame rates can indeed provide a competitive advantage.
  • shogun18 - Tuesday, September 11, 2018

    did you BOTHER to read to the end let alone comprehend what was being put forth? The human brain is SLOW! It's massively parallel but it's SLOW. Just like our ears are crap compared to other creatures who actually have good hearing. If you're playing FPS on a phone you're an idiot to begin with. Fluidity or more properly the perception of same doesn't make your performance better. Your reaction time is also completely shit compared to the theoretical frame rate you think you are perceiving. Anyone who cares about game play on a phone is a moron.
  • Reflex - Tuesday, September 4, 2018

    Buyers should be able to value whatever they wish when making their purchasing decisions. Lying to them denies them the right to make decisions based on the criteria that matter most to them, whether it be nice cameras, great screens, excellent call quality, or yes, 'geekmarks' or whatever.

    It's not for you to determine what is most important to a customer, nor is it ethical to lie about one of those or other items in order to trick people who value them into buying your product.
  • boozed - Tuesday, September 4, 2018

    Funny you should say that, considering the reason for the existence of this website.
  • Samus - Wednesday, September 5, 2018

    You need to put a performance metric on things somehow. Cars have horsepower and torque, batteries have volts and milliamps, and food has protein and carbs.

    Unfortunately these metrics do not come from the SoC manufacturer, but the phone vendor. Therein lies the problem. "Overclocking" or boosting a SoC beyond reasonable thermal design limitations is blatant cheating if it can't be sustained throughout, say, a game, that the benchmark is momentarily mimicking.

    At the end of the day, this is really an Android problem too, because of the freedom the OS gives phone vendors to manipulate the kernel, scheduler, and frequency curve of the CPU/GPU. This kind of flexibility didn't exist (and still doesn't exist) in other mobile operating systems.

    So imagine if this were happening in the PC space. Where vendors were selling overclocked systems WITHOUT SAYING they were overclocked. Where vendors were manipulating the real-world benefits of a GPU with software that faked benchmark results.

    I would liken it to what happened with the game console clones of the 80's, when there were third-party Atari's, Intellivisions, etc, that had custom CPU's running at higher frequencies. In that case, it actually hurt developers more than consumers (but still hurt consumers) because developers couldn't even depend on a performance metric for the platform they were developing for. This is partially why there were virtually no third party developers (Activision and Hudsonsoft - who later developed their own console simply to have some control over the hardware environment! - were effectively the first cross-platform developers.)
