Cortex-A77 Cores, LPDDR5, Adreno 650 GPU

No More CPU Customisations For Now: Cortex-A77 Used

On the CPU side, the Snapdragon 865’s improvements over the Snapdragon 855 are very straightforward in terms of specifications:

SoC | Snapdragon 865 | Snapdragon 855
CPU | 1x Cortex-A77 @ 2.84GHz, 1x 512KB pL2 | 1x Kryo 485 Gold (A76 derivative) @ 2.84GHz, 1x 512KB pL2
    | 3x Cortex-A77 @ 2.42GHz, 3x 256KB pL2 | 3x Kryo 485 Gold (A76 derivative) @ 2.42GHz, 3x 256KB pL2
    | 4x Cortex-A55 @ 1.80GHz, 4x 128KB pL2 | 4x Kryo 485 Silver (A55 derivative) @ 1.80GHz, 4x 128KB pL2
    | 4MB sL3 @ ?MHz | 2MB sL3 @ 1612MHz

We find the same 1+3+4 CPU configuration as found on the Snapdragon 855, using the same 2.84GHz, 2.4GHz and 1.8GHz clock frequencies, and the same 512KB, 256KB and 128KB L2 cache configurations.

The big difference, of course, is that instead of the Cortex-A76 based cores of the S855, the new chipset uses Arm’s newest Cortex-A77 cores. A larger change in strategy this year is that while Qualcomm still uses the “Built-on-Cortex-technology” license to be able to customize some parts of the interface IP of the CPUs (and to be able to brand them as Kryo 585 parts), they’ve abandoned customisations on the CPU core itself. Qualcomm saw that the time invested with Arm in customising previous generations didn't yield returns as high as they had hoped, and for the Snapdragon 865 they simply opted to use the default configuration of the Cortex-A77 as offered by Arm.

Still, the new CPU microarchitecture is said to be able to offer a 25% performance uplift this year, all of which is essentially due to IPC improvements of the new design. As we had expected, we’re not seeing any clock improvements this year, and the CPU frequencies remain flat, reaching an identical 2.84GHz for the “Prime” core and 2.4GHz for the “Gold” cores.

Qualcomm claims a 25% power efficiency gain for the new SoCs, but this is an ISO-performance comparison against the previous generation (at equal performance to the S855, the S865 uses 25% less energy/power). In absolute terms at maximum performance, the new Snapdragon 865 big CPUs use more power to achieve their higher performance, meaning efficiency at peak performance is flat.
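To make the difference between the two comparison points concrete, here is a minimal sketch in normalised units. It assumes the S855 big core at peak is 1.0 performance at 1.0 power, and that "flat efficiency at peak" means power rises in step with the 25% performance uplift; both normalisations are illustrative assumptions, not measured figures.

```python
# Normalised (performance, power) operating points; units are arbitrary.
# Assumption: the S855 big core at peak is the 1.0 / 1.0 baseline.
S855_PEAK = (1.00, 1.00)
# ISO-performance claim from the text: same performance, 25% less power.
S865_ISO_PERF = (1.00, 0.75)
# Peak: +25% performance with flat efficiency, so power is assumed to scale 1:1.
S865_PEAK = (1.25, 1.25)


def perf_per_watt(point):
    """Return performance divided by power for a (perf, power) tuple."""
    perf, power = point
    return perf / power


baseline = perf_per_watt(S855_PEAK)
iso_gain = perf_per_watt(S865_ISO_PERF) / baseline
peak_gain = perf_per_watt(S865_PEAK) / baseline

print(f"ISO-performance perf/W vs S855: {iso_gain:.2f}x")
print(f"Peak perf/W vs S855:            {peak_gain:.2f}x")
```

Under these assumptions, 25% less power at equal performance corresponds to roughly 1.33x performance per watt, while the peak operating point shows no perf/W change at all.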

The Cortex-A55 cores remain practically the same and, to be honest, are getting a bit long in the tooth, as they’ve essentially not changed in 3 generations now. We’re very much overdue a large microarchitectural update from Arm, as the yearly small efficiency core updates from Apple now put the Android SoCs to shame in regards to performance and efficiency.

One aspect that Qualcomm did finally improve on is the L3 cache of the CPU cluster, which has now been increased from 2MB to 4MB. While the performance benefits here are welcome, Qualcomm explains that the main reason for the change was increased power efficiency, reducing more expensive DRAM memory accesses in light workloads, which makes a lot of sense.

The Adreno 650 GPU: Similar Architecture, Wider Microarchitecture

On the GPU side of things, we’re seeing Qualcomm make evolutionary changes to the new IP block. It’s actually a bit unusual for Qualcomm to remain in the 600-series this year and the company had never before stretched out an architecture over three generations like this.

SoC | Snapdragon 865 | Snapdragon 855
GPU | Adreno 650 @ ? MHz | Adreno 640 @ 585 MHz
    | +25% perf |
    | +50% ALUs |
    | +50% pixel/clock |
    | +0% texels/clock |

Qualcomm’s improvement claims for this generation come down to an aggregate of 25% over a variety of industry benchmarks and real workloads. It’s to be noted that the company did say we’ll see higher improvements in some workloads; for example, higher complexity tests such as Aztec might see a higher percentage.


Qualcomm's "Mobile GPU approaches to power efficiency"
@ High-Performance Graphics 2019

Earlier last summer Qualcomm did detail some titbits of its architecture, describing the ALU execution units as being separate MUL+ADD units per lane, and also showcased some first-party performance metrics of the Adreno 640 GPU. The Adreno 630 and 640 were both 2-core GPUs with 256 and 384 ALUs per core respectively, based on the measured throughput. The Adreno 650 increasing this by 50% could either mean we’re seeing three 256-ALU cores, or that there are now two 512-ALU cores. The resulting computational power at a hypothetical 600MHz clock rate would be ~1.2TFLOPs. We’ll have to wait and see where the frequency ends up, as Qualcomm didn't want to disclose it.
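The throughput estimate above can be reproduced with a short calculation. This is a sketch only: it assumes each ALU lane retires one MUL and one ADD per clock (2 FLOPs), which is our reading of the "separate MUL+ADD units per lane" description, and the 600MHz Adreno 650 clock is the hypothetical figure from the text, not a disclosed specification. The function name is ours.

```python
def fp32_gflops(cores, alus_per_core, clock_mhz, flops_per_alu_per_clock=2):
    """Theoretical FP32 throughput in GFLOPs.

    Assumes every ALU issues flops_per_alu_per_clock operations each cycle
    (2 = one MUL plus one ADD), which is an idealised upper bound.
    """
    return cores * alus_per_core * flops_per_alu_per_clock * clock_mhz / 1000.0


# Adreno 640: 2 cores x 384 ALUs at the known 585MHz clock.
adreno_640 = fp32_gflops(cores=2, alus_per_core=384, clock_mhz=585)

# Adreno 650, two 512-ALU core hypothesis at a hypothetical 600MHz.
adreno_650_guess = fp32_gflops(cores=2, alus_per_core=512, clock_mhz=600)

print(f"Adreno 640 @ 585MHz:         {adreno_640:.1f} GFLOPs")
print(f"Adreno 650 (2x512) @ 600MHz: {adreno_650_guess:.1f} GFLOPs")
```

With these inputs the two 512-ALU core configuration lands at roughly 1.23 TFLOPs, in line with the ~1.2TFLOPs figure quoted above.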

The Adreno 640 had made a huge jump in texture fillrate as it doubled the TMUs from 12 per core to 24 per core, for a total of 48. This resulted in quite the unusual 3:1 texel:pixel ratio in terms of fillrate capability. The Adreno 650 now increasing the ROPs by 50% means we’ll see an increase from 8 per core (16 total) to 12 per core (24 total), bringing it back to a 2:1 balance, with the pixel fillrate capabilities now catching up to the Mali GPU counterparts.
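The texel:pixel ratios follow directly from the unit counts. The sketch below simply divides TMUs by ROPs using the per-core figures given above; it assumes both GPUs are 2-core designs (the Adreno 650's core count is not confirmed) and that each TMU and ROP handles one texel or pixel per clock.

```python
from fractions import Fraction


def texel_pixel_ratio(tmus_per_core, rops_per_core, cores=2):
    """Return the texel:pixel fillrate ratio as an exact fraction.

    Assumes one texel per TMU and one pixel per ROP per clock, so the
    ratio depends only on the unit counts, not on the clock frequency.
    """
    return Fraction(tmus_per_core * cores, rops_per_core * cores)


adreno_640_ratio = texel_pixel_ratio(tmus_per_core=24, rops_per_core=8)
adreno_650_ratio = texel_pixel_ratio(tmus_per_core=24, rops_per_core=12)

print(f"Adreno 640 texel:pixel = {adreno_640_ratio}:1")
print(f"Adreno 650 texel:pixel = {adreno_650_ratio}:1")
```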

I asked about absolute power consumption, and the company discloses that it’ll be similar to the Snapdragon 855 in terms of peak power. There’s a slight contradiction here as the company also mentions that sustained performance is going to be a lot better than on the S855. It’s possible that we would be seeing larger performance improvements at lower than peak frequencies, and the company does say that at equal performance levels, the Adreno 650 is now 35% more efficient than its predecessor.

It was also a good opportunity to talk about the increasing peak power consumption over the last few generations. Back during the Snapdragon 835 days and earlier, Qualcomm’s SoCs were praised for, and the company took pride in, their ability to remain quite cool, with peak device active load power only being 3.5-4W. In more recent generations this has steadily increased, with the Snapdragon 855 now falling in at around 5W. Qualcomm explains that this was actually a change requested by OEMs who had improved the thermal dissipation capabilities of their device designs, telling the company that they are now able to handle higher power levels, and for Qualcomm to use that to increase performance.

I’m not too sure how to feel about this as it’s pretty much a double-edged sword and puts the responsibility of actually achieving the peak performance of the SoC solely in the hands of the device vendor, and them being able to properly design a good thermal solution. As we’ve already seen this year, some vendors are doing well in this regard, while others prioritise temperature over retaining higher performance levels. In any case, our unique testing methodology makes sure we’re exposing such differences in detail.

Unrelated to the hardware of the GPU, a big new improvement for the Adreno 650 and Snapdragon SoCs going forward is the fact that Qualcomm is now planning to distribute and update the graphics drivers over the Play Store. This feature was actually introduced to Android back in 2017 with Android 8/Oreo, but up until now no vendors actually ever took advantage of it, as it looks like it took time for vendors to make this feature operable with their BSPs. Qualcomm is now the first to achieve this, and the company is planning on quarterly GPU driver updates for new SoC generations going forward.

2020 Sees The Commercialisation of LPDDR5

2020 is the year where we’ll be seeing LPDDR5 being introduced to mobile. The new memory standard promises to improve the power efficiency per bit transferred by roughly 20%, and the Snapdragon 865 now comes with support for the new technology.

SoC | Snapdragon 865 | Snapdragon 855
Memory Controller | 4x 16-bit CH | 4x 16-bit CH
    | @ 2133MHz LPDDR4X / 33.4GB/s | @ 2133MHz LPDDR4X / 33.4GB/s
    | or @ 2750MHz LPDDR5 / 44.0GB/s |
    | 3MB system level cache | 3MB system level cache

What’s interesting here is that Qualcomm is using a hybrid memory controller, supporting both LP4X and LP5 memory standards. LPDDR4X is supported up to 2133MHz, while LPDDR5 memory is supported up to 2750MHz (5500MHz effective considering DDR). The bandwidth improvement here is 31%, and Qualcomm notes that the two technologies will offer roughly the same memory latency, meaning neither improvement nor degradation.
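For reference, theoretical peak bandwidth can be estimated from the bus width and clock alone. The sketch below is a naive calculation under stated assumptions: four 16-bit channels, two transfers per clock (DDR), and decimal gigabytes. The function name is ours. Note that this simple formula gives about 34.1GB/s for LPDDR4X, slightly above the 33.4GB/s quoted above, so the computed uplift comes out nearer 29% than 31%.

```python
def peak_bandwidth_gbs(channels, channel_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak DRAM bandwidth in GB/s (decimal).

    bytes per transfer = total bus width in bits / 8
    transfers per second = clock * transfers_per_clock (2 for DDR)
    """
    bytes_per_transfer = channels * channel_width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_clock / 1000.0


lpddr4x = peak_bandwidth_gbs(channels=4, channel_width_bits=16, clock_mhz=2133)
lpddr5 = peak_bandwidth_gbs(channels=4, channel_width_bits=16, clock_mhz=2750)
uplift_pct = (lpddr5 / lpddr4x - 1.0) * 100.0

print(f"LPDDR4X @ 2133MHz: {lpddr4x:.1f} GB/s")
print(f"LPDDR5  @ 2750MHz: {lpddr5:.1f} GB/s")
print(f"Uplift: {uplift_pct:.0f}%")
```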

Surprisingly, Qualcomm sounded rather conservative in regards to the actual performance impact of the new technology. They do say that it will bring advantages in bandwidth starved situations, but the company’s representatives underplayed the impact that it will have. Qualcomm allows vendors to choose between LP4X and LP5 in their device implementations, and says the performance differences between the two would be a few percentage points. The new memory technology is rather a matter of future-proofing the SoC as well as being able to support the memory as DRAM manufacturers shift their production lines.

The company also said that they’ve improved the memory subsystem of the SoC and were able to lower latency to around “130ns”, although it’s not exactly clear what the context and percentage improvement here is. It’s most likely a full random access test, in which case the S855 landed at 150-170ns depending on memory depth, which would make this a quite notable and respectable improvement.
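If the 130ns figure is indeed comparable to a full random access test, the implied improvement can be bracketed as follows. This is a rough sketch that assumes the S855's 150-170ns range and the quoted "130ns" were measured in the same way, which Qualcomm did not confirm.

```python
def latency_reduction_pct(old_ns, new_ns):
    """Percentage reduction in latency going from old_ns to new_ns."""
    return (old_ns - new_ns) / old_ns * 100.0


S865_LATENCY_NS = 130.0            # figure quoted by Qualcomm
S855_LATENCY_RANGE_NS = (150.0, 170.0)  # depends on memory depth

reductions = [latency_reduction_pct(old, S865_LATENCY_NS) for old in S855_LATENCY_RANGE_NS]

for old, pct in zip(S855_LATENCY_RANGE_NS, reductions):
    print(f"{old:.0f}ns -> {S865_LATENCY_NS:.0f}ns: {pct:.1f}% lower")
```

Under that assumption the reduction would fall somewhere between roughly 13% and 24%.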

The system level cache of the SoC still falls in at 3MB this generation, the same as the previous generation. I questioned if it made sense to still maintain this at this size, especially in the face of now almost every vendor having a similar capability, and especially in the face of Apple’s humongous 16MB implementation in the A13. The company said that for their architecture and workloads that they’re seeing, they opted to retain the size, but are looking into the future to possibly increase it (Die area is probably a large consideration here, maybe expect an increase in the next process node?).


91 Comments


  • Alistair - Wednesday, December 4, 2019 - link

    To be a bit more clear, the touch responsiveness and screen is better with my android, and the text message integration with windows is amazing (bring imessage to windows and maybe I'll get another iphone).
  • Raqia - Wednesday, December 4, 2019 - link

    Yeah, there's obvious appeal to the seamless consumer electronics that Apple produces. They have an easier job than the likes of Qualcomm with its dozens of partners and on average end up with better results as well. I'm very impressed with their latest iPad Pro myself.

    However, their homogeneity poses great risks to consumers and industry competition in the long run. They do not allow competing store fronts on their platform (which they should be forced to open with licensing on FRAND terms) and charge an exorbitant 30% fee to software writers.
    Their much touted security may only locally obfuscate severe bugs in their very large ecosystem:

    https://www.vice.com/en_us/article/pajkkz/its-almo...

    Their treatment of suppliers is downright abusive, cheating business after business such as Dialog, Imagination, and Qualcomm out of their IP and stifling the ability of the industry to support competing products. There are real perils of vertical integration:

    https://www.eetimes.com/document.asp?doc_id=133200...

    I hope Apple continues to keep the industry on its toes with its excellent execution, but I also hope it opens its platform, by regulatory force if necessary.
  • generalako - Thursday, December 5, 2019 - link

    That's not a fair comparison, seeing as the 4 XL not only has an underclocked SD855, but also UFS 2.1 and not the best software performance optimization. Compare it to a OnePlus 7(T) Pro, which has much faster storage (UFS 3.0) performance, larger and better RAM management and proper performance optimization in both interface and in relation to the CPU, and the difference you claim to see will vanish. Just do yourself the favor and look at comparison videos on YouTube.

    If software smoothness is what's important to you, I get your grievances. But then again, 90Hz makes up a lot of that (and more), and OxygenOS is probably the most stable and smooth third-party interface on Android after Pixel UI.
  • name99 - Thursday, December 5, 2019 - link

    "the snappiness of iPhones doesn't have especially much to do with peak single threaded integer throughput so much as IO and memory performance coupled with tight integration of iOS with hardware."

    People keep claiming this. But that SAME tight OS integration has existed on every iPhone since at least the A6 and A7...
    Even so, every year I can feel the increased fluidity of the new phones. Even at iPhone 6 people were claiming that phones were fast enough, that they never dropped frames. And yet each successive 20% to 30% annual boost is notable in feeling that much smoother, especially as ever more of the UI is built around swiping in different directions rather than tapping.

    Are we NOW maxed out? Certainly when I use my A12 and A12X based phone+iPad I don't NOW feel any delays in the UI that bug me. (Every year it's got better; with iPhone 6 it was at the point of "thank god I don't have to wait", since then it has been "yes, definitely smoother, no stuttering, feels right").
    And you could say, at this point, OK, good enough, we don't need to do more. Certainly plenty of people seem to think that way (many on the Android side, at least some on the Apple side). But there is still so much more phones COULD do. Where's my real-time translation (text and speech)? Where's my assistant fixing my typos at the sentence and paragraph level, rather than at the basic (and not THAT accurate) word-by-word level?
    If you change the question from "is my phone now fast enough" to "what would I like my phone to do, to hell with practicality or current software technology" you look at CPU design in a very different way.

    Apple is certainly on that second track. ARM and QC I think also are for the past few years, though it's not an especially natural place for them, and I wouldn't be surprised if the inside voices pushing for excellence are in a fragile position, liable to be ousted if there's a single false step...
  • Raqia - Thursday, December 5, 2019 - link

    CPUs aren't responsible for much of the heavy lifting in the tasks you're describing like smooth UX scrolling or voice translation. They handle control general purpose program flow which is memory intensive or dynamic recompilation which can bottleneck in some cases like browser execution of Javascript or during just plain benchmarking scenarios.

    The small cores, larger caches, better buses / IO, GPU compositing functions, and the new AI units are much more responsible for typical user experience than peak CPU single threaded performance, and indeed Apple excels here too but not to the degree they do over Android SoCs in the single threaded metric. Apple is riding high on the positive wave of press and user perception over its excellent CPU performance though due to its being one of the only components on an SoC that's easy to systematically benchmark and publicize.
  • Sharma_Ji - Wednesday, December 4, 2019 - link

    If you get time someday, use some snappy android phones from the likes of 1+, Asus, etc.
  • Ironchef3500 - Thursday, December 5, 2019 - link

    I am starting to feel the same way..
  • generalako - Thursday, December 5, 2019 - link

    That gap is NOT widening. It is closing. SD855 essentially cut the gap by 40%, to its lowest point in many years. Even SD865, with A77, is making sure that gap has not widened (in fact, slightly decreased). So your comment is false.

    Where the gap has been widening, is in Apple's efficiency cores and in GPU performance, however. Here, ARM and Qualcomm have a lot of work to do.
  • Kabm - Wednesday, December 4, 2019 - link

    Now there is a market for gamer chips. But before, QC didn't have as much room as Apple, as they had an integrated 4G modem. The 865 is the first to have the same room as Apple.
  • ksec - Wednesday, December 4, 2019 - link

    While Geekbench is not a perfect benchmark ( No Benchmark is ever perfect ), it is a good tool to estimate performance.

    The best Single Core Performance of 855 is around ˜710, so a 25% increase would be around 900. an iPhone 8 does 900+, iPhone XS does 1100, and iPhone 11 does 1300.

    Of coz MultiCore would blow past iPhone X or even XS. But I dont care much about MultiCore Performance. You are still fundamentally limited by Single Core performance.

    And of course, your System Performance ( Not your CPU performance ) depends a lot on Software, NAND Speed, Controller, Memory etc.
