Chipworks Disassembles Apple's A8 SoC: GX6450, 4MB L3 Cache & More
by Ryan Smith on September 23, 2014 1:00 PM EST
One of the more enjoyable rituals with Apple’s annual iPhone launch is the decapping, deconstruction, and photographing of the processor die at the heart of Apple’s newest SoC. While we can learn a lot about the SoC from software, for some things there’s just no replacement for looking at the hardware itself and counting the functional blocks present. And this year, as in past years, the honor of being the first to tear apart the SoC goes to Chipworks.
For determining the layout of A8, Chipworks reached out to us to solicit our input on their die shot, and after some rounds of going back and forth we believe we’ve come to a solid determination of some of A8’s features and how it has been configured. So let's dive in.
First and foremost we’ll start with A8’s GPU, as this was one of the hardest elements to analyze in software. Based on Apple’s 50% performance improvement we had previously speculated that A8 contained an Imagination PowerVR GX6650. However as we noted back then, a die shot would reveal all, and right on schedule it has.
A close analysis of the die shot makes it clear that there are only 4 GPU cores available and not 6, which immediately rules out the 6 core GX6650 we were previously expecting. Instead, with 4 cores present, this is conclusive proof that Apple is using the smaller 4 core GX6450 on A8, the direct successor to the G6430 used on the A7. GX6450 includes some performance optimizations along with some feature updates (including ASTC support, which Apple’s documentation has already confirmed is present), so its inclusion here is a natural progression for Apple.
On A8 and its 20nm process the GPU block measures 19.1mm², versus 22.1mm² for A7’s G6430. As a result Apple is saving some die space compared to A7, but this is being partially offset by the greater complexity of GX6450 and possibly additional SRAM for larger caches on the GPU. Meanwhile, looking at the symmetry of the block, it’s interesting that the blocks of texturing resources that every pair of GPU cores shares are so visible and so large. With these resources being so big relative to the GPU cores themselves, you can see why Imagination would want to share them as opposed to building them 1:1 with the GPU cores.
Meanwhile opposite the GPU we have the CPU block. Unlike the GPU, the CPU block has seen some significant shrinking, which Chipworks estimates is down from 17.1mm² in A7 to 12.2mm² in A8. In A7 Cyclone did not lend itself to easily picking apart the individual CPU cores, and neither does the CPU here in A8. We’ll be looking at the new CPU’s architecture in-depth in our iPhone 6 review, but for now it’s safe to say that while this is definitely derived from Cyclone, Apple has added a few tweaks over the last year that make it an even more potent CPU than the first Cyclone. Meanwhile, based on this die shot, Chipworks believes that the L2 cache has been reorganized to a per-core design, as there is no obvious single block of L2 on A8 like there was on A7.
The final major identifiable block on A8 is once again the SRAM cache memory. On A7 we discovered that this block was 4MB and was responsible for servicing the GPU and CPU. On A8 this block is similarly present and serving the same role. This 4MB of SRAM ends up being quite big despite the shrink from 28nm to 20nm, and while at first glance it seems like it should be larger than 4MB given the relative size, in practice what has happened is that the individual SRAM cells have not shrunk by a full 50%. Chipworks estimates the cell size to now be about 0.08µm², versus 0.12µm² on A7, which is closer to a 33% shrink than a 50% shrink. As a result the SRAM cache still takes up a fair bit of space, but the value of being able to serve larger memory requests without having to go off-die continues to be immense.
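To make the cell-size arithmetic concrete, here is a minimal sketch that works out the shrink and the area taken by the bit cells alone. It assumes "4MB" means 4 MiB of data bits and ignores tags, ECC, sense amplifiers and routing, so the real block is larger than these figures; the function and constant names are illustrative only.

```python
# Raw bit-cell area for the 4MB SRAM block, using Chipworks' cell-size estimates.
# Assumption: 4MB means 4 MiB of data bits; tags, ECC, sense amps and routing are ignored.

CELL_AREA_UM2 = {"A7": 0.12, "A8": 0.08}  # µm² per SRAM cell, from the article
CACHE_BITS = 4 * 1024 * 1024 * 8          # assumed data capacity in bits
UM2_PER_MM2 = 1_000_000


def cell_shrink(old_um2: float, new_um2: float) -> float:
    """Fractional reduction in cell area going from old to new."""
    return 1 - new_um2 / old_um2


def array_area_mm2(cell_um2: float, bits: int = CACHE_BITS) -> float:
    """Area occupied by the bit cells alone, in mm²."""
    return bits * cell_um2 / UM2_PER_MM2


shrink = cell_shrink(CELL_AREA_UM2["A7"], CELL_AREA_UM2["A8"])
print(f"cell shrink: {shrink:.1%}")
for soc, cell in CELL_AREA_UM2.items():
    print(f"{soc}: {array_area_mm2(cell):.2f} mm² of raw cells")
```

Under these assumptions the cells shrink by about a third, and the raw cell area drops from roughly 4.0mm² to roughly 2.7mm², which is well short of the halving a full node shrink would suggest.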
Apple A8 vs A7 SoCs
Specification | Apple A8 (2014) | Apple A7 (2013)
Manufacturing Process | TSMC 20nm HKMG | Samsung 28nm HKMG
Die Size | 89mm² | 104mm²
Transistor Count | ~2B | "Over 1B"
CPU | 2 x Apple Enhanced Cyclone ARMv8 64-bit cores | 2 x Apple Cyclone ARMv8 64-bit cores
GPU | IMG PowerVR GX6450 | IMG PowerVR G6430
Overall, Chipworks’ analysis points to A8 being fabbed on TSMC’s 20nm process. This makes A8 among the first SoCs to receive the 20nm treatment. Thanks to this smaller node Apple has been able to build additional features into the SoC while simultaneously shaving off around 15% of their die size. Chipworks estimates the final die size of A8 to stand at 89mm², versus the 104mm² for the Samsung 28nm based A7. Chipworks notes that if this were a straight shrink, one would expect the A8 to be closer to 50% the size of A7 (though not all logic can shrink quite that well), which indicates that Apple has spent quite a bit of die size on improving performance through more complex CPU and GPU architectures and miscellaneous feature additions.
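The area figures quoted in this article can be cross-checked with a few lines of arithmetic. The sketch below uses only Chipworks' estimates as reported here. The 50% "ideal" is the article's own straight-shrink figure rather than a measured value, and the helper names are illustrative.

```python
# Block and die areas in mm², as estimated by Chipworks and quoted in the article.
AREAS_MM2 = {
    "A7": {"die": 104.0, "cpu": 17.1, "gpu": 22.1},
    "A8": {"die": 89.0, "cpu": 12.2, "gpu": 19.1},
}


def reduction(old: float, new: float) -> float:
    """Fractional area saved going from old to new."""
    return 1 - new / old


def share_of_die(soc: str, *blocks: str) -> float:
    """Fraction of the die occupied by the named blocks."""
    areas = AREAS_MM2[soc]
    return sum(areas[b] for b in blocks) / areas["die"]


a7, a8 = AREAS_MM2["A7"], AREAS_MM2["A8"]
for block in ("die", "cpu", "gpu"):
    print(f"{block}: {reduction(a7[block], a8[block]):.1%} smaller on A8")

# Hypothetical ideal: every structure halves in area, per the "straight shrink" remark.
ideal_die = a7["die"] * 0.5
print(f"ideal straight shrink: {ideal_die:.0f} mm², actual: {a8['die']:.0f} mm²")

for soc in AREAS_MM2:
    print(f"{soc}: CPU + GPU take {share_of_die(soc, 'cpu', 'gpu'):.1%} of the die")
```

By these numbers the die is about 14% smaller, the CPU block about 29% smaller and the GPU block about 14% smaller, while the combined CPU and GPU share of the die slips from roughly 38% to roughly 35%.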
Wrapping things up, we’ll be back later this month with our review of the iPhone 6 family and our full analysis of the A8 SoC. So until then stay tuned.
Source: Chipworks
65 Comments
barleyguy - Wednesday, September 24, 2014 - link
First of all, it was an analogy, not a metaphor.
Also, Chipworks is doing nothing but taking a picture of a physical object. Which is not illegal, regardless of whether Apple approves of it or not. If Chipworks did get sued for this, I'd definitely contribute to their legal fund.
And, the car companies DID at one time go around suing people for stuff like this. There were some famous legal cases dealing with the large service manuals like Chilton's. But auto manufacturing is a mature industry; most of these types of things were resolved 40 years ago or so (much of the legal action was in the 1970s). Smartphones on the other hand are a brand new industry, and there need to be some pissing matches before companies figure out that stupid lawsuits are a waste of time and money.
ancientarcher - Wednesday, September 24, 2014 - link
How many car manufacturers sued competitors for having 4 wheels and the same shape as their cars??
BrooksT - Wednesday, September 24, 2014 - link
I'm aware of 5 (Google for specifics):
BMW sued Shuanghuan over an X3 copy
Mercedes-Benz sued Shuanghuan over a Smart Car copy
Ford sued JAC over an F150 copy
GM sued Chery over a Matiz copy
Fiat sued Great Wall over a Panda copy
CharonPDX - Wednesday, September 24, 2014 - link
Um, yes, actually: http://www.automoblog.net/2011/01/14/100-years-ago... (Or, more basically, for putting an engine on a chassis with wheels.)
tipoo - Tuesday, September 23, 2014 - link
Any chip IP in there will be far smaller than what this picture can resolve. We can count sram blocks and registers all we want, but the low level optimizations are far deeper.
Think about it, we've been looking at AMD and Intel and Nvidia die shots for years (well, some bullshots too). They even divulge them freely. If it was a competitive concern, would they?
BrooksT - Wednesday, September 24, 2014 - link
Besides, reverse engineering is perfectly legal. If Chipworks takes super high resolution images and ships them to a fab for manufacture, that would be illegal. Merely taking things apart, and even publishing detailed analysis, is not illegal. IP law does not recognize a right to secrecy (except trade secrets, but it's not illegal to discover / reverse engineer a trade secret)
allanmac - Tuesday, September 23, 2014 - link
If you don't actually know for sure that it's a 4-cluster GPU then write a trivial Metal kernel that occupies just under 16KB of local memory per kernel workgroup and performs a fixed amount of work.
Launch a grid of (12*N) workgroups. If it's a 4-cluster GPU it will take (3*N) time units to execute. If it's a 6-cluster then it will take (2*N). Adjust for clock frequency and architectural differences.
This is the only way to be sure.
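The timing argument in this comment can be modeled in a few lines. This is a sketch of the scheduling arithmetic only, not Metal code. It assumes the local-memory footprint limits each cluster to one resident workgroup at a time and that every workgroup takes one time unit; `waves` and `guess_clusters` are hypothetical helper names.

```python
import math


def waves(workgroups: int, clusters: int, resident_per_cluster: int = 1) -> int:
    """Sequential waves needed to drain the grid.

    Assumes each cluster holds resident_per_cluster workgroups at a time and
    every workgroup takes one time unit.
    """
    return math.ceil(workgroups / (clusters * resident_per_cluster))


def guess_clusters(measured_units: float, n: int, candidates=(4, 6)) -> int:
    """Pick the candidate cluster count whose predicted time is closest."""
    return min(candidates, key=lambda c: abs(waves(12 * n, c) - measured_units))


n = 5
for clusters in (4, 6):
    print(f"{clusters} clusters: {waves(12 * n, clusters)} time units")
```

A grid of 12*N workgroups is convenient because 12 divides evenly by both 4 and 6, so neither candidate leaves a partially filled final wave to blur the measurement.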
Homeles - Tuesday, September 23, 2014 - link
Or, you know, look at the die. That is, if you know what to look for.
allanmac - Wednesday, September 24, 2014 - link
True, you're right! The die reveals everything once you know what to look for.
I think everyone assumed Apple's on-stage "50%" claim *had* to be accomplished with GX6650 and didn't realize that ImgTec had already stated on their blog that improvements in the XT series could result in 50% performance gains.
A couple of those early iPhone 6/6+ benchmarks show solid improvement over the 5s despite the new devices having 1.4x / 3.8x (6/6+) more pixels (real/virtual).
ltcommanderdata - Tuesday, September 23, 2014 - link
The proportion of the die taken up by the CPU, GPU, and L3 cache has decreased between the A7 and A8. Any thoughts where the transistors are now going?