AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST - Posted in IT Computing
The Opteron 2360SE - the Facts
Getting back to AMD, the new quad-core chip still lives under a veil of secrecy. Quite a few rumors and myths are going around, and we investigated them one by one so we could be sure that you only get the facts.
Fact 1: The B2 stepping does not have a much faster memory controller than the B1 stepping
The controller found in stepping "2" might be a tiny bit faster, but we have not found any significant difference. Our Stream benchmarks were only a tiny bit faster on the 2.5GHz (Stepping 2) than on the 2GHz (Stepping DR-B1), and the latency numbers were likewise only marginally better.
The 2.5GHz Barcelona is a newer stepping than the 2GHz sample we tested earlier.
Fact 2: The 2.5GHz review sample is running at 1.2V; it is not overclocked
CPU-Z reported that the chip was running at 1.5V, while the 2GHz quad-core was running at 1.2V.
Power measurements show that the BIOS of our ASUS board is accurate, but CPU-Z is not. The final piece of evidence is, of course, the laser marking on the quad-core Opteron.
AMD is capable of producing 2.5GHz quad-core chips, but not in large quantities at this time. The 2.5GHz part should arrive at the end of this year, with large quantities expected in the first quarter of 2008.
Fact 3: the memory controller always runs RAM at the rated frequency
In this respect, the new quad-core Opteron is completely different from what we have seen with previous Opteron (and Athlon 64) processors. In the first and second generation Opteron, the memory clock was derived from the CPU clock through an integer divisor. This resulted in very odd memory clock speeds at times, particularly with odd multipliers. For example, a 2.2GHz Opteron (11X multiplier) uses a divisor of 7 and ends up running DDR2-667 (333MHz clock) at 314MHz. That gives DDR2-628 instead of DDR2-667. In reality, this doesn't have any major performance impact, and it is only measurable with "Stream-like" benchmarks. In contrast, Barcelona's memory controller runs at its own frequency and will run the DIMMs at the rated speed.
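The divisor arithmetic for the older Opterons can be sketched in a few lines. This is only an illustration: the function name is made up, and it assumes the divisor is the smallest integer that keeps the memory at or below its rated clock, which matches the 2.2GHz example above.

```python
import math


def k8_memory_clock(cpu_mhz, ddr_rating):
    """Return (divisor, actual memory clock in MHz) for an integer CPU divisor.

    Assumption (not taken from AMD documentation): the divisor is the
    smallest integer that keeps the memory at or below its rated clock.
    """
    rated_clock = ddr_rating / 2  # DDR transfers twice per clock
    divisor = math.ceil(cpu_mhz / rated_clock)
    return divisor, cpu_mhz / divisor


# 2.2GHz Opteron (11X multiplier) with DDR2-667
divisor, clock = k8_memory_clock(2200, 667)
print(divisor, int(clock))  # divisor and memory clock in whole MHz

# 2.0GHz Opteron (10X multiplier) with DDR2-667
print(k8_memory_clock(2000, 667))
```

Under this assumption the 2.2GHz part ends up with a divisor of 7 and a memory clock of about 314MHz, while a 2GHz part divides evenly (divisor 6) and runs the memory at roughly its rated 333MHz.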
Fact 4: By running the Northbridge at a lower speed, the new quad-core loses a bit of performance but saves power
The core of the Opteron 2360 runs at 2.5GHz, but the L3 cache runs at Northbridge frequency, which is 2GHz. It seems that AMD's engineers felt that running the L3 at core frequency would not have resulted in significantly higher performance, but would have resulted in significantly higher power dissipation. From another point of view, given a certain power envelope, running the Northbridge and the L3 cache at higher frequencies would result in lower core frequencies.
Fact 5: The L3 cache was a good choice, but…
The L3 cache does increase the latency of main memory accesses, but it decreases the average latency seen by the CPU. This leads to the question of whether the relatively slow L3 cache is really an advantage. The L3 cache has a latency of 43 cycles (2GHz) or 48 cycles (2.5GHz), but it's still quite a bit faster than system memory, which takes about 130 to 170 cycles to access.
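To make the trade-off concrete, here is a rough sketch of the average latency for accesses that miss the L2. The 48-cycle L3 latency and the 150-cycle memory latency (the middle of the 130 to 170 range) come from the text above; the 30% L3 hit rate and the 20 extra cycles a miss spends checking the L3 are purely hypothetical numbers chosen for illustration, as are the function names.

```python
def average_latency(l3_hit_rate, l3_cycles, memory_cycles, l3_miss_penalty):
    """Average cycles per L2 miss when an L3 sits in front of memory."""
    hit_part = l3_hit_rate * l3_cycles
    # A miss pays the memory latency plus the time lost checking the L3 first
    miss_part = (1 - l3_hit_rate) * (memory_cycles + l3_miss_penalty)
    return hit_part + miss_part


def break_even_hit_rate(l3_cycles, memory_cycles, l3_miss_penalty):
    """L3 hit rate above which the L3 lowers the average latency."""
    return l3_miss_penalty / (memory_cycles + l3_miss_penalty - l3_cycles)


# Hypothetical inputs: 30% hit rate, 20-cycle lookup penalty on a miss
with_l3 = average_latency(0.30, 48, 150, 20)
print(f"with L3: {with_l3:.1f} cycles, without L3: 150 cycles")
print(f"break-even hit rate: {break_even_hit_rate(48, 150, 20):.1%}")
```

With these assumed numbers the L3 pays for itself once it catches a bit more than one in six L2 misses; the real hit rate depends entirely on the workload.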
In addition, it has one main advantage for server workloads. If more than one core is accessing a cacheline in the L3 cache, it will remain in the L3. If not, the L3 cache will behave like a fully exclusive cache: it will send the cacheline to the L1 and throw out the cacheline to make place for a "victim" of the L2. This allows relatively fast sharing of data between threads, which is important for large code footprint applications like database applications and others. For single-threaded applications, it looks like they get a 2.5MB L2 cache, although with an average latency of about 20 cycles.
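The keep-or-move decision described above can be captured in a toy model. This is not AMD's actual logic: the class, its methods, and the idea of tracking a set of "sharers" per cacheline are all assumptions made to illustrate the behavior.

```python
class ToyL3:
    """Toy model of a sharing-aware victim L3 (illustrative only)."""

    def __init__(self):
        # cacheline address -> set of core ids that have touched the line
        self.lines = {}

    def insert_victim(self, address, core):
        """A line evicted from a core's L2 lands in the L3."""
        self.lines.setdefault(address, set()).add(core)

    def read(self, address, core):
        """Return 'miss', 'hit-kept' (shared line) or 'hit-moved' (exclusive)."""
        if address not in self.lines:
            return "miss"
        sharers = self.lines[address]
        sharers.add(core)
        if len(sharers) > 1:
            # More than one core uses the line: keep a copy in the L3
            return "hit-kept"
        # Only one core uses it: hand the line to that core and free the
        # slot for the next L2 victim (exclusive behavior)
        del self.lines[address]
        return "hit-moved"


l3 = ToyL3()
l3.insert_victim(0x40, core=0)
print(l3.read(0x40, core=1))  # a second core reads the line
l3.insert_victim(0x80, core=2)
print(l3.read(0x80, core=2))  # the same core reads its own victim back
```

In this sketch the first read leaves the line in the L3 for both cores, while the second read removes it, which is why a single-threaded application effectively sees the L2 and L3 capacities added together.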
Still, there is no doubt that the L3 cache of Barcelona could have been a bit bigger to score even better in the larger database benchmarks such as TPC. Our guess is that a larger L3 cache became a victim (pun intended) of the already large 283 mm² die size. Even so, a latency of 44 cycles (and more) is rather disappointing for only 2MB of L3 cache.
Fact 6: Dual-Link is possible with AMD 2xxx Opterons
Several readers asked us how it was possible that our ASUS KFSN4-DRE board linked our 2350 CPUs with two HyperTransport point-to-point connections instead of one, as the 23xx Opteron supports only one coherent HyperTransport link. However, the constraint is not the number of links but actually the number of coherent responses that are supported. Our ASUS board does feature twice as much bandwidth for CPU-to-CPU traffic (snoop, access to remote memory, etc.).
43 Comments
Hans Maulwurf - Wednesday, November 28, 2007
Agreed, I have not seen an article as good as this one for years at Anandtech. And not for some time on other review sites as well. Thank you.
JohanAnandtech - Tuesday, November 27, 2007
Thanks people. These kinds of articles take ridiculous amounts of time and I really appreciate that you let me know that you liked the article. It keeps us going. (and I mean that!)
magreen - Tuesday, November 27, 2007
Excellent article, thorough and with amazing depth and expertise. Keep up the great work AT!
Bluestealth - Tuesday, November 27, 2007
I agree, it was a very well done article. I can't wait to see how Intel's processors perform on Hyper... errr... Common System Interface (next year?). I believe that I will be buying AMD until that happens though for any servers.
Regs - Tuesday, November 27, 2007
Yeah, every time I see "Johan De Gelas" I have to read it. I like the added info on the Barc's L3 cache and the intro-factoid about the new architecture.
I agree that the Barc's arrival is a year late and joined the party a little too shy. Integer performance will likely have to be addressed in the Bulldozer in 2-3 years. Which is 2-3 years too long. I would be really surprised if they can manage anything other than a die shrink for Shanghai with maybe more L3 cache and some tweaks for cache latency and SSE.
Just seems like AMD took a nose dive in development for their processors in the past 3-4 years. After the K8 I would think they would be able to come up with something more innovative. Revolutionary should have never entered their heads and they should actually look down upon themselves for using such a word after 4 years.
jones377 - Tuesday, November 27, 2007
Any chance you could use the same tools to profile desktop applications as well in the future?
DigitalFreak - Tuesday, November 27, 2007
Three months or so since "launch", and you still can't get a server with AMD quad-core chips from any of the big 3 vendors (HP, Dell, IBM). AMD really screwed the pooch on this one.
jojo4u - Tuesday, November 27, 2007
Yuck, ugly GIF on the first page. Please use PNG because 256 colors are not enough for screenshots ;)
deathwombat - Saturday, December 1, 2007
In addition to being less ugly, PNG's higher compression would also make the file smaller (using less bandwidth), which I assume is what they were going for.
jkostans - Tuesday, November 27, 2007
Didn't even notice.