AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST - Posted in IT Computing
Introduction
The new 45nm Xeon (or Xeon 54xx series) arrived in the lab weeks ago, so a server CPU benchmark update is overdue. However, there is another reason we decided to write this article. When AMD launched their newest quad-core chips, we could only give you a preview of their performance - not a full review. We did not fully understand many of the performance aspects of the new AMD architecture, so we decided to delve a little deeper.
We will focus only on performance in this article, as our primary goal is to get an idea of where Barcelona (AMD's quad-core), Harpertown (Intel's 45nm quad-core Xeon), and Clovertown (Intel's 65nm quad-core Xeon) stand. To do so, we did some minimal profiling of each of our benchmarks and used several new micro benchmarks that will tell you a lot more than some real world benchmarks can. If you like understanding the benchmarks out there a bit better, dig in.
The new 45nm Xeon
The new 45nm Xeon 54xx series, aka "Harpertown", is still based on the Core architecture, but it has been tweaked a bit. We have already discussed those improvements in detail in earlier articles, so we won't repeat that discussion, but here's a quick overview:
- Faster 4-bit divider (Radix-16) instead of 2-bit divider
- Up to 1600MHz FSB
- Shared 6MB (24-way set associative) instead of 4MB L2 cache (per dual-core die)
- Super Shuffle engine (For SSE instructions)
- Split Load Cache Enhancement
- SSE4
The Radix-16 divider is the most interesting of these improvements. Division involves repeated subtractions, "does it fit" tests, and shifts. If you can process four bits at a time instead of two, you cut the number of iterations required to get your result in half. The square root calculation is similar and also benefits from this improvement. While divisions and square roots are rather rare in common software, they have a very significant performance impact. Unlike the more "popular" instructions, they are not pipelined and their latency is high. For example, a floating-point multiply takes five cycles, and the "Clovertown Core" architecture can finish one every two cycles thanks to pipelining. However, a floating-point division takes no less than 32 cycles and cannot be pipelined at all.
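To make the "half the iterations" argument concrete, here is a minimal Python sketch of a digit-by-digit divider that retires a configurable number of quotient bits per step. It is not a model of Intel's hardware: real dividers use SRT-style digit selection on floating-point mantissas rather than the simple search loop below. The function names, the 32-bit operand width, and the 100-operation workload are assumptions picked purely for illustration; the cycle figures are the ones quoted in the paragraph above.

```python
def digit_divide(dividend: int, divisor: int, bits_per_step: int, width: int = 32):
    """Unsigned long division in radix 2**bits_per_step.

    Returns (quotient, remainder, iterations). Illustrative only:
    `width` must be a multiple of `bits_per_step`, and `dividend`
    must fit in `width` bits.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    radix = 1 << bits_per_step
    quotient = remainder = iterations = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Shift: bring down the next group of dividend bits.
        remainder = (remainder << bits_per_step) | ((dividend >> shift) & (radix - 1))
        # "Does it fit" test: largest digit with digit * divisor <= remainder.
        digit = 0
        while (digit + 1) * divisor <= remainder:
            digit += 1
        # Subtract and append the new quotient digit.
        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
        iterations += 1
    return quotient, remainder, iterations


def total_cycles(n_ops: int, latency: int, issue_interval: int) -> int:
    """Cycles to finish n_ops independent operations back to back.

    The first result appears after `latency` cycles; each later one
    follows `issue_interval` cycles after the previous result.
    """
    return latency + (n_ops - 1) * issue_interval


if __name__ == "__main__":
    for bits in (2, 4):  # 2 bits per step (radix-4) vs 4 bits per step (radix-16)
        q, r, steps = digit_divide(1_000_003, 97, bits)
        print(f"{bits} bits/step: quotient={q} remainder={r} iterations={steps}")

    # Figures from the text: multiply = 5-cycle latency, one result every
    # 2 cycles; divide = 32 cycles and unpipelined, so interval == latency.
    print("100 multiplies:", total_cycles(100, latency=5, issue_interval=2), "cycles")
    print("100 divides:   ", total_cycles(100, latency=32, issue_interval=32), "cycles")
```

With a 32-bit operand, the 2-bit version needs 16 iterations and the 4-bit version needs 8, which is the halving described above. The second helper shows why an unpipelined instruction hurts even when it is rare: under these assumptions, 100 independent multiplies take 203 cycles, while 100 divisions take 3200.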
Nanotechnology is here: a CPU with 820 million transistors, and you can fit at least two of them on one coin. (Photo by Tjerk Ameel)
Besides being an improved "Clovertown", the new Xeon is also a marvel of nanotechnology with no fewer than 410 million transistors on a die of only 107 mm². Two dies make one quad-core Xeon 54xx, for a total of 820 million transistors. Considering that this CPU is close to behemoth CPUs like Itanium and Power 6 in SPECint performance, the new Intel chip is a formidable adversary for AMD's newest quad-core.
There is more than the CPU of course. The new CPU works on the old "Bensley/5000P chipset" platform, though we were not able to get it running on our P5000PSL Intel motherboard despite the fact that we applied the BIOS update that came out early this month.
There is also a new HPC/workstation platform for the Intel Xeon thanks to the Seaburg chipset, which features an improved snoop filter. Besides reducing the snoop traffic, the new chipset should also be able to extract more bandwidth out of the same FBDIMMs. Support for DDR2-800 FBDIMMs should bring another performance boost, but our current test platform is only stable with DDR2-667 FBDIMMs.
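For a sense of scale, the theoretical peak rates behind these memory and bus speeds can be worked out with simple arithmetic. The sketch below assumes a 64-bit (8-byte) data path for both the DDR2 devices and the front-side bus, and it only computes DRAM-side and bus-side peaks per channel. It ignores the FB-DIMM serial link, protocol overhead, and the number of channels, so real numbers will be lower. The function name is hypothetical.

```python
BUS_WIDTH_BYTES = 8  # assumption: 64-bit data path (DDR2 DIMM and FSB)


def peak_mb_per_s(mega_transfers_per_s: int, bus_width_bytes: int = BUS_WIDTH_BYTES) -> int:
    """Theoretical peak bandwidth in MB/s: transfers per second times bytes per transfer."""
    return mega_transfers_per_s * bus_width_bytes


if __name__ == "__main__":
    ddr2_667 = peak_mb_per_s(667)   # per channel, DRAM side
    ddr2_800 = peak_mb_per_s(800)   # per channel, DRAM side
    fsb_1600 = peak_mb_per_s(1600)  # one 1600MHz front-side bus

    print(f"DDR2-667: {ddr2_667} MB/s per channel")
    print(f"DDR2-800: {ddr2_800} MB/s per channel")
    print(f"DDR2-800 vs DDR2-667: {ddr2_800 / ddr2_667:.2f}x")
    print(f"1600MHz FSB: {fsb_1600} MB/s")
```

On paper, the step from DDR2-667 to DDR2-800 is worth roughly 20% more peak bandwidth per channel; how much of that shows up in benchmarks depends on the chipset and the workload.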
43 Comments
Hans Maulwurf - Wednesday, November 28, 2007
Agreed, I have not seen an article as good as this one for years at Anandtech. And not for some time on other review sites as well. Thank you.
JohanAnandtech - Tuesday, November 27, 2007
Thanks, people. Articles like this take a ridiculous amount of time, and I really appreciate that you let me know that you liked the article. It keeps us going. (And I mean that!)
magreen - Tuesday, November 27, 2007
Excellent article, thorough and with amazing depth and expertise. Keep up the great work AT!
Bluestealth - Tuesday, November 27, 2007
I agree, it was a very well done article. I can't wait to see how Intel's processors perform on Hyper... errr... Common System Interface (next year?). I believe that I will be buying AMD for any servers until that happens, though.
Regs - Tuesday, November 27, 2007
Yeah, every time I see "Johan De Gelas" I have to read it. I like the added info on the Barc's L3 cache and the intro-factoid about the new architecture.
I agree that the Barc's arrival is a year late and joined the party a little too shy. Integer performance will likely have to be addressed in the Bulldozer in 2-3 years. Which is 2-3 years too long. I would be really surprised if they can manage anything other than a die shrink for Shanghai with maybe more L3 cache and some tweaks for cache latency and SSE.
Just seems like AMD took a nose dive in development for their processors in the past 3-4 years. After the K8 I would think they would be able to come up with something more innovative. Revolutionary should have never entered their heads and they should actually look down upon themselves for using such a word after 4 years.
jones377 - Tuesday, November 27, 2007
Any chance you could use the same tools to profile desktop applications as well in the future?
DigitalFreak - Tuesday, November 27, 2007
Three months or so since "launch", and you still can't get a server with AMD quad-core chips from any of the big 3 vendors (HP, Dell, IBM). AMD really screwed the pooch on this one.
jojo4u - Tuesday, November 27, 2007
Yuck, ugly GIF on the first page. Please use PNG because 256 colors are not enough for screenshots ;)
deathwombat - Saturday, December 1, 2007
In addition to being less ugly, PNG's higher compression would also make the file smaller (using less bandwidth), which I assume is what they were going for.
jkostans - Tuesday, November 27, 2007
Didn't even notice.