AMD's 3rd generation Opteron versus Intel's 45nm Xeon: a closer look
by Johan De Gelas on November 27, 2007 6:00 AM EST
Posted in IT Computing
Introduction
The new Xeon 45nm (or Xeon 54xx series) arrived in the lab weeks ago, so a server CPU benchmark update is overdue. However, there is another reason we decided to write this article. When AMD launched their newest quad-core chips, we could only give you a preview of their performance, not a full review. We did not fully understand many of the performance aspects of the new AMD architecture, so we decided to delve a little deeper.
We will focus only on performance in this article, as our primary goal is to get an idea of where Barcelona (AMD's quad-core), Harpertown (Intel 45nm quad-core Xeon), and Clovertown (Intel quad-core 65nm Xeon) stand. To do so, we performed minimal profiling of each of our benchmarks and used several new micro benchmarks that will tell you a lot more than some real world benchmarks can. If you like understanding the benchmarks out there a bit better, dig in.
The new 45nm Xeon
The new 45nm Xeon 54xx series, aka "Harpertown", is still based on the Core architecture, but it has been tweaked a bit. We have already discussed those improvements in detail here and here, so we won't discuss them in detail again, but here's a quick overview:
- Faster 4-bit divider (Radix-16) instead of 2-bit divider
- Up to 1600MHz FSB
- Shared 6MB (24-way set associative) instead of 4MB L2 cache (per dual-core die)
- Super Shuffle engine (For SSE instructions)
- Split Load Cache Enhancement
- SSE4
The Radix-16 divider is the most interesting of these improvements. Division involves a repetition of subtractions, "if-it-fits" tests, and shifts. If you can process four bits at a time instead of two, you cut the number of iterations required to get your result in half. The square root calculation is similar and also benefits from this improvement. While divisions and square roots are rather rare in common software, they have a very significant performance impact. Contrary to the more "popular" instructions, they are not pipelined and their latency is high. For example, a floating-point multiply takes five cycles and the "Clovertown Core" architecture can finish one every two cycles thanks to pipelining. However, a floating-point division takes no less than 32 cycles and cannot be pipelined at all.
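To make the iteration count and the pipelining argument concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of Intel's actual circuit: hardware dividers typically select quotient digits with lookup tables rather than a comparison loop, and the function names `divide_by_digits` and `cycles_for_independent_ops` are hypothetical. The cycle figures passed in the usage note are the ones quoted in the paragraph above.

```python
def divide_by_digits(dividend, divisor, bits_per_step, width=32):
    """Unsigned division that retires `bits_per_step` quotient bits per iteration.

    bits_per_step=2 models a radix-4 divider, bits_per_step=4 a radix-16 one.
    Returns (quotient, remainder, iterations). Illustrative model only.
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")
    if width % bits_per_step != 0:
        raise ValueError("width must be a multiple of bits_per_step")

    radix = 1 << bits_per_step
    quotient = 0
    remainder = 0
    iterations = 0
    for shift in range(width - bits_per_step, -1, -bits_per_step):
        # Shift: bring the next group of dividend bits into the remainder.
        chunk = (dividend >> shift) & (radix - 1)
        remainder = (remainder << bits_per_step) | chunk
        # "If-it-fits" test: largest digit whose multiple of the divisor fits.
        digit = 0
        while digit + 1 < radix and (digit + 1) * divisor <= remainder:
            digit += 1
        # Subtract the chosen multiple and append the digit to the quotient.
        remainder -= digit * divisor
        quotient = (quotient << bits_per_step) | digit
        iterations += 1
    return quotient, remainder, iterations


def cycles_for_independent_ops(count, latency, issue_interval):
    """Cycles to finish `count` independent operations.

    A pipelined unit starts a new operation every `issue_interval` cycles;
    an unpipelined unit has issue_interval equal to its latency.
    """
    if count <= 0:
        return 0
    return latency + (count - 1) * issue_interval
```

For a 32-bit operand, `divide_by_digits(x, d, 2)` needs 16 iterations while `divide_by_digits(x, d, 4)` needs 8, which is the halving described above. Using the quoted figures, `cycles_for_independent_ops(100, 5, 2)` gives 203 cycles for 100 independent multiplies, whereas `cycles_for_independent_ops(100, 32, 32)` gives 3200 cycles for 100 divisions.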
Nanotechnology is here: a CPU with 820 million transistors, and you can fit at least two of them on one coin. (Photo by Tjerk Ameel)
Besides being an improved "Clovertown", the new Xeon is also a marvel of nanotechnology with no fewer than 410 million transistors on a die of only 107 mm². Two dies make one quad-core Xeon 54xx. Considering that this CPU is close to behemoth CPUs like Itanium and Power 6 in SPECint performance, the new Intel is a formidable adversary for AMD's newest quad-core.
There is more than the CPU of course. The new CPU works on the old "Bensley/5000P chipset" platform, though we were not able to get it running on our P5000PSL Intel motherboard despite the fact that we applied the BIOS update that came out early this month.
There is also a new HPC/workstation platform for the Intel Xeon thanks to the Seaburg chipset, which features an improved snoop filter. Besides reducing the snoop traffic, the new chipset should also be able to extract more bandwidth out of the same FBDIMMs. The support for DDR2-800 FBDIMMs should bring another performance boost but our current test platform is only stable with DDR2-667 FBDIMMs.
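As a rough guide to what the move from DDR2-667 to DDR2-800 could be worth, the short sketch below computes the theoretical peak transfer rate of the underlying DDR2 memory. It assumes the standard 64-bit DDR2 data bus and nominal transfer rates, and it ignores FB-DIMM link overhead and the chipset entirely, so it says nothing about measured bandwidth. The function name is hypothetical.

```python
def ddr2_peak_bandwidth_gb_s(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for one DDR2 channel.

    Assumes a 64-bit data bus by default; real sustained bandwidth is lower.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


ddr2_667 = ddr2_peak_bandwidth_gb_s(667)  # nominal DDR2-667
ddr2_800 = ddr2_peak_bandwidth_gb_s(800)  # nominal DDR2-800
print(f"DDR2-667: {ddr2_667:.2f} GB/s, DDR2-800: {ddr2_800:.2f} GB/s, "
      f"ratio {ddr2_800 / ddr2_667:.2f}x")
```

On paper that is about 5.3 GB/s versus 6.4 GB/s per channel, roughly a 20% increase in peak rate; how much of it reaches applications depends on the platform.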
43 Comments
tshen83 - Tuesday, November 27, 2007 - link
Seriously, can you buy the 2360SE? Newegg doesn't even stock the 1.7GHz 2344HEs. The same situation exists on the Phenom line of CPUs. I don't see the value of reviewing Phenom 9700, 9900s when AMD cannot deliver them. I am having trouble locating Phenom 9500s.
alantay - Tuesday, November 27, 2007 - link
The MySQL scalability problem is not so much in MySQL as in the Linux kernel and Glibc used. To have it scale correctly to 8 CPUs you need kernel 2.6.22.x (alternatively you could try with a 2.6.24-RC -should be a bit faster-, but not with 2.6.23.x) and Glibc 2.6 or higher.
A default Ubuntu 7.10 for example should scale well with MySQL (OpenSUSE 10.3 *might* work, but they have backported the 2.6.23 scheduler which has a scalability problem).
Thanks for the article!
JohanAnandtech - Tuesday, November 27, 2007 - link
Excellent feedback. It is a bit frustrating that once again you need some ultra new kernel and libraries to get good scalability. That is unrealistic for people who use SLES and who rely on their support contract to get updates.
MGSsancho - Wednesday, November 28, 2007 - link
how about opensolaris? i dont know how much different it is from solaris 10, but it should be able to scale to dozens of cores nicely. I was about to ask about oracle and DB2 benchmarks but you answered that in your article; expensive, and the oems usually publish that info. anyways awesome article
Roy2001 - Tuesday, November 27, 2007 - link
I cannot find a SINGLE one, nowhere.
drebo - Tuesday, November 27, 2007 - link
Newegg has the Phenom 9500 in stock. At least, they did yesterday. I've also got a vendor I use that has them in stock.
JarredWalton - Tuesday, November 27, 2007 - link
But Phenom isn't Opteron 23xx. Different socket, different market, and it has L3. (Does Phenom X4 have an L3 cache? Maybe I should go check....)
drebo - Wednesday, November 28, 2007 - link
Yes, Phenom 9500 has an L3. But if you look at his question (in the subject line), he is asking about barcelona as a whole and phenom specifically. The answer is Yes, they are available.
Slaimus - Tuesday, November 27, 2007 - link
They may be gobbled up by Cray for that Budapest supercomputer.
Regs - Tuesday, November 27, 2007 - link
I would not expect any from vendors and wholesalers until early next year. Matter of fact I wouldn't want one until then anyhow. I would at least wait until B3 stepping.