Over the continual progression of GPU technology, we've seen GPUs become increasingly capable at generalized tasks as they have gained the flexibility for game designers to implement more customized and more expansive graphical effects. What started out as a simple fixed-function rendering process, where texture and vertex data were fed into a GPU and pixels were pushed out, has evolved into a system where a great deal of processing takes place inside the GPU. The modern GPU can be used to store and manipulate data in ways that go far beyond quickly figuring out what happens when multiple textures are mixed together.

Today's GPUs have evolved into devices increasingly similar to CPUs in the breadth of what they can do, while still specializing in a subset of those abilities. Starting with Shader Model 2.0 on cards like the Radeon 9700, and continuing with Shader Model 3.0 and today's latest cards, GPUs have become floating-point powerhouses able to run most floating-point calculations many times faster than a CPU, a necessity given that 3D rendering is a very FP-intensive process. At the same time, GPUs have added programming constructs like looping and branching, abilities previously found only on CPUs but crucial for letting programmers make effective use of GPU resources. In short, today's GPUs have in many ways become extremely powerful floating-point processors, yet they have been used for 3D rendering and little else.

Both ATI and NVIDIA have been looking to put the expanded capabilities of their GPUs to good use, with varying success. So far, the only programs other than games and applications requiring 3D rendering that have effectively tapped this power have also been video related, such as video decoders, encoders, and video effect processors. In short, the GPU has been underutilized: there are many tasks that are floating-point hungry while not visual in nature, and these programs have made little use of the GPU so far.

Meanwhile, the academic world has been designing and using custom-built floating-point hardware for years for its own research purposes. The class of hardware relevant to today's topic, stream processors, is made up of extremely powerful floating-point processors that operate on whole blocks of data at once, where CPUs carry out only a handful of numerical operations at a time. CPUs have implemented some stream processing with instruction sets like SSE and 3DNow!+, but these efforts still pale in comparison to what custom hardware has been able to do. The same progress was happening on GPUs, only in a different direction, and until recently GPUs remained untapped as anything other than a graphics tool.
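
To make that contrast concrete, here is a minimal sketch in C (our own illustration, not anything from ATI or Stanford; it assumes SSE support and a GCC/Clang-style alignment attribute) showing the same arithmetic done one element per loop iteration, and then four floats per instruction using SSE intrinsics:

```c
#include <stdio.h>
#include <xmmintrin.h>  /* SSE intrinsics; compile with e.g. gcc -msse */

#define N 16

int main(void)
{
    /* 16-byte alignment so the aligned SSE loads/stores below are legal
       (GCC/Clang-style attribute; an assumption, not from the article). */
    float a[N]   __attribute__((aligned(16)));
    float b[N]   __attribute__((aligned(16)));
    float out[N] __attribute__((aligned(16)));

    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = (float)(N - i);
    }

    /* Scalar version: one element per loop iteration. */
    for (int i = 0; i < N; i++)
        out[i] = a[i] * b[i] + 1.0f;

    /* SSE version: identical work, but four floats per instruction --
       a small taste of the block-at-a-time stream processing model. */
    const __m128 one = _mm_set1_ps(1.0f);
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_load_ps(&a[i]);
        __m128 vb = _mm_load_ps(&b[i]);
        _mm_store_ps(&out[i], _mm_add_ps(_mm_mul_ps(va, vb), one));
    }

    printf("out[0] = %.1f, out[%d] = %.1f\n", out[0], N - 1, out[N - 1]);
    return 0;
}
```

Dedicated stream hardware pushes this same idea much further, operating on entire blocks of data at once rather than four-wide registers.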

Today's GPUs have evolved into their own class of stream processors, sharing much in common with researchers' customized hardware, because the 3D rendering process is itself a streaming task. The key difference is that while GPU designers have cut a few corners, omitting functionality that 3D rendering doesn't need but a custom processor might offer, they have by and large produced stream processors just as fast as custom hardware yet, thanks to economies of scale, many times cheaper than a custom design.
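
As a rough illustration of why rendering maps so naturally onto the stream model, consider this hypothetical sketch in C (the function and names are ours, purely for illustration): a small, pure "kernel" is applied independently to every element of a block of data, which is exactly the shape of per-pixel and per-vertex work in a rendering pass.

```c
#include <stdio.h>
#include <stddef.h>

/* A "kernel": a pure function applied to one element, with no
   dependencies between elements. That independence is what lets a
   GPU run thousands of copies of it in parallel. */
static float kernel(float x)
{
    return x * x + 1.0f;
}

/* Serial stand-in for a stream processor: map the kernel over a
   whole block of data. On a GPU, each iteration would effectively
   be its own thread -- e.g. one per pixel in a rendering pass. */
static void run_stream(const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = kernel(in[i]);
}

int main(void)
{
    float in[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    float out[8];

    run_stream(in, out, 8);
    printf("out[7] = %.1f\n", out[7]);
    return 0;
}
```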

This is where ATI is looking for new ideas on what to run on its GPUs as part of its new stream computing initiative. The academic world is full of such ideas, champing at the bit to run its experiments on more than a handful of customized hardware designs. One such application, and the star of today's announcement, is Folding@Home, a Stanford research project designed to simulate protein folding in order to unlock the secrets of diseases caused by flawed protein folding.

Comments

  • ProviaFan - Sunday, October 1, 2006 - link

    The problem with F@H scaling over multiple cores is not what one might first think. I've run F@H on my Athlon X2 system since I bought it when the X2s became available in mid-2005. Since each F@H client is single-threaded, you simply install a separate command line client for each core (the GUI client can't run more than one instance of itself at once), and once they are installed as Windows services, they distribute nicely over all of the available CPUs. The problem with this is that each client has its own work unit with the requisite memory requirements, which with the larger units can become significant if you must run four copies of the client to keep your quad-core system happy. The scalability issues mentioned actually lie in the difficulty of making a single client with a single unit of work multi-threaded. I'm hoping that the F@H group doesn't give up trying to make this possible, because the memory requirements will become a serious issue with large work units when quad- and octo-core systems become more readily available.
  • highlandsun - Sunday, October 1, 2006 - link

    Two thoughts - first, buy more RAM, duh. But that raises a second point - I've got 4GB of RAM in my X2 system. If resource consumption is really such a problem, how is a GPU with a measly 256MB of RAM going to have a chance? How much of the performance comes from having superfast GDDR available, and what kind of slowdown do they see from having to swap data in and out of system memory?

    As for Crossfire (or SLI, for that matter), why does that matter at all? These things aren't rendering a video display anymore; they don't need to talk to each other at all. You should be able to plug in as many video cards as you have slots for and run them all independently.

    It sounds to me like these programs are compute-bound and not memory bandwidth-bound, so you could build a machine with 32 PCIEx1 slots and toss 32 GPUs into it and have a ball.
  • icarus4586 - Tuesday, October 3, 2006 - link

    It depends on the type of core, and the data it's working on. There's an option when you set up the client for whether or not to do "big" WUs. I've found that a "big" WU generally uses somewhere around 500MB of system memory, while "small" ones use 100MB or less. I would assume that they'd target it to graphics card memory sizes. Given that the high-end cards they're targeting have 256MB or 512MB of RAM, this should be doable.
  • gersson - Saturday, September 30, 2006 - link

    I'm sure my PC can do some good...3.5GHz C2D and x1900 Crossfire.
    I've done some Folding @ home before but could never get into it. I'll give it a spin when the new client comes out.
  • Pastuch - Saturday, September 30, 2006 - link

    I just wanted to say thanks to Anandtech for writing this article. I have been an avid reader for years and an overclocker. People always talk about folding in the OC scene but I never took the time to learn just what folding@home is. I had no idea it was research into Alzheimer's. I'm downloading the client right now.
  • Griswold - Sunday, October 1, 2006 - link

    Unfortunately, many overclocked machines that are stable by their owners' standards don't meet the standards of such projects. The box may run rock stable, but can you vouch for the results being correct when the system is running outside of its specs?

    If you read the forums of these projects, you will soon see that the people running them aren't too fond of overclocking. I've never seen any figures, but I bet there are many, many work units being discarded (yet you still get credit) because they're useless. However, the benefit still seems to outweigh the damage. There are just so many people contributing to the project because they want to see their name on a ranking list - without caring about the actual background. I guess this can be called a win-win situation.
  • JarredWalton - Sunday, October 1, 2006 - link

    If a WU completes - whether on an OC'ed PC or not - it is almost always a valid result. If a WU is returned as an EUE (or generates another error that apparently stems from OC'ing), it is reassigned to other users to verify the result. Even non-OC'ed PCs will sometimes have issues on some WUs, and Stanford does try to be thorough - they may even send out all WUs several times just to be safe. Anyway, if you run OC'ed and the PC gets a lot of EUEs (Early Unit Ends), it's a good indication that your OC is not stable. Memory OCs also play a significant role.
  • nomagic - Saturday, September 30, 2006 - link

    I'd like to put some of my GPU power to some use too.
  • Griswold - Sunday, October 1, 2006 - link

    Read the article.
  • Furen - Saturday, September 30, 2006 - link

    Has ATI updated it at all? I don't have an ATI video card around here so I can't go check it out, but from what I've seen it was an extremely barebones application.
