The GPU (Specialized Processors)
Every CPU ever built has the basic instructions to do arithmetic, move data around in memory, compare and test numbers, and run programs. No CPU is smarter than another in the sense that it can do something the others cannot. Processors are simply faster or slower at doing the same task. This means that your desktop, laptop, or cell phone has the instructions needed to do word processing or solve complex physics problems. Twenty years ago every Fortune 500 corporation did all its data processing on a computer that was slower and had less memory than the cheapest system you can buy at Costco. In fact, your iPod may be more powerful.
However, a small number of problems require so much power that they can strain an ordinary computer. Decompressing the HD movie on a Blu-ray disc fast enough to display it on the screen at the proper speed can max out a Dual Core processor. Fortunately, there is another type of processor that can handle the problem more efficiently.
If you buy a loaf of bread, you can cut it into slices with a knife. That is fine if you want to create some thin slices and some thick slices. The bakery has a machine with dozens of evenly spaced slicers. You put the bread in the machine and it cuts it into slices in one operation. If you have to cut one slice, you need a knife. If you need to slice hundreds of loaves of bread, you need the machine.
In a restaurant, a "dishwasher" may be a guy who washes dishes or a machine where you stack the dishes and turn it on. The guy washes dishes one at a time, but he can do other jobs like cleaning the floor or taking out the garbage. The machine washes a hundred dishes at once "in parallel", but it can only perform this one job.
Ten or twenty years ago, specialized processors were sold on add-in boards to perform specific operations. The sound processing chip on an audio card has always provided better and more complex audio processing than you could perform with just the CPU. When DVD movies first came out, you needed a separate board with a specialized chip to decode them because the Pentium CPU wasn't fast enough. Subsequent generations of general purpose processors eventually caught up with each specific problem that had justified a particular board, but a specialized processor will always be faster at the specific type of computing it was built for.
Displaying 3D video games or decoding HD TV movie streams is done more efficiently with the Graphics Processing Unit (GPU) on a modern video card. Just as Walt Disney assigned different artists to do the foreground characters, the background, and to fill in the colors, video cards just a few years back had specialized circuits for generating textures and shading. Around the time that Vista came out this approach changed, and modern video cards can have from 64 up to 800 identical "unified" processor circuits that can be assigned to perform any of the graphics tasks.
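To see why identical "unified" circuits help, here is a rough Python sketch. It is only a toy model: the unit counts, the workload numbers, and the function names are made up for illustration, and a real video card schedules its work in far more complicated ways.

```python
import math

def cycles_fixed(work, units):
    """Each kind of work can only run on its own dedicated circuits."""
    return max(math.ceil(work[kind] / units[kind]) for kind in work)

def cycles_unified(work, total_units):
    """Any circuit can be assigned to any kind of work."""
    return math.ceil(sum(work.values()) / total_units)

# Hypothetical frame: a little texture work and a lot of shading work.
work = {"texture": 100, "shading": 1180}

# Hypothetical older card: 64 circuits, split into two fixed groups.
fixed_units = {"texture": 16, "shading": 48}

fixed = cycles_fixed(work, fixed_units)   # shading circuits are the bottleneck
unified = cycles_unified(work, 64)        # all 64 circuits share the load

print(fixed, unified)  # 25 20
```

In this made-up frame the older design leaves most of its texture circuits idle while the shading circuits fall behind. The unified design finishes sooner with the same total number of circuits because nothing sits idle.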
A video card can perform certain types of repetitive processing 10 or 20 times faster than even a Quad Core CPU. However, Intel did not entirely give up on its CPU design. Each new generation of CPU includes an expanded set of instructions to do bulk processing on blocks of data. These SIMD (Single Instruction, Multiple Data) instructions, which Intel calls SSE, speed up multimedia processing such as sound and video decoding. While the CPU has a little of this SIMD capability, the Graphics Processing Unit (GPU) on your video card has massively more of it.
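The difference between ordinary and SIMD processing can be sketched in a few lines of Python. This is only a simulation of the idea, not real SSE code: the function names are invented for the example, and the lane width of four is an assumption modeled on an SSE register holding four 32-bit numbers.

```python
LANES = 4  # assumption: four values per instruction, like four 32-bit floats in a 128-bit SSE register

def scalar_add(a, b):
    """Add two equal-length lists, counting one 'instruction' per element."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions

def simd_add(a, b, lanes=LANES):
    """Add the same lists, counting one 'instruction' per block of `lanes` elements."""
    result, instructions = [], 0
    for i in range(0, len(a), lanes):
        # One simulated instruction operates on a whole block of data at once.
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return result, instructions

samples_a = list(range(16))
samples_b = [10] * 16

scalar_result, scalar_count = scalar_add(samples_a, samples_b)
simd_result, simd_count = simd_add(samples_a, samples_b)

print(scalar_result == simd_result)  # True
print(scalar_count, simd_count)      # 16 4
```

Both versions produce the same answer. The SIMD version simply gets there with a quarter of the instructions, which is the whole appeal for sound and video data where the same operation is applied to every sample.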
Moore's Law says that the number of transistors that fit on a chip doubles roughly every 24 months. Twice as many circuits could double the number of current cores, but Intel and AMD are too smart to consider only the obvious. The alternative is to keep two or four current generation (Core 2 or Phenom) complex cores, and then do something entirely different with the extra circuits.
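The arithmetic behind that choice is simple enough to sketch in Python. The transistor count below is a round, hypothetical number rather than the figure for any real chip, and the sketch assumes the whole budget is divided evenly among the cores.

```python
def projected_transistors(current, months, doubling_months=24):
    """Transistor budget after `months`, doubling every `doubling_months`."""
    return current * 2 ** (months / doubling_months)

today = 400_000_000      # hypothetical budget for a four-core chip today
per_core = today // 4    # assumption: the budget is split evenly across cores

next_gen = projected_transistors(today, months=24)

# Option 1: spend the whole budget on more of the same cores.
more_cores = int(next_gen // per_core)

# Option 2: keep four cores and spend the rest on something different.
left_over = int(next_gen - 4 * per_core)

print(more_cores, left_over)  # 8 400000000
```

So one generation later the designer can either ship eight of today's cores or keep four and have as many transistors again left over for specialized circuits.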
Intel's next-generation GPU (Larrabee) will have as many as 80 cores, but these cores will be similar to the original Pentium. Thus each core will be much slower than one of the modern CPU cores, but in aggregate they will have a lot of processing power for simple repetitive tasks. [Note that the Intel Atom processor now appearing in low cost, low power laptops and Internet devices takes a similar approach, with one or two simple, Pentium-style cores sold as a separate chip.]
The Sony PS3 is powered by the IBM Cell processor design. It has one general purpose core that runs the operating system and programs, and then there are six bulk processing specialized cores that do the calculations for the graphics and game logic.
AMD has been talking for several years about Fusion. The idea is to create a single chip with one or two CPU cores and a GPU style array of hundreds of specialized graphics processing circuits. This could reduce the number of chips and the resulting cost of a mid-range system, or it could reduce the power consumption of a laptop.
The problem is that each of these specialized designs requires customized programming, and application programmers don't have the time to do this complex work over and over again. AMD and Nvidia have released different programming interfaces and software development kits that allow applications to use some of the processing power in the video card. The big players (Intel and Microsoft) may produce a single standard interface which would be more attractive to software developers.
Nvidia "SLI" and AMD "CrossFire" allow the user to plug more than one video card into a computer to increase the number of specialized video processing circuits available for 3D computation. If one GPU has 800 unified circuits, then two video cards could bump that to 1600. This can be a very expensive solution. A more modest solution is provided by Hybrid Graphics. A Hybrid system starts with a low power graphics chip built into the mainboard. It may not be good at blasting aliens and saving the galaxy, but it is perfectly capable of running Windows (even Vista) and most ordinary applications. If you don't do much gaming, then you don't need a separate video card. However, if you encounter an application that needs more power, you can simply buy a video card and plug it into the PCI-e slot.
Until Hybrid computing, when you plugged a video card into a mainboard it took over the entire video function. You moved the monitor cable to the card, and the integrated video on the mainboard was simply ignored and replaced. However, in a Hybrid system the integrated video on the mainboard remains in control of the monitor. Most of the time, when you are just running normal Windows applications or browsing the Web, the add-on video card is powered down to save energy. However, if you start up a 3D video game that requires the power, then the video card powers up and adds its processing power to the smaller number of unified processing elements that the mainboard video chip provides.
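The Hybrid behavior described above amounts to a simple rule, which this Python sketch tries to capture. It is a toy model, not how any driver is actually written: the class name, the method names, and the unit counts are all invented for the example.

```python
class HybridGraphics:
    """Toy model: the integrated chip always drives the monitor, and the
    add-on card is powered up only while a 3D application is running."""

    def __init__(self, integrated_units, card_units):
        self.integrated_units = integrated_units
        self.card_units = card_units
        self.card_powered = False  # the card starts powered down to save energy

    def start_application(self, needs_3d):
        # Power the card up for demanding 3D work, down for everything else.
        self.card_powered = needs_3d

    def available_units(self):
        extra = self.card_units if self.card_powered else 0
        return self.integrated_units + extra

    def monitor_driver(self):
        # The monitor cable stays plugged into the mainboard in either state.
        return "integrated"

# Hypothetical system: 40 processing elements on the mainboard, 800 on the card.
system = HybridGraphics(integrated_units=40, card_units=800)

system.start_application(needs_3d=False)   # browsing the Web
browsing_units = system.available_units()

system.start_application(needs_3d=True)    # starting a 3D game
gaming_units = system.available_units()

print(browsing_units, gaming_units, system.monitor_driver())  # 40 840 integrated
```

The point of the model is that the card's circuits are added to the mainboard's circuits rather than replacing them, and that the monitor never has to be moved.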
Each of these is a very good idea, and each provides specific advantages for particular systems or applications. Gamers will declare a "winner" based on which architecture delivers the most powerful 3D platform, but there are other objectives (cost, power use, flexibility) that appeal to the rest of us.
