AMD reinvents the x86

06.02.2007
AMD's next-generation processor line, code-named Torrenza, has gone from a block diagram to living, breathing silicon. The first incarnation of AMD's redesigned x86 CPU is Barcelona, which those who don't read this column will call quad-core Opteron. Barcelona is genius, a genuinely new CPU that frees itself entirely from the millstone of the Pentium legacy. It'll do the same for you.

Each of Barcelona's four cores incorporates a new vector math unit referred to as SSE128 (128-bit streaming single-instruction-multiple-data extensions). I am aware that you only do quantum physics on weekends, but the potential for hardcore IT tasks such as encryption, compression, real-time analysis of high volumes of streaming business transactions, and wire-speed packet analysis is also the stuff of science fiction. Barcelona gives floating-point operations their own schedulers (checkout lanes) and runs them twice as fast as 64-bit SSE did. AMD claims that Barcelona's per-core floating-point performance is more than 80 percent faster than that of the present Opteron. Benchmark that. And separating integer and floating-point schedulers also accelerates this thing called virtualization, which you may notice is a recurring theme for Barcelona.
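To make the "twice as fast" arithmetic concrete, here is a minimal sketch in Python. It assumes a vector of single-precision (32-bit) floats and only counts how many passes a datapath of a given width needs to cover it. The function name passes_needed is hypothetical, and the model ignores scheduling, latency, and memory effects, so it illustrates the width argument and nothing more.

```python
import math

FLOAT_BITS = 32  # single-precision float width (assumption for this sketch)

def passes_needed(num_floats: int, datapath_bits: int) -> int:
    """Count the passes a datapath of the given width needs to cover a vector."""
    return math.ceil(num_floats * FLOAT_BITS / datapath_bits)

narrow = passes_needed(4, 64)   # a 128-bit vector through a 64-bit unit
wide = passes_needed(4, 128)    # the same vector through a 128-bit unit
print(narrow, wide)
```

Under these assumptions the wider unit covers the same four-float vector in half as many passes.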

Nested paging tables are a per-core feature that will light the afterburners on x86 hardware virtualization. A paging table holds the map that translates virtual memory addresses to physical memory addresses, and each CPU core has only one. Virtual machines therefore have to load and store their page tables as they get and lose their slice of the CPU, and that overhead is paid on every switch. AMD solved the problem with nested paging tables. Simplified, each VM maintains its own paging table that stays fixed in place. Instead of loading and saving paging tables as your system flips from VM to VM, your system just supplies Barcelona with the ID of the virtual machine being activated. The CPU core flips page tables automatically and transparently.
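The simplified picture above can be sketched as a toy model in Python. It follows the description in this column rather than the actual hardware page walk: every name here (NestedPagingCore, register_vm, activate, translate) is hypothetical, the 4,096-byte page size is an assumption, and each table is a plain dictionary from virtual page number to physical page number.

```python
PAGE_SIZE = 4096  # bytes per page (assumption for this sketch)

class NestedPagingCore:
    """Toy model of one core that keeps a resident page table per VM."""

    def __init__(self):
        self.tables = {}       # vm_id -> {virtual page number: physical page number}
        self.active_vm = None  # ID of the VM currently running on this core

    def register_vm(self, vm_id, page_table):
        # The table is stored once and stays in place for the VM's lifetime.
        self.tables[vm_id] = dict(page_table)

    def activate(self, vm_id):
        # Switching VMs only changes the active ID; no table is copied.
        if vm_id not in self.tables:
            raise KeyError(f"unknown VM: {vm_id}")
        self.active_vm = vm_id

    def translate(self, virtual_addr):
        # Split the address into page number and offset, then map the page.
        page, offset = divmod(virtual_addr, PAGE_SIZE)
        physical_page = self.tables[self.active_vm][page]
        return physical_page * PAGE_SIZE + offset

core = NestedPagingCore()
core.register_vm("vm-a", {0: 7})
core.register_vm("vm-b", {0: 3})
core.activate("vm-a")
addr_a = core.translate(16)   # page 0 of vm-a maps to physical page 7
core.activate("vm-b")
addr_b = core.translate(16)   # same virtual address, different VM
print(addr_a, addr_b)
```

The point of the sketch is that activate does no copying: the same virtual address resolves differently for each VM only because the active ID changed.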

Much fuss has been made about power efficiency, but even the best x86 power-saving schemes are crude. They adjust the clock speed and the operating voltage of the entire CPU, and the selection of set points is small. Barcelona keeps this technique, but builds on it with inspiration from IBM and Transmeta. Barcelona blacks out power to individual portions of the chip that are idle, from in-core execution units to on-die bus controllers. This hasn't made it into PCs before because it's very difficult to manage light switches for several 'rooms' individually and to make sure that, like a refrigerator light, the light is on whenever a door is opened, as if it had been burning the whole time. Power savings from these schemes are dramatic. Even if Barcelona lacked this feature, it would still be a green CPU.

Unlike Intel's Core, Barcelona gives each core a dedicated L2 cache, and Barcelona incorporates a redesign that reduces cache latency (access delays). Barcelona adds Level 3 cache, a newcomer to x86 and a page out of IBM's POWER playbook. All four CPU cores in a Barcelona socket will share a single master catalog of recently retrieved data. A three-level cache is a must-have for a multicore CPU, and that becomes obvious when you get a demo that switches L3 on and off.
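A rough Python sketch of that lookup order follows: a core checks its own L2 first, then the shared L3, then main memory. The class name CacheHierarchy and its read method are hypothetical, the L1 level is left out, and there is no eviction or coherence logic. The fill policy (copy into both L3 and the requesting core's L2 on a miss) is an assumption made for illustration, not a description of Barcelona's actual behavior.

```python
class CacheHierarchy:
    """Toy model: one private L2 per core plus a single shared L3."""

    def __init__(self, num_cores=4):
        self.l2 = [dict() for _ in range(num_cores)]  # private, one per core
        self.l3 = {}                                  # shared by all cores

    def read(self, core, addr, memory):
        """Return (value, level that satisfied the read)."""
        if addr in self.l2[core]:
            return self.l2[core][addr], "L2"
        if addr in self.l3:
            value = self.l3[addr]
            self.l2[core][addr] = value  # keep a private copy for next time
            return value, "L3"
        value = memory[addr]             # slowest path: go to main memory
        self.l3[addr] = value
        self.l2[core][addr] = value
        return value, "memory"

memory = {0x100: 42}
caches = CacheHierarchy()
first = caches.read(0, 0x100, memory)   # core 0 misses everywhere
second = caches.read(1, 0x100, memory)  # core 1 finds it in the shared L3
third = caches.read(1, 0x100, memory)   # core 1 now hits its own L2
print(first, second, third)
```

In this model, data fetched by one core saves the other three a trip to main memory, which is the benefit the shared catalog is meant to deliver.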

Barcelona is a new CPU, not a doubling of cores and not extensions strapped on here and there. Get ready to be blown away long before its release, which is scheduled for midyear.