Barcelona: the technical lowdown
AMD has launched its Barcelona range of quad-core Opterons. But what new technologies do the processors bring to the market? PC Pro’s deputy editor, David Fearon, explains.
The Barcelona chip is based on the 65nm fabrication process, and launches with variants in both 95W and 120W power envelopes. The maximum clock speed at launch is 2GHz, but faster processors will appear later this year.
This makes the new chips not only pin-compatible with existing Rev F Opterons, but thermally compatible as well: AMD claims the only thing needed to upgrade a current Rev F dual-core system with the new quad-core parts is a BIOS update.
Aside from its four “native” cores, Barcelona’s biggest architectural departure is in its cache architecture. It’s the first chip design in recent years to sport both the now-familiar level 2 cache – with 512KB per core – and an extra stage of 2MB level 3 cache.
AMD dubs this arrangement “balanced smart cache”. The balanced side of the equation comes from the fact that each core’s level 2 cache is dedicated to that core and can’t be shared, whereas the level 3 is distributed across the four cores. This is in contrast to Intel’s arrangement with its Core microarchitecture CPUs, which have just one monolithic slab of level 2 cache shared between all cores.
The dedicated level 2 complement of each core is a key area that AMD claims leads to enhanced performance over the competition when it comes to multithreaded applications. With Intel’s system each core competes for cache which, AMD claims, means threads can experience “cache starvation” if one core has managed to grab all or most of the available complement.
The flip side of the coin is that single-threaded performance can suffer when the maximum available complement of level 2 cache any core can use is 512KB, compared to Intel’s 4MB per pair of cores.
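To put numbers on that trade-off, here is a minimal Python sketch using only the cache sizes quoted above. The function names are ours, chosen for illustration, and the "guaranteed" figure for Intel reflects AMD's claimed worst case (one core grabbing the whole shared cache) rather than any measured behaviour.

```python
KB_PER_MB = 1024

# Figures quoted in the article.
BARCELONA_CORES = 4
BARCELONA_L2_PER_CORE_KB = 512            # dedicated, not shareable
BARCELONA_L3_SHARED_KB = 2 * KB_PER_MB    # distributed across all four cores
INTEL_L2_PER_PAIR_KB = 4 * KB_PER_MB      # shared by a pair of cores


def barcelona_total_cache_kb():
    """Total level 2 plus level 3 cache on one Barcelona chip."""
    return BARCELONA_CORES * BARCELONA_L2_PER_CORE_KB + BARCELONA_L3_SHARED_KB


def max_l2_for_one_thread_kb(design):
    """Most level 2 cache a single thread could ever occupy."""
    if design == "barcelona":
        return BARCELONA_L2_PER_CORE_KB
    if design == "intel":
        return INTEL_L2_PER_PAIR_KB
    raise ValueError(f"unknown design: {design}")


def guaranteed_l2_per_core_kb(design):
    """Level 2 cache a core keeps even if a neighbour is greedy.

    The Intel value is AMD's claimed worst case, an assumption here.
    """
    if design == "barcelona":
        return BARCELONA_L2_PER_CORE_KB
    if design == "intel":
        return 0
    raise ValueError(f"unknown design: {design}")


if __name__ == "__main__":
    print("Barcelona total cache (KB):", barcelona_total_cache_kb())
    for design in ("barcelona", "intel"):
        print(design,
              "max per thread:", max_l2_for_one_thread_kb(design),
              "guaranteed per core:", guaranteed_l2_per_core_kb(design))
```

On these figures a lone thread on Intel's design can reach eight times as much level 2 cache as one on Barcelona, while a Barcelona core never drops below its own 512KB.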
Under the general banner of Memory Optimizer Technology, AMD claims to have increased memory bandwidth by 40% using various techniques. Chief among these is the DRAM pre-fetcher, which speculatively loads instructions from main memory before they’re required.
Intel has a similar trick, but according to an AMD spokesman, “we pre-fetch into a buffer where they [Intel] pre-fetch into level 2 cache.” By keeping pre-fetched instructions in a separate buffer, AMD claims the impact of mis-prediction and “cache pollution” is minimised, reducing the need to flush and refill the cache.
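The idea of “cache pollution” can be illustrated with a toy simulation. The Python sketch below is not a model of either vendor's hardware: it assumes a tiny least-recently-used cache, a working set that exactly fits it, and a deliberately pessimistic pre-fetcher whose every guess is wrong. The class and function names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache holding a fixed number of lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def fill(self, addr):
        """Insert a line, evicting the least recently used if full."""
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        while len(self.lines) > self.capacity:
            self.lines.popitem(last=False)

    def touch(self, addr):
        """Demand access: return True on a hit, fill the line on a miss."""
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return True
        self.fill(addr)
        return False


def demand_misses(prefetch_into_cache, capacity=8, rounds=10):
    """Count demand misses while looping over a working set that fits.

    Every pre-fetch is a mis-prediction (an address never used again),
    which is an assumption made to show the worst case.
    """
    cache = LRUCache(capacity)
    misses = 0
    junk_addr = 1_000_000  # addresses outside the working set
    for _ in range(rounds):
        for addr in range(capacity):
            if not cache.touch(addr):
                misses += 1
            if prefetch_into_cache:
                cache.fill(junk_addr)  # bad guess evicts a useful line
            # Otherwise the guess lands in a separate buffer and the
            # cache contents are left alone.
            junk_addr += 1
    return misses


if __name__ == "__main__":
    print("pre-fetch into cache :", demand_misses(True))
    print("pre-fetch into buffer:", demand_misses(False))
```

With the separate buffer the only misses are the eight cold ones on the first pass; with pre-fetching straight into the cache, the wrong guesses keep pushing out lines the program still needs. Real pre-fetchers guess right much of the time, so the gap in practice would be far smaller than this worst case suggests.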
Virtualisation is very high on the server-side agenda, and as well as the advantage from the level 2 cache layout, AMD claims other enhancements to improve performance in this area. Its “nested paging” technique brings more of the memory management of virtual machines into hardware, and is said to reduce context-switching time – the time the CPU takes to switch from execution of one virtual machine to the next – by 25%.
With Barcelona, AMD continues its practice of integrating the memory controller into the chip itself, eliminating the need for a separate MCH (memory controller hub). This route has been vindicated by Intel’s admission that it will be taking a similar approach with its Nehalem processor.
AMD, however, is sticking with DDR2 memory rather than going with the FBDIMMs that Intel’s server platforms now require. AMD claims there are “enormous power and heat penalties for memory capacity using FBDIMM”, quoting a typical power consumption figure of 83W at idle for 8 FBDIMMs, as against 14W for the same number of DDR2 DIMMs.
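Breaking AMD's quoted figures down per module makes the scale of the claim clearer. The short Python sketch below uses only the two idle totals above; the even split across eight DIMMs is our simplifying assumption, and the numbers remain AMD's own rather than independent measurements.

```python
DIMM_COUNT = 8
FBDIMM_IDLE_W = 83.0   # AMD's quoted idle total for 8 FBDIMMs
DDR2_IDLE_W = 14.0     # AMD's quoted idle total for 8 DDR2 DIMMs


def per_dimm(total_watts, count=DIMM_COUNT):
    """Average idle power per module, assuming an even split."""
    return total_watts / count


fbdimm_each = per_dimm(FBDIMM_IDLE_W)          # 10.375 W per module
ddr2_each = per_dimm(DDR2_IDLE_W)              # 1.75 W per module
idle_saving_w = FBDIMM_IDLE_W - DDR2_IDLE_W    # 69 W across eight modules
ratio = FBDIMM_IDLE_W / DDR2_IDLE_W            # a little under six times

if __name__ == "__main__":
    print(f"FBDIMM idle per module: {fbdimm_each:.3f} W")
    print(f"DDR2 idle per module:   {ddr2_each:.3f} W")
    print(f"Idle saving for {DIMM_COUNT} modules: {idle_saving_w:.0f} W")
    print(f"FBDIMM draws {ratio:.1f}x the idle power of DDR2")
```

On AMD's numbers, then, eight DDR2 DIMMs would idle 69W below the same number of FBDIMMs, or a little under a sixth of the power.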