Athlon is the brand name applied to a series of x86 processors designed and manufactured by AMD. The original Athlon, now called Athlon Classic, was the first seventh-generation x86 processor and, in a first for AMD, retained its initial performance lead over Intel's competing processors for a significant period of time. AMD has continued the Athlon name with the Athlon 64, an eighth-generation processor featuring x86-64 (later renamed AMD64) technology.
In August 1999, AMD released the Athlon (K7) processor. Notably, the design team was led by Dirk Meyer, one of the lead engineers on the DEC Alpha project. As DEC wound the project down, AMD CEO Jerry Sanders approached many of its engineers to work for AMD and brought in a nearly complete team of engineering experts. The rest of the Athlon design team consisted of AMD K5 and K6 veterans.
By working with Motorola, AMD was able to refine copper interconnect manufacturing to the production stage about one year before Intel. The revised process permitted 180-nanometer processor production. The accompanying die shrink lowered power consumption, allowing AMD to raise Athlon clock speeds into the 1 GHz range. AMD found that processor yields on the new process exceeded expectations, and it delivered high-speed chips in volume in March 2000.
Internally, the Athlon is a fully seventh-generation x86 processor, the first of its kind. Like the AMD K5 and K6, the Athlon is a RISC microprocessor that decodes x86 instructions into its own internal instructions at runtime. The CPU is an out-of-order design, again like AMD's previous post-5x86 CPUs. The Athlon uses the DEC Alpha EV6 bus architecture with double data rate (DDR) technology. This means that at 100 MHz the Athlon front side bus transfers data at the rate of a 200 MHz single data rate bus (referred to as 200 MT/s), which was superior to the single data rate bus of Intel's Pentium III, running at 100 MHz or 133 MHz.
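The bus arithmetic above can be made concrete with a short sketch. It assumes a 64-bit (8-byte) data path on both buses and uses the nominal integer clocks (133 MHz rather than 133⅓ MHz), so the figures are theoretical peaks that ignore protocol overhead; the function names are illustrative only.

```python
# Sketch: peak front side bus bandwidth from clock, transfers per clock and
# bus width. Assumes a 64-bit (8-byte) data path on both buses and ignores
# protocol overhead, so the results are theoretical peaks only.

BUS_WIDTH_BYTES = 8  # assumption: 64-bit data bus


def transfer_rate_mts(clock_mhz: int, transfers_per_clock: int) -> int:
    """Transfers per second in MT/s: 1 transfer per clock for SDR, 2 for DDR."""
    return clock_mhz * transfers_per_clock


def peak_bandwidth_mb_s(clock_mhz: int, transfers_per_clock: int,
                        width_bytes: int = BUS_WIDTH_BYTES) -> int:
    """Theoretical peak bandwidth in MB/s (1 MB = 10**6 bytes)."""
    return transfer_rate_mts(clock_mhz, transfers_per_clock) * width_bytes


if __name__ == "__main__":
    buses = [
        ("Athlon EV6, 100 MHz DDR", 100, 2),
        ("Pentium III, 100 MHz SDR", 100, 1),
        ("Pentium III, 133 MHz SDR", 133, 1),
    ]
    for name, clock, per_clock in buses:
        print(f"{name}: {transfer_rate_mts(clock, per_clock)} MT/s, "
              f"{peak_bandwidth_mb_s(clock, per_clock)} MB/s")
```

Run as a script, it reports 200 MT/s and 1600 MB/s for the 100 MHz EV6 bus, against 800 MB/s and 1064 MB/s for the 100 MHz and 133 MHz Pentium III buses.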
AMD designed the CPU with more robust x86 instruction decoding than the K6, to improve its ability to keep more operations in flight at once. The Athlon's three CISC-to-RISC decoders could potentially decode six x86 operations per clock, although this was unlikely in real-world use. The critical branch predictor unit, essential to keeping the pipeline busy, was improved over the K6's. A deeper pipeline with more stages allowed higher clock speeds. Whereas the AMD K6-III+ topped out at 570 MHz because of its short pipeline, even when built on the 180 nm process, the Athlon could go much higher.
AMD ended its long-standing weakness in x87 floating-point performance by designing a super-pipelined, out-of-order, triple-issue floating-point unit. Each of its three units was tailored to a particular type of instruction, with some redundancy between them. Having separate units made it possible to operate on more than one floating-point instruction at once. This FPU was a huge step forward for AMD: while the K6 FPU had looked anemic next to the Intel P6 FPU, with Athlon this was no longer the case.
The 3DNow! floating-point SIMD technology was again present, with some revisions and a new name, "Enhanced 3DNow!". The additions included DSP instructions and an implementation of the extended MMX subset of Intel SSE.
The Athlon's cache hierarchy consisted of the typical two levels. Athlon was the first x86 processor with a 128 KB split level 1 cache: a 2-way set-associative cache divided into 2 × 64 KB for data and instructions (Harvard architecture). This cache was double the size of the K6's already large 2 × 32 KB cache and quadruple the size of the Pentium II and III's 2 × 16 KB L1 cache. The initial Athlon (Slot A, later renamed Athlon Classic) used 512 KB of level 2 cache separate from the CPU, on the processor cartridge board, running at 50% to 33% of core speed. This was done because the 250 nm manufacturing process was too coarse to allow on-die cache while keeping the die size cost-effective. Later Athlon CPUs, given greater transistor budgets by the smaller 180 nm and 130 nm process nodes, moved to on-die L2 cache running at full CPU clock speed.
Athlon Classic is a cartridge-based processor. Its design, called Slot A, was quite similar to Intel's Slot 1 cartridge used for the Pentium II and Pentium III. In fact, it used mechanically the same slot connector as the competing Intel CPUs (allowing motherboard manufacturers to save on costs), but reversed "upside-down" to keep users from inserting the wrong CPU, since the two were completely signal-incompatible. The cartridge allowed the use of faster cache memory than could be placed on the motherboard. Like the Pentium II and the "Katmai"-core Pentium III, Athlon Classic used a 512 KB secondary cache. Also like its competitors, this cache ran at a fraction of the core clock rate and had its own 64-bit bus, called a "backside bus", which allowed concurrent front side bus and cache accesses. Initially the L2 cache ran at half the CPU clock speed, on Athlon CPUs up to 700 MHz. Faster Slot A processors had to compromise on cache clock speed, running it at 2/5 (up to 850 MHz) or 1/3 (up to 1 GHz) of the core clock. The SRAM available at the time could not match the Athlon's clock scalability, because of both cache chip technology limitations and the electrical and latency complications of running an external cache at such a high speed.
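The Slot A divider scheme can be expressed as a small lookup. This is a sketch: the tier boundaries come from the text, while treating each one as "up to and including" the stated core clock is an assumption, as are the function and constant names.

```python
# Sketch of the Slot A L2 cache divider tiers described above. Tier
# boundaries come from the text; treating them as inclusive upper limits
# is an assumption.
from fractions import Fraction

# (highest core clock in MHz for the tier, L2 clock as a fraction of core)
L2_DIVIDER_TIERS = [
    (700, Fraction(1, 2)),
    (850, Fraction(2, 5)),
    (1000, Fraction(1, 3)),
]


def l2_ratio(core_mhz: int) -> Fraction:
    """Return the L2-to-core clock ratio for a Slot A Athlon."""
    for max_core_mhz, ratio in L2_DIVIDER_TIERS:
        if core_mhz <= max_core_mhz:
            return ratio
    raise ValueError(f"no Slot A tier covers {core_mhz} MHz")


def l2_clock_mhz(core_mhz: int) -> Fraction:
    """Clock of the external L2 SRAM in MHz, kept exact as a Fraction."""
    return core_mhz * l2_ratio(core_mhz)


if __name__ == "__main__":
    for core in (500, 700, 750, 850, 900, 1000):
        print(f"{core} MHz core -> L2 at {float(l2_clock_mhz(core)):.1f} MHz")
```

For the fastest part in each tier (700, 850 and 1000 MHz), the external SRAM ends up at 350, 340 and about 333.3 MHz respectively, so stepping to a smaller divider kept the cache clock in a narrow band as core speeds rose.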
The Slot A Athlons were the first multiplier-locked CPUs from AMD. This was partly done to hinder CPU remarking by questionable resellers around the globe. AMD's older CPUs could simply be set to run at whatever clock speed the user chose on the motherboard, making it trivial to relabel a CPU and sell it as a faster grade than originally intended. These relabeled CPUs were overclocked without proper testing and were not always stable, which damaged AMD's reputation. Although the Athlon was multiplier-locked, enthusiasts eventually discovered that a connector on the cartridge's PCB could control the multiplier. A product called the "Goldfingers device" was eventually created to unlock the CPU; it was named after the gold connector pads on the processor board to which it attached.
In commercial terms, the Athlon Classic was an enormous success, not just because of its own merits but also because the normally dependable Intel endured a series of major production, design, and quality control issues at this time. In particular, Intel's transition to the 180 nm production process, which started in late 1999 and ran through mid-2000, suffered delays, and there was a shortage of Pentium III parts. In contrast, AMD enjoyed a remarkably smooth process transition and had ample supplies available, and Athlon sales became quite strong.
The second-generation Athlon, the Thunderbird, debuted on June 5, 2000. This version of the Athlon shipped in a more traditional pin grid array (PGA) format that plugged into a socket ("Socket A") on the motherboard, although it was also offered in the Slot A package. It was sold at speeds ranging from 600 MHz to 1400 MHz. The major difference, however, was the cache design. Just as Intel had done when it replaced the old Katmai Pentium III with the much faster Coppermine Pentium III, AMD replaced the Athlon Classic's 512 KB external reduced-speed cache with 256 KB of on-chip, full-speed exclusive cache. As a general rule, more cache improves performance, but faster cache improves it further still.
AMD changed the cache design significantly with Thunderbird. The older Athlon CPUs used an inclusive design, in which data in the L1 cache is duplicated in the L2 cache. Thunderbird moved to an exclusive design, in which the L1 cache's contents are not duplicated in the L2. This increases the processor's effective total cache capacity and makes the caches behave like one very large L1 cache with a very fast region (the L1) and a slower region (the L2). Because of the Athlon's very large L1 cache and the exclusive design, which turns the L2 into essentially a "victim cache", the need for a large, high-performance L2 was lessened. As a result, AMD kept the 64-bit L2 cache data bus of the older Athlons and allowed the L2 a relatively high latency. A simpler L2 cache reduced the risk of the L2 causing clock scaling and yield issues. Still, Thunderbird did move from the 2-way associative scheme of the older Athlons to a more efficient 16-way associative layout.
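With Thunderbird's sizes, the exclusive design can hold up to 128 KB + 256 KB = 384 KB of unique data, whereas an inclusive pair of the same sizes could hold no more than the 256 KB of the L2. The toy model below illustrates the difference in behaviour. It is a hypothetical sketch, not a model of the real hardware: both levels are fully associative with LRU replacement, and sizes are counted in lines (2 and 4, keeping Thunderbird's 1:2 L1-to-L2 ratio), whereas the real caches are set-associative.

```python
# Toy two-level cache comparing inclusive and exclusive policies.
# Hypothetical model: fully associative, LRU, sizes in cache lines.
from collections import OrderedDict


class TwoLevelCache:
    def __init__(self, l1_lines: int, l2_lines: int, exclusive: bool):
        self.l1: OrderedDict = OrderedDict()  # ordered from LRU to MRU
        self.l2: OrderedDict = OrderedDict()
        self.l1_lines = l1_lines
        self.l2_lines = l2_lines
        self.exclusive = exclusive
        self.misses = 0  # accesses found in neither level

    @staticmethod
    def _insert(level: OrderedDict, capacity: int, line: int):
        """Insert or refresh line as MRU; return the evicted LRU line or None."""
        level[line] = True
        level.move_to_end(line)
        if len(level) > capacity:
            victim, _ = level.popitem(last=False)
            return victim
        return None

    def access(self, line: int) -> None:
        if line in self.l1:
            self.l1.move_to_end(line)  # L1 hit
            return
        in_l2 = line in self.l2
        if not in_l2:
            self.misses += 1
        if self.exclusive:
            if in_l2:
                del self.l2[line]  # promote: a line never lives in both levels
            victim = self._insert(self.l1, self.l1_lines, line)
            if victim is not None:
                # The L2 only receives lines evicted from L1 (a victim cache).
                self._insert(self.l2, self.l2_lines, victim)
        else:
            # Inclusive: every L1 line also has a copy in the L2.
            l2_victim = self._insert(self.l2, self.l2_lines, line)
            if l2_victim is not None:
                self.l1.pop(l2_victim, None)  # back-invalidate to keep inclusion
            # An L1 victim is simply dropped; its copy is still in the L2.
            self._insert(self.l1, self.l1_lines, line)

    def unique_lines(self) -> int:
        return len(set(self.l1) | set(self.l2))


def run(exclusive: bool, passes: int = 10) -> TwoLevelCache:
    cache = TwoLevelCache(l1_lines=2, l2_lines=4, exclusive=exclusive)
    for _ in range(passes):
        for line in range(6):  # working set of six lines, accessed cyclically
            cache.access(line)
    return cache


if __name__ == "__main__":
    for exclusive in (True, False):
        c = run(exclusive)
        kind = "exclusive" if exclusive else "inclusive"
        print(f"{kind}: {c.misses} misses, {c.unique_lines()} unique lines held")
```

Cycling through six lines, the exclusive hierarchy misses only on the first pass because all six fit across the two levels, while the inclusive one misses on every one of its 60 accesses: it holds only four unique lines, and LRU evicts each line just before it is needed again.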
The Thunderbird was AMD's most successful product since the Am386DX-40 ten years earlier. Mainboard designs had improved considerably by this time, and the initial trickle of Athlon mainboard makers had swollen to include every major manufacturer. AMD's new fab in Dresden came online, allowing further production increases, and the process technology was improved by a switch to copper interconnects. In October 2000 the Athlon "C" was introduced, raising the front side bus speed to 133 MHz (266 MT/s) and providing roughly 10% more performance per clock than the "B" model Thunderbird.
AMD released the third major Athlon version, code-named "Palomino", on October 9, 2001, and named it Athlon XP. The Athlon XP was marketed using a PR (performance rating) system, which compared its performance to an Athlon with the "Thunderbird" core. Athlon XP was introduced at speeds between 1333 MHz and 1533 MHz, with ratings from 1500+ to 1800+. At launch, the new core allowed AMD to take the x86 performance lead with the 1800+ model, and AMD extended that lead with the release of the 1600 MHz 1900+ less than a month later. The "XP" suffix is interpreted to mean eXtreme Performance and also as an unofficial reference to Windows XP.
Palomino was the first K7 core to include the full SSE instruction set from the Intel Pentium III as well as AMD's 3DNow! Professional. It is roughly 10% faster than Thunderbird at the same clock speed, thanks in part to the new SIMD functionality and to several other improvements: enhancements to the K7's TLB architecture and a new hardware data prefetch mechanism that makes better use of the available memory bandwidth.
Changes in core layout make Palomino more frugal with its electrical demands: it consumes approximately 20% less power than its predecessor and produces correspondingly less heat. While the "Thunderbird" Athlon was near its clock ceiling at 1400 MHz, Palomino's revised transistor layout and lower power demands allowed its clock speed to keep rising on the same 180 nm process node and at the same core voltage.
The "Palomino" was actually first released as a mobile version, called the Mobile Athlon 4 (codenamed "Corvette"). Palomino was also available in a form that officially supports dual processing, known as Athlon MP.
The fourth-generation Athlon, the Thoroughbred, was released on June 10, 2002, at 1.8 GHz, or 2200+ on the PR system. The "Thoroughbred" core was AMD's first production 130 nm silicon, and its die was significantly smaller than that of its 180 nm predecessor.
There are two versions of this core, commonly called A and B. The A version was introduced at 1800 MHz and had heat and design issues that held back its clock scalability; in fact, AMD was not able to push its clock much beyond Palomino's top grades. Because of this, it was only sold in versions from 1333 MHz to 1800 MHz, replacing the larger Palomino core. The B version of Thoroughbred added a metal layer to improve its ability to reach higher clock speeds, and it launched at higher clock speeds than the A version.
Apart from the new manufacturing process, the Thoroughbred design was largely the same as the "Palomino". During its lifetime, the Thoroughbred line received a faster front side bus, going from 266 MT/s to 333 MT/s. This improved the processor's memory and I/O access efficiency and, as a result, its per-clock performance. AMD shifted its PR rating scheme accordingly, so that lower clock speeds equated to higher PR ratings.
By the time AMD released the Barton core, the "Northwood" Pentium 4 had become more than competitive with AMD's processors. Barton doubled the L2 cache to 512 KB, but because of the architecture of AMD's processor caches, this increase did not have nearly the same impact it had on Intel's line: it gained only a few percent in per-clock performance. The PR rating became somewhat inaccurate, because some lower-clocked Barton models did not consistently outperform higher-clocked Thoroughbred models that carried lower ratings.
The other improvement, a faster 400 MT/s bus, helped Barton gain some more efficiency. However, it was clear by this time that Intel's quad-pumped bus was scaling well beyond AMD's double-pumped EV6 bus. The 800 MT/s Pentium 4 bus was well out of the Athlon's reach: to match that transfer rate, the double-pumped Athlon bus would have needed a 400 MHz clock, a level that was simply unreachable.
The K7 architecture had scaled to its limit. Maintaining performance equivalence with Intel's improving processors would require a significant redesign. AMD would soon launch Athlon 64.
Mobile Athlon XPs (Athlon XP-M) are identical to normal Athlon XPs, apart from running at lower voltages, often at lower bus speeds, and not being multiplier-locked. The lower Vcore rating gives the CPU lower power consumption (ideal for battery-powered laptops) and lower heat output. Athlon XP-M CPUs also have a higher-rated heat tolerance, a requirement of the cramped conditions inside a notebook PC.
The Athlon XP-M replaced the older Mobile Athlon 4. The Mobile Athlon 4 used the older Palomino core, while the Athlon XP-M used the newer Thoroughbred and Barton cores. Some specialized low-power Athlon XP-Ms use the microPGA Socket 563 rather than the standard Socket A.
Like their mobile K6+ predecessors, these CPUs were capable of dynamic clock adjustment for power optimization. When the system is idle, the CPU clocks itself down by switching to a lower bus multiplier and also reduces its voltage. When a program demands more computational resources, the CPU quickly (though with some latency) returns to an intermediate or maximum speed to meet the demand. This technology was marketed as "PowerNow!" and was similar to Intel's SpeedStep power-saving technique. The feature was controlled jointly by the CPU, the motherboard BIOS, and the operating system. On its K8-based CPUs (Athlon 64 and others), AMD adapted the technology for desktop PCs as well, under the name Cool'n'Quiet.
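To a first approximation, CMOS dynamic power scales as P ∝ C·V²·f, which is why lowering the voltage along with the multiplier saves far more than lowering the clock alone. The sketch below applies that first-order model. The operating points (a 133 MHz bus with multipliers of 12 and 6 at 1.45 V and 1.16 V) are hypothetical values chosen for round numbers, not specifications of any Athlon XP-M, and leakage power is ignored.

```python
# First-order sketch of why PowerNow!-style scaling saves power.
# Model assumption: dynamic power P ~ C * V^2 * f; leakage is ignored.
from fractions import Fraction
from typing import NamedTuple


class OperatingPoint(NamedTuple):
    multiplier: Fraction  # core clock = front side bus clock x multiplier
    vcore: Fraction       # core voltage in volts


# Hypothetical operating points, chosen for round numbers only.
FULL_SPEED = OperatingPoint(multiplier=Fraction(12), vcore=Fraction("1.45"))
IDLE = OperatingPoint(multiplier=Fraction(6), vcore=Fraction("1.16"))


def core_clock_mhz(fsb_mhz: int, point: OperatingPoint) -> Fraction:
    """Core clock obtained from the bus clock and the current multiplier."""
    return fsb_mhz * point.multiplier


def relative_dynamic_power(point: OperatingPoint, ref: OperatingPoint) -> Fraction:
    """Dynamic power of point relative to ref under P ~ C * V^2 * f.

    Capacitance C and the bus clock cancel out, leaving the voltage ratio
    squared times the multiplier (frequency) ratio.
    """
    return (point.vcore / ref.vcore) ** 2 * (point.multiplier / ref.multiplier)


if __name__ == "__main__":
    fsb = 133
    for name, p in (("full speed", FULL_SPEED), ("idle", IDLE)):
        share = relative_dynamic_power(p, FULL_SPEED)
        print(f"{name}: {core_clock_mhz(fsb, p)} MHz, "
              f"{float(share):.0%} of full-speed dynamic power")
```

Under these assumptions, halving the multiplier alone would halve dynamic power, while also dropping Vcore by 20% brings it to 8/25, or 32%, of the full-speed figure.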
Athlon XP-Ms were popular with desktop overclockers as well as underclockers. Their lower voltage requirement and higher heat rating meant that they were essentially "cherry-picked" from the manufacturing line. As the best cores off the line, they typically overclocked more reliably than their desktop counterparts. Their unlocked multiplier also greatly simplified the overclocking process. Some Barton-core Athlon XP-Ms have been successfully overclocked to as high as 3.1 GHz.
As noted, the chips were also liked for their underclocking ability. Underclocking in this sense (more precisely, undervolting) is the process of finding the lowest Vcore at which a CPU remains stable at a given clock speed. The Athlon XP-M CPUs could run at lower voltages for a given clock rate than their desktop siblings. As a result, the chips were used in home theater PC systems, where they offered high performance with low heat output at low Vcore settings.
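The search for the lowest stable Vcore can be sketched as a simple step-down loop. The `is_stable` predicate is hypothetical: in practice it stands for setting the voltage in the BIOS and running a stress test at the chosen clock speed. The loop also assumes that stability is monotonic in voltage, so once a step fails every lower step would fail too; the step size, floor and the simulated chip are likewise assumptions.

```python
# Sketch of an undervolting search at a fixed clock speed.
# is_stable is a hypothetical stand-in for a real stress test.
from typing import Callable


def find_min_stable_vcore(stock_mv: int, floor_mv: int, step_mv: int,
                          is_stable: Callable[[int], bool]) -> int:
    """Return the lowest tested Vcore (in millivolts) that passes is_stable.

    Starts at the stock voltage and steps down until a setting fails or
    the floor is reached. The clock speed is held fixed throughout.
    """
    if step_mv <= 0:
        raise ValueError("step_mv must be positive")
    if not is_stable(stock_mv):
        raise ValueError("not stable even at the stock voltage")
    best = stock_mv
    candidate = stock_mv - step_mv
    while candidate >= floor_mv and is_stable(candidate):
        best = candidate
        candidate -= step_mv
    return best


if __name__ == "__main__":
    # Hypothetical chip that happens to be stable down to 1.150 V.
    def simulated_test(mv: int) -> bool:
        return mv >= 1150

    print(find_min_stable_vcore(1450, 1000, 25, simulated_test))
```

A binary search would need fewer stress-test runs under the same monotonicity assumption, but the linear step-down mirrors how voltage is adjusted in discrete increments and stops at the first failure.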