AMD had a particular strategic and marketing need to offer fast 3D calculations in the floating-point domain: the K6 processor of the time was poorly equipped for intensive floating-point mathematics compared with the Intel Pentium II.
The 3DNow! instruction set was created during the late 1990s when 3D graphics was exploding in popularity because of 3D gaming, and 3D games heavily use floating-point arithmetic.
Earlier in the 1990s AMD could easily get by with limited floating-point performance, because the vast majority of software was based on integer calculation, at which the K6 was extremely proficient. 3D gaming and advanced multimedia applications were quickly changing that landscape.
The first implementation of 3DNow! technology contains 21 new instructions that support SIMD floating-point operations. The 3DNow! data format is packed single-precision floating-point. The 3DNow! instruction set also includes instructions for SIMD integer operations, data prefetch, and faster MMX-to-floating-point switching. Intel later added similar (but incompatible) instructions, known as SSE, to the Pentium III.
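The packed single-precision format can be illustrated with a small portable model. The sketch below is not 3DNow! code: it emulates in plain C what a lane-wise packed add (the job done by the real `PFADD` instruction) does to two floats held in one 64-bit value. The names `packed2f`, `pack2f`, `lane2f` and `add2f` are hypothetical, and the model assumes a 32-bit IEEE 754 `float`; rounding and denormal handling on real hardware may differ from the host's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 64-bit 3DNow!/MMX register holding two
   packed single-precision floats (lane 0 in the low 32 bits). */
typedef struct { uint64_t bits; } packed2f;

/* Pack two floats into one 64-bit value. */
static packed2f pack2f(float lo, float hi) {
    uint32_t l, h;
    packed2f r;
    assert(sizeof(float) == sizeof(uint32_t)); /* assumes 32-bit float */
    memcpy(&l, &lo, sizeof l);
    memcpy(&h, &hi, sizeof h);
    r.bits = ((uint64_t)h << 32) | l;
    return r;
}

/* Extract lane 0 (low half) or lane 1 (high half) as a float. */
static float lane2f(packed2f v, int lane) {
    uint32_t w = (uint32_t)(v.bits >> (lane ? 32 : 0));
    float f;
    memcpy(&f, &w, sizeof f);
    return f;
}

/* Lane-wise add: one operation produces two independent sums. */
static packed2f add2f(packed2f a, packed2f b) {
    return pack2f(lane2f(a, 0) + lane2f(b, 0),
                  lane2f(a, 1) + lane2f(b, 1));
}
```

Calling `add2f(pack2f(1.0f, 2.0f), pack2f(3.0f, 4.0f))` yields a value whose lanes hold 4.0 and 6.0, which is the sense in which a single SIMD instruction does the work of two scalar additions.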
3DNow! floating-point instructions
3DNow! performance-enhancement instructions
There is little or no evidence that the second version of 3DNow! was ever officially given its own trade name, which has led to some confusion in documentation that refers to this instruction set. The most common terms are Extended 3DNow!, Enhanced 3DNow! and 3DNow!+. The phrase "Enhanced 3DNow!" can be found in a few locations on the AMD website, but the capitalization of "Enhanced" appears to be either purely grammatical or used for emphasis, and it is applied to processors that may or may not have these extensions (the most notable instance is a benchmark page for the K6-III-P, which does not have them).
This extension to the 3DNow! instruction set was introduced with the first-generation Athlon processors. The Athlon added 5 new 3DNow! instructions and 19 new MMX instructions. Later, the K6-2+ and K6-III+ (both targeted at the mobile market) included the 5 new 3DNow! instructions but left out the 19 new MMX instructions. The new 3DNow! instructions were added to boost DSP performance; the new MMX instructions were added to boost streaming-media performance.
3DNow! or MMX extensions?
The 19 new MMX instructions are a subset of Intel's SSE1 instruction set. In its technical manuals, AMD separates these instructions from the 3DNow! extensions. In AMD customer product literature, however, the separation is less clear: the benefits of all 24 new instructions are credited to enhanced 3DNow! technology. This has led programmers to come up with their own names for the 19 new MMX instructions. The most common appears to be Integer SSE (ISSE). SSEMMX and MMX2 are also found in the documentation of public-domain video filters. (ISSE could also refer to Internet SSE, an early name for SSE.)
3DNow! extension DSP instructions
MMX extension instructions (Integer SSE)
3DNow! Professional does not appear to be an extension to the 3DNow! instruction set, but rather a trade name created to indicate processors that combine 3DNow! technology with a complete SSE instruction set (such as SSE1, SSE2 or SSE3). The first processor to match this description was the Athlon XP, which added the remainder of the SSE1 instruction set missing from earlier Athlon processors, for a total of 21 original 3DNow! instructions, 5 3DNow! extension DSP instructions, 19 MMX extension instructions, and 52 additional SSE instructions for complete SSE1 compatibility.
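The groupings described above correspond to separate CPUID feature flags, so software can tell them apart at run time. The sketch below only decodes flag words that the caller has already read; it does not execute CPUID itself, which keeps it portable. The bit positions reflect the commonly documented layout (standard function 1, EDX bit 23 for MMX and bit 25 for SSE; extended function 0x80000001, EDX bit 22 for the MMX extensions, bit 30 for the 3DNow! extensions and bit 31 for 3DNow!) and should be checked against the vendor manuals before being relied on. The type `simd_features` and both function names are hypothetical, and the "Professional" test simply encodes the definition given in the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the SIMD feature flags discussed here. */
typedef struct {
    bool mmx;        /* original MMX                          */
    bool sse;        /* complete SSE1                         */
    bool mmx_ext;    /* the 19 MMX extension instructions     */
    bool now3d;      /* the 21 original 3DNow! instructions   */
    bool now3d_ext;  /* the 5 3DNow! extension instructions   */
} simd_features;

/* Decode EDX from CPUID function 1 (std_edx) and from CPUID
   function 0x80000001 (ext_edx). Both values are supplied by
   the caller; this sketch never issues CPUID. */
static simd_features decode_features(uint32_t std_edx, uint32_t ext_edx) {
    simd_features f;
    f.mmx       = (std_edx >> 23) & 1u;
    f.sse       = (std_edx >> 25) & 1u;
    f.mmx_ext   = (ext_edx >> 22) & 1u;
    f.now3d_ext = (ext_edx >> 30) & 1u;
    f.now3d     = (ext_edx >> 31) & 1u;
    return f;
}

/* "3DNow! Professional" as described in the text: 3DNow! (with its
   extensions) combined with a complete SSE instruction set. */
static bool is_3dnow_professional(simd_features f) {
    return f.now3d && f.now3d_ext && f.sse;
}
```

Under these assumptions, a first-generation Athlon would report the 3DNow! and MMX extension flags without the SSE flag, so `is_3dnow_professional` would return false for it and true for an Athlon XP.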
The Geode GX and Geode LX added 2 new 3DNow! instructions that are absent from all other processors.
3DNow! Professional instructions unique to the Geode GX/LX
A disadvantage of 3DNow! compared with SSE is that each 64-bit register holds only two single-precision values, as opposed to four in a 128-bit SSE register. However, 3DNow! instructions can generally be executed with lower latency and higher throughput than SSE instructions.
3DNow! also shares the same physical registers as MMX, while SSE has its own independent registers. Because the MMX and 3DNow! registers are also used by the standard x87 FPU, 3DNow! instructions and x87 instructions cannot be executed simultaneously. However, because they are aliased to the x87 FPU registers, the 3DNow! and MMX register states can be saved and restored by the traditional x87 FNSAVE and FRSTOR instructions. Using the pre-existing x87 instructions meant that no operating system modifications were needed to support 3DNow!.
By contrast, saving and restoring the state of the SSE registers required the newly added FXSAVE and FXRSTOR instructions. The FX* instructions are an upgrade to the older x87 save and restore instructions: they save not only the SSE register state but also the x87 register state, which means they save the MMX and 3DNow! registers too.
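The aliasing argument can be made concrete with a toy model. In the sketch below, each x87 register is an 80-bit value split into a 64-bit significand and a 16-bit sign-and-exponent field, and an MMX/3DNow! register is simply a view onto the significand. Any routine that copies the whole x87 register file, which is the role FNSAVE and FRSTOR play, therefore carries the MMX/3DNow! contents along without knowing about them. All type and function names are hypothetical, `toy_save` and `toy_restore` are stand-ins rather than the real instructions, and the control, status and tag words are omitted. The size constants are the commonly documented image sizes (108 bytes for FNSAVE in 32-bit protected mode, 512 bytes for FXSAVE).

```c
#include <stdint.h>

/* Commonly documented save-image sizes, in bytes. */
enum { FNSAVE_BYTES = 108, FXSAVE_BYTES = 512 };

/* Toy model of one 80-bit x87 register. */
typedef struct {
    uint64_t significand; /* bits 63..0: aliased by MMn          */
    uint16_t sign_exp;    /* bits 79..64: sign and exponent      */
} x87_reg;

/* Toy model of the x87 register file only. */
typedef struct { x87_reg r[8]; } x87_state;

/* Writing MMn stores into the significand; hardware is documented
   to set the sign/exponent field to all ones at the same time. */
static void mmx_write(x87_state *s, int n, uint64_t v) {
    s->r[n].significand = v;
    s->r[n].sign_exp = 0xFFFF;
}

/* Reading MMn returns the aliased significand. */
static uint64_t mmx_read(const x87_state *s, int n) {
    return s->r[n].significand;
}

/* Stand-in for FNSAVE: copy the whole register file to memory. */
static void toy_save(const x87_state *s, x87_state *image) {
    *image = *s;
}

/* Stand-in for FRSTOR: reload the register file from memory. */
static void toy_restore(x87_state *s, const x87_state *image) {
    *s = *image;
}
```

In this model an operating system that already calls the save and restore routines on every context switch preserves 3DNow! data for free, whereas separate SSE registers would lie outside `x87_state` and need a larger image and new instructions.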
On AMD Athlon XP and K8-based cores (e.g. Athlon 64), assembly programmers have noted that it is possible to use 3DNow! and SSE at the same time. Although both share the same functional unit, combining them can improve performance by relieving some register pressure, but this is difficult to accomplish.