When the multiply-accumulate operation is performed on floating-point numbers, it may use two roundings (typical in many DSPs) or a single rounding. When performed with a single rounding, it is called a fused multiply-add (FMA) or fused multiply-accumulate (FMAC).
Modern computers may contain a dedicated multiply-accumulate unit, or "MAC-unit", consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register which stores the result when clocked. The output of the register is fed back to one input of the adder, so that on each clock the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the method of shifting and adding typical of earlier computers. The first processors to be equipped with MAC-units were digital signal processors, but the technique is now common in general-purpose processors too.
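The data path described above can be mimicked in software. The following is only an illustrative sketch in C, not a description of any particular hardware: the names `mac_unit`, `mac_reset`, `mac_clock` and `mac_dot` are hypothetical, and the operand and accumulator widths (32-bit inputs, 64-bit accumulator) are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a MAC unit: a single accumulator register.
   Assumption: 32-bit operands and a 64-bit accumulator. */
typedef struct {
    int64_t acc;
} mac_unit;

/* Clear the accumulator register. */
void mac_reset(mac_unit *u) {
    u->acc = 0;
}

/* One "clock": the multiplier output is added to the fed-back register value. */
void mac_clock(mac_unit *u, int32_t a, int32_t b) {
    u->acc += (int64_t)a * (int64_t)b;
}

/* Dot product computed with one MAC step per pair of elements. */
int64_t mac_dot(const int32_t *x, const int32_t *y, size_t n) {
    mac_unit u;
    mac_reset(&u);
    for (size_t i = 0; i < n; i++) {
        mac_clock(&u, x[i], y[i]);
    }
    return u.acc;
}
```

In this model the accumulator is wider than the operands so that a moderate number of products can be summed without overflow; a very long sum could still overflow the 64-bit register.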
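Because every floating-point operation rounds its result to the available precision, it makes a difference to the result whether the multiply-add is performed with two roundings (one after the multiplication and one after the addition) or in one operation with a single rounding. When performed with a single rounding, the operation is termed a fused multiply-add.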
When implemented in a microprocessor, a fused multiply-add is typically faster than a multiply operation followed by an add. With this instruction, a processor can do without a hardware divide or square root unit, since both operations can be implemented efficiently in software using the FMA.
A fast FMA can speed up and improve the accuracy of many computations which involve the accumulation of products, such as dot products, matrix multiplication and polynomial evaluation by Horner's rule.
The 1999 standard of the C programming language supports the FMA operation through the fma standard math library function.
FMA capability is also present in the NVIDIA GeForce 200 Series (GTX 200) and NVIDIA Tesla T10 GPUs. A fused multiply-add is implemented on the SPARC64, PowerPC, PA-RISC (PA-8000 and above) and Itanium processors. On x86, AMD introduced it with the FMA4 instruction set, derived from its earlier SSE5 proposal, in the Bulldozer processors (2011), and Intel introduced it with the FMA3 instruction set in its 'Haswell' processors (2013).