TFTP operates in simple lock-step: only one packet is ever outstanding at a time, and every packet received by either party causes one packet to be sent in reply (until the transfer terminates). The original TFTP specification required that whenever any packet was received, the receiver send the appropriate reply packet. Thus, the receipt of a block of data triggered the sending of an 'acknowledgement', and the receipt of an acknowledgement triggered the sending of the next data block. This may sound fairly harmless, but it led to disaster.
Like all protocols designed to operate across an unreliable network, TFTP also includes timeouts. When it does something to which it expects a reply from the party at the other end (such as sending it a packet), it starts a timer; if the timer expires before the reply has been received, it takes some action, usually re-sending the original packet.
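The timeout-and-resend behaviour can be sketched in a few lines of Python. This is a minimal illustration, not code from any TFTP implementation: the `send` and `recv` callables, the `max_tries` limit and the convention that `recv` returns `None` when the timer expires are all assumptions made for the example.

```python
from typing import Callable, Optional


def send_with_retry(packet: bytes,
                    send: Callable[[bytes], None],
                    recv: Callable[[float], Optional[bytes]],
                    timeout: float = 1.0,
                    max_tries: int = 5) -> bytes:
    """Send `packet` and wait for a reply, re-sending it after each timeout.

    `send` and `recv` are hypothetical stand-ins for the network layer;
    `recv(timeout)` is assumed to return None if nothing arrives in time.
    """
    for _ in range(max_tries):
        send(packet)
        reply = recv(timeout)
        if reply is not None:
            return reply
    raise TimeoutError(f"no reply after {max_tries} attempts")
```

Note that a reply which is merely late, rather than lost, still arrives after the packet has been re-sent, so the other party ends up seeing two copies of the same packet.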
SAS occurred when a packet was not lost in the internetwork but merely delayed, and was delivered successfully only after a timeout had already occurred (on either side).
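The timeout caused a second copy of the previous packet to be generated, notionally to replace the 'lost' packet. However, the first copy was not lost, and since, according to the TFTP specification, receipt of any packet always forced the generation of a reply packet, two replies were generated (one to each copy). Those in turn forced the generation of two replies to them, and so on. A typical scenario was as follows: an acknowledgement was delayed past the sender's timeout, so the sender re-sent the data block; the receiver acknowledged both copies, and each of the two acknowledgements then caused the sender to transmit a copy of the next data block.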
It will be seen that at this point the situation is now stable, and repeats; every packet from then on is duplicated (that is, two identical copies are sent across the internetwork).
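The stable duplication can be reproduced with a small discrete-event simulation. The sketch below is illustrative only: the function name, the time units and the choice to delay the first acknowledgement of block 1 past the sender's timeout are assumptions for the example, and the model ignores packet formats and real networking. It applies the original rule, under which every received acknowledgement triggers the next data block.

```python
import heapq


def simulate_original_rule(num_blocks, delay=1, timeout=5, slow_ack_delay=7):
    """Count how many times each data block is sent under the original rule.

    All parameters are hypothetical time units. The first acknowledgement
    of block 1 is delayed (not lost) for longer than the sender's timeout.
    """
    events = []          # heap of (time, tiebreak, kind, block)
    counter = 0
    sent = {b: 0 for b in range(1, num_blocks + 1)}
    acked = set()        # blocks for which the sender has seen an ACK
    first_ack_delayed = False

    def push(time, kind, block):
        nonlocal counter
        heapq.heappush(events, (time, counter, kind, block))
        counter += 1

    def send_data(time, block):
        sent[block] += 1
        push(time + delay, "data_arrives", block)
        push(time + timeout, "timeout", block)

    send_data(0, 1)
    while events:
        time, _, kind, block = heapq.heappop(events)
        if kind == "data_arrives":
            # The receiver acknowledges every data packet, duplicates included.
            if block == 1 and not first_ack_delayed:
                first_ack_delayed = True
                push(time + slow_ack_delay, "ack_arrives", block)
            else:
                push(time + delay, "ack_arrives", block)
        elif kind == "ack_arrives":
            acked.add(block)
            # Original rule: every ACK, even a duplicate, triggers the next block.
            if block < num_blocks:
                send_data(time, block + 1)
        elif kind == "timeout" and block not in acked:
            send_data(time, block)   # retransmit the unacknowledged block
    return sent


print(simulate_original_rule(4))  # {1: 2, 2: 2, 3: 2, 4: 2}
```

A single late acknowledgement is enough: block 1 is retransmitted once by the timeout, and every later block then crosses the network twice even though no further timeouts occur.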
Even worse, the increased number of packets being sent around the internetwork was likely to cause congestion, which was likely to cause a packet to be delayed past the timeout yet again. That would cause yet another duplicate packet to be generated by a timeout, and from then on a third copy of each packet would be sent. At that point the situation would usually snowball, with further copies being generated, hence the name given to this pattern of behaviour.
For a small file, the transfer would complete, and the duplicate packets would eventually drain from the internetwork. If the file were large, however, congestive collapse would result, and only when the transfer failed would the mass of packets drain from the internetwork.
The fix for SAS was quite simple: the TFTP specification was modified to indicate that only the first instance of a received acknowledgement would cause the next data block to be sent, thus breaking the retransmission loop. In the new version of the protocol, a block would be retransmitted only on timeout.
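The corrected sender logic can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not code from any real TFTP implementation: the `Sender` class and its method names are hypothetical, and block numbers simply count up from 1 without the wrap-around a real transfer would need to handle.

```python
from typing import Optional


class Sender:
    """Sender side of a lock-step transfer that ignores duplicate ACKs."""

    def __init__(self, num_blocks: int) -> None:
        self.num_blocks = num_blocks
        self.current = 1  # block currently awaiting acknowledgement

    def on_ack(self, block: int) -> Optional[int]:
        """Return the next block number to send, or None to send nothing."""
        if block != self.current:
            return None   # duplicate or stale ACK: do not send anything
        self.current += 1
        return self.current if self.current <= self.num_blocks else None

    def on_timeout(self) -> Optional[int]:
        """Return the block to retransmit, or None once the transfer is done."""
        return self.current if self.current <= self.num_blocks else None


s = Sender(3)
print(s.on_ack(1))     # 2: the first ACK for block 1 releases block 2
print(s.on_ack(1))     # None: the duplicate ACK is ignored
print(s.on_timeout())  # 2: only a timeout causes a retransmission
```

Because a duplicate acknowledgement no longer produces a data packet, one delayed packet costs at most one extra transmission instead of doubling the rest of the transfer.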
This change also makes it possible to simplify the implementation of the receiving end (often a bootstrap program written in a low-level language) by omitting its retransmission timer, since any lost packet causes the sender to time out and retransmit the last packet it sent. However, keeping the timer has its benefits, such as dealing with lost ACKs more efficiently.