CAS is used to implement synchronization primitives like semaphores and mutexes, as well as more sophisticated lock-free and wait-free algorithms. Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add, and that, assuming a fairly large amount of memory, it can implement all of them .
Algorithms built around CAS typically read some key memory location and remember the old value. Based on that old value, they compute some new value. Then they try to swap in the new value using CAS, where the comparison checks for the location still being equal to the old value. If CAS indicates that the attempt has failed, it has to be repeated from the beginning: the location is re-read, a new value is computed and the CAS is tried again.
Some of these algorithms are affected by and must handle the problem of a false positive match, or the ABA problem. It's possible that between the time the old value is read and the time CAS is attempted, some other processors or threads change the memory location two or more times such that it acquires a bit pattern which matches the old value. The problem arises if this new bit pattern, which looks exactly like the old value, has a different meaning: for instance, it could be a recycled address, or a wrapped version counter.
CAS, and other atomic instructions, are unnecessary in uniprocessor systems, because the atomicity of any sequence of instructions can be achieved by disabling interrupts while executing it. However, often disabling interrupts is too expensive to be practical, so even programs only intended to run on uniprocessor machines will benefit by using them, as in the case of Linux's futexes.
In multiprocessor systems, it is impossible and undesirable to disable interrupts on all processors at the same time. Even with interrupts disabled, two or more processors could be attempting to access the same semaphore's memory at the same time. The compare-and-swap instruction allows any processor to atomically test and modify a memory location, preventing such multiple processor collisions.