More technically it is a synchronization mechanism which can sometimes be used as an alternative to a readers-writer lock. It allows extremely low overhead, wait-free reads. However, RCU updates can be expensive, as they must leave the old versions of the data structure in place to accommodate pre-existing readers. These old versions are reclaimed after all pre-existing readers finish their accesses. RCU can be put to a number of other tasks, including dynamically changing NMI (Non-Maskable interrupt) handlers and implementing lazy barriers, but it is most frequently used as replacement for reader-writer locking.
RCU is available in a number of operating systems, including version 2.6 of the Linux kernel. The technique is covered by U.S. software patent 5,442,758, issued August 15, 1995 and assigned to Sequent Computer Systems, as well as by 5,608,893, 5,727,528, 6,219,690, and 6,886,162. The now-expired US Patent 4,809,168 covers a closely related technique. RCU is also the topic of one claim in the SCO v. IBM lawsuit.
Splitting an update into removal and reclamation phases allows the updater to perform the removal phase immediately, and to defer the reclamation phase until all readers active during the removal phase have completed, either by blocking until they finish or by registering a callback that is invoked after they finish. Only readers that are active during the removal phase need be considered, because any reader starting after the removal phase will be unable to gain a reference to the removed data items, and therefore cannot be disrupted by the reclamation phase.
So the typical RCU update sequence goes something like the following:
kfree in the Linux kernel). Step (2) above is the key idea underlying RCU's deferred destruction. The ability to wait until all readers are done allows RCU readers to use much lighter-weight synchronization — in some cases, absolutely no synchronization at all. In contrast, in more conventional lock-based schemes, readers must use heavy-weight synchronization in order to prevent an updater from deleting the data structure out from under them. This is because lock-based updaters typically update data items in place, and must therefore exclude readers. In contrast, RCU-based updaters typically take advantage of the fact that writes to single aligned pointers are atomic on modern CPUs, allowing atomic insertion, removal, and replacement of data items in a linked structure without disrupting readers. Concurrent RCU readers can then continue accessing the old versions, and can dispense with the atomic operations, memory barriers, and cache misses that are so expensive on modern SMP computer systems, even in absence of lock contention.
In the three-step procedure shown above, the updater is performing both the removal and the reclamation step, but it is often helpful for an entirely different thread to do the reclamation, as is in fact the case in the Linux kernel's directory-entry cache (dcache). Even if the same thread performs both the update step (step (1) above) and the reclamation step (step (3) above), it is often helpful to think of them separately. For example, RCU readers and updaters need not communicate at all, but RCU provides implicit low-overhead communication between readers and reclaimers, namely, in step (2) above.
The Linux kernel contains numerous other RCU-related APIs, and these may be found elsewhere
synchronize_rcu will not necessarily wait for any subsequent RCU read-side critical sections to complete. For example, consider the following sequence of events:
CPU 0 CPU 1 CPU 2
----------------- ------------------------- ---------------
1. rcu_read_lock()
2. enters synchronize_rcu()
3. rcu_read_lock()
4. rcu_read_unlock()
5. exits synchronize_rcu()
6. rcu_read_unlock()
To reiterate, synchronize_rcu waits only for ongoing RCU read-side critical sections to complete, not necessarily for any that begin after synchronize_rcu is invoked.
Of course, synchronize_rcu does not necessarily return immediately after the last pre-existing RCU read-side critical section completes. For one thing, there might well be scheduling delays. For another thing, many RCU implementations process requests in batches in order to improve efficiencies, which can further delay synchronize_rcu.
Since synchronize_rcu is the API that must figure out when readers are done, its implementation is key to RCU. For RCU to be useful in all but the most read-intensive situations, synchronize_rcu's overhead must also be quite small.
The call_rcu API is a callback form of synchronize_rcu, and is described in more detail in a later section. Instead of blocking, it registers a function and argument which are invoked after all ongoing RCU read-side critical sections have completed. This callback variant is particularly useful in situations where it is illegal to block.
rcu_dereference to fetch an RCU-protected pointer, which returns a value that may then be safely dereferenced. Note that rcu_deference does not actually dereference the pointer, instead, it protects the pointer for later dereferencing. It also executes any needed memory-barrier instructions for a given CPU architecture. Currently, only DEC Alpha needs memory barriers within rcu_dereference; on other CPUs, it compiles to nothing, not even a compiler directive.Note that the value returned by rcu_dereference is valid only within the enclosing RCU read-side critical section. For example, the following is not legal:
rcu_read_lock(); p = rcu_dereference(head.next); rcu_read_unlock(); x = p->address; rcu_read_lock(); y = p->data; rcu_read_unlock();
Holding a reference from one RCU read-side critical section to another is just as illegal as holding a reference from one lock-based critical section to another! Similarly, using a reference outside of the critical section in which it was acquired is just as illegal as doing so with normal locking.
As with rcu_assign_pointer, an important function of rcu_dereference is to document which pointers are protected by RCU.
The RCU infrastructure observes the time sequence of rcu_read_lock, rcu_read_unlock, synchronize_rcu, and call_rcu invocations in order to determine when (1) synchronize_rcu invocations may return to their callers and (2) call_rcu callbacks may be invoked. Efficient implementations of the RCU infrastructure make heavy use of batching in order to amortize their overhead over many uses of the corresponding APIs.
void rcu_read_lock(void) { } void rcu_read_unlock(void) { }
void synchronize_rcu(void)
{
int cpu;
for_each_cpu(cpu)
run_on(cpu);
}
You can ignore rcu_assign_pointer and rcu_dereference without missing much. But here they are anyway. And whatever you do, don't forget about them when submitting Linux-kernel patches that make use of RCU!
#define rcu_assign_pointer(p, v) ({
smp_wmb();
(p) = (v);
}) #define rcu_dereference(p) ({
typeof(p) _________p1 = p;
smp_read_barrier_depends();
(_________p1);
})
Note that rcu_read_lock and rcu_read_unlock do absolutely nothing. This is the great strength of classic RCU in a non-preemptive kernel: read-side overhead is precisely zero, at least on non-Alpha CPUs. And there is absolutely no way that rcu_read_lock can participate in a deadlock cycle, cause a realtime process to miss its scheduling deadline, precipitate priority inversion, or result in high lock contention.The implementation of synchronize_rcu simply schedules itself on each CPU in turn. This guarantees that all readers of the old version of the data structure have completed, because it is illegal to block while in an RCU read-side critical section. Therefore, if a given CPU executes a context switch (to schedule another process), we know that it must have completed all preceding RCU read-side critical sections. Once all CPUs have executed a context switch, then all preceding RCU read-side critical sections will have completed.
For example, suppose that we remove a data item from its structure and then invoke synchronize_rcu. Once synchronize_rcu returns, we are guaranteed that there are no RCU read-side critical sections holding a reference to that data item, so we can safely reclaim it.
The differences between the two approaches are quite small. Read-side locking moves to1 struct el { 1 struct el {2 struct list_head lp; 2 struct list_head lp;3 long key; 3 long key;4 spinlock_t mutex; 4 spinlock_t mutex;5 int data; 5 int data;6 /* Other data fields */ 6 /* Other data fields */7 }; 7 };8 DEFINE_RWLOCK(listmutex); 8 DEFINE_SPINLOCK(listmutex);9 LIST_HEAD(head); 9 LIST_HEAD(head);1 int search(long key, int *result) 1 int search(long key, int *result)2 { 2 {3 struct el *p; 3 struct el *p;4 45 read_lock(&listmutex); 5 rcu_read_lock();6 list_for_each_entry(p, &head, lp) { 6 list_for_each_entry_rcu(p, &head, lp) {7 if (p->key == key) { 7 if (p->key == key) {8 *result = p->data; 8 *result = p->data;9 read_unlock(&listmutex); 9 rcu_read_unlock();10 return 1; 10 return 1; 11 } 11 } 12 } 12 } 13 read_unlock(&listmutex); 13 rcu_read_unlock(); 14 return 0; 14 return 0; 15 } 15 }1 int delete(long key) 1 int delete(long key)2 { 2 {3 struct el *p; 3 struct el *p;4 45 write_lock(&listmutex); 5 spin_lock(&listmutex);6 list_for_each_entry(p, &head, lp) { 6 list_for_each_entry(p, &head, lp) {7 if (p->key == key) { 7 if (p->key == key) {8 list_del(&p->lp); 8 list_del_rcu(&p->lp);9 write_unlock(&listmutex); 9 spin_unlock(&listmutex);10 synchronize_rcu();10 kfree(p); 11 kfree(p); 11 return 1; 12 return 1; 12 } 13 } 13 } 14 } 14 write_unlock(&listmutex); 15 spin_unlock(&listmutex); 15 return 0; 16 return 0; 16 } 17 }
rcu_read_lock and rcu_read_unlock, update-side locking moves from a reader-writer lock to a simple spinlock, and a synchronize_rcu precedes the kfree.However, there is one potential catch: the read-side and update-side critical sections can now run concurrently. In many cases, this will not be a problem, but it is necessary to check carefully regardless. For example, if multiple independent list updates must be seen as a single atomic update, converting to RCU will require special care.
Also, the presence of synchronize_rcu means that the RCU version of delete can now block. If this is a problem, there is a callback-based mechanism that never blocks, namely call_rcu, that can be used in place of synchronize_rcu.
When the thread which made the copy is awakened by the kernel, it can safely deallocate the old structure.
So the structure is read concurrently with a thread copying in order to do an update, hence the name "read-copy update". The abbreviation "RCU" was one of many contributions by the Linux community. Other names for similar techniques include passive serialization and MP defer by VM/XA programmers and generations by K42 and Tornado programmers.
A number of researchers, including Van Jacobson, Aju John, and Richard Rashid have proposed deferring reclamation for a fixed period of time, an approach that might be useful in hard real time software.