Locality of reference

Wikipedia, the free encyclopedia - Cite This Source

In computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are three basic types of locality of reference: temporal, spatial and sequential:Temporal locality

Here a resource that is referenced at one point in time is referenced again soon afterwards. Spatial locality
Here the likelihood of referencing a storage location is greater if a storage location near it has been recently referenced.Sequential locality
Here storage is accessed sequentially, in descending or ascending order.

Programs and systems which exhibit locality exhibit predictable behavior, and thus, provide opportunities for designers to improve performance by through prefetching, precomputing, and caching of code and data for future use.

Reasons for occurrence

Locality occurs often because of the way in which computer programs are created. Generally, related data is stored in nearby locations in storage. One common pattern in computing involves the processing of several items, one at a time. This means that if a lot of processing is done, the single item will be accessed more than once, thus leading to temporal locality of reference. Furthermore, moving to the next item implies that the next item will be read, hence spatial locality of reference, since memory locations are typically read in batches.

Locality often occurs because code contains loops that tend to reference arrays or other data structures by indices.

Increasing and exploiting locality of reference are common techniques for optimization. This can happen on several levels of the memory hierarchy. Paging obviously benefits from spatial locality. A cache is a simple example of exploiting temporal locality, because it is a specially designed faster but smaller memory area, generally used to keep recently referenced data and data near recently referenced data, which can lead to potential performance increases. Data in cache does not necessarily correspond to data that is spatially close in main memory; however, data elements are brought into cache one cache line at a time. This means that spatial locality is again important: if one element is referenced, a few neighboring elements will also be brought into cache. Finally, temporal locality plays a role on the lowest level, since results that are referenced very closely together can be kept in the machine registers. Programming languages such as C allow the programmer to suggest that certain variables are kept in registers.

Memory Locality

Memory locality (or data locality) is a term in computer science used to denote the temporal or spatial proximity of memory access in computer programs. There are two basic types of locality of memory reference: temporal and spatial:Temporal locality
The concept that a memory location that is referenced by a program at one point in time will be referenced again sometime in the near future. Spatial locality
The concept that likelihood of referencing a memory location by a program is higher if a memory location near it was just referenced.

Data locality is a typical memory reference feature of regular programs (though many irregular memory access patterns exist). It makes the hierarchical memory layout profitable. In computers, memory is divided up into a hierarchy in order to speed up data accesses. The lower levels of the memory hierarchy tend to be slower, but larger. Thus, a program will achieve greater performance if it uses memory while it is cached in the upper levels of the memory hierarchy and avoids bringing other data into the upper levels of the hierarchy that will displace data that will be used shortly in the future. This is an ideal, and sometimes cannot be achieved.

Typical memory hierarchy (access times and cache sizes are approximations of typical values used as of 2006 for the purpose of discussion; actual values and actual numbers of levels in the hierarchy vary):

  • CPU registers (8-32 registers) – immediate access (0-1 clock cycles)
  • L1 CPU caches (32 KiB to 128 KiB) – fast access (3 clock cycles)
  • L2 CPU caches (128 KiB to 4 MiB) – slightly slower access (10 clock cycles)
  • Main physical memory (RAM) (256 MiB to 4 GiB) – slow access (100 clock cycles)
  • Disk (virtual memory and file system) (1 GiB to 1 TiB) – very slow (10,000,000 clock cycles)
  • Remote Memory (such as other computers or the Internet) (Practically unlimited) – speed varies

Modern machines tend to read blocks of lower memory into the next level of the memory hierarchy. If this displaces used memory, the operating system tries to predict which data will be accessed least (or latest) and move it down the memory hierarchy. Prediction algorithms tend to be simple to reduce hardware complexity, though they are becoming somewhat more complicated.

A common example is matrix multiplication:

for i in 0..n
  for j in 0..m
    for k in 0..p
      C[i][j] = C[i][j] + A[i][k] * B[k][j];

When dealing with large matrices, this algorithm tends to shuffle data around too much. Since memory is pulled up the hierarchy in consecutive address blocks, in the C programming language it would be advantageous to refer to several memory addresses that share the same row (spatial locality). By keeping the row number fixed, the second element changes more rapidly. In C and C++, this means the memory addresses are used more consecutively. One can see that since j affects the column reference of both matrices C and B, it should be iterated in the innermost loop (this will fix the row iterators, i and k, while j moves across each column in the row). This will not change the mathematical result, but it improves efficiency. By switching the looping order for j and k, the speedup in large matrix multiplications becomes dramatic. (In this case, 'large' means, approximately, more than 100,000 elements in each matrix, or enough addressable memory such that the matrices will not fit in L1 and L2 caches.)

Temporal locality can also be improved in the above example by using a technique called blocking. The larger matrix can be divided into evenly-sized sub-matrices, so that the smaller blocks can be referenced (multiplied) several times while in memory.

for (ii = 0; ii < SIZE; ii += BLOCK_SIZE)
  for (kk = 0; kk < SIZE; kk += BLOCK_SIZE)
    for (jj = 0; jj < SIZE; jj += BLOCK_SIZE)
      for (i = ii; i < ii + BLOCK_SIZE && i < SIZE; i++)
        for (k = kk; k < kk + BLOCK_SIZE && k < SIZE; k++)
          for (j = jj; j < jj + BLOCK_SIZE && k < SIZE; j++)
            C[i][j] = C[i][j] + A[i][k] * B[k][j];

The temporal locality of the above solution is provided because a block can be used several times before moving on, so that it is moved in and out of memory less often. Spatial locality is improved because elements with consecutive memory addresses tend to be pulled up the memory hierarchy together.

See also

References

  • P.J. Denning, S.C. Schwartz, Communications of the ACM, Volume 15 , Issue 3 (March 1972), Pages:191-198

Bibliography



Wikipedia, the free encyclopedia © 2001-2006 Wikipedia contributors (Disclaimer)
This article is licensed under the GNU Free Documentation License.
Last updated on Sunday March 09, 2008 at 14:08:32 PDT (GMT -0700)
View this article at Wikipedia.org - Edit this article at Wikipedia.org - Donate to the Wikimedia Foundation