Grid computing evolved from the parallel processing systems of the 1970s, the large-scale cluster computing systems of the 1980s, and the distributed processing systems of the 1990s, and is often referred to by these names. Grid computing can make a more cost-effective use of computer resources, can be applied to solve problems that require large amounts of computing power, and may be the forerunner of pervasive computing—computer applications that pervade our environment without our being aware of their presence.
See A. S. Tanenbaum and M. van Steen, Distributed Systems (2001); F. Berman, G. Fox, and A. J. G. Hey, Grid Computing (2003); A. Abbas, Grid Computing (2003).
What distinguishes grid computing from typical cluster computing systems is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Also, while a computing grid may be dedicated to a specialized application, it is often constructed with the aid of general purpose grid software libraries and middleware.
"Distributed" or "grid" computing in general is a special type of parallel computing which relies on complete computers (with onboard CPU, storage, power supply, network interface, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high-speed computer bus.
The primary advantage of distributed computing is that each node can be purchased as commodity hardware, which when combined can produce similar computing resources to a multiprocessor supercomputer, but at lower cost. This is due to the economies of scale of producing commodity hardware, compared to the lower efficiency of designing and constructing a small number of custom supercomputers. The primary performance disadvantage is that the various processors and local storage areas do not have high-speed connections. This arrangement is thus well-suited to applications in which multiple parallel computations can take place independently, without the need to communicate intermediate results between processors.
The high-end scalability of geographically dispersed grids is generally favorable, due to the low need for connectivity between nodes relative to the capacity of the public Internet.
There are also some differences in programming and deployment. It can be costly and difficult to write programs so that they can be run in the environment of a supercomputer, which may have a custom operating system, or require the program to address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of "grid" infrastructure can allow conventional, standalone programs to run on multiple machines (but each given a different part of the same problem). This makes it possible to write and debug on a single conventional machine, and eliminates complications due to multiple instances of the same program running in the same shared memory and storage space at the same time.
One feature of distributed grids is that they can be formed from computing resources belonging to multiple individuals or organizations (known as multiple administrative domains). This can facilitate commercial transactions, as in utility computing, or make it easier to assemble volunteer computing networks.
One disadvantage of this feature is that the computers which are actually performing the calculations might not be entirely trustworthy. The designers of the system must thus introduce measures to prevent malfunctions or malicious participants from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes (presumably with different owners) and checking that at least two different nodes report the same answer for a given work unit. Discrepancies would identify malfunctioning and malicious nodes.
Due to the lack of central control over the hardware, there is no way to guarantee that nodes will not drop out of the network at random times. Some nodes (like laptops or dialup Internet customers) may also be available for computation but not network communications for unpredictable periods. These variations can be accommodated by assigning large work units (thus reducing the need for continuous network connectivity) and reassigning work units when a given node fails to report its results as expected.
The impacts of trust and availability on performance and development difficulty can influence the choice of whether to deploy onto a dedicated computer cluster, to idle machines internal to the developing organization, or to an open external network of volunteers or contractors.
In many cases, the participating nodes must trust the central system not to abuse the access that is being granted, by interfering with the operation of other programs, mangling stored information, transmitting private data, or creating new security holes. Other systems employ measures to reduce the amount of trust "client" nodes must place in the central system such as placing applications in virtual machines.
Public systems or those crossing administrative domains (including different departments in the same organization) often result in the need to run on heterogeneous systems, using different operating systems and hardware architectures. With many languages, there is a tradeoff between investment in software development and the number of platforms that can be supported (and thus the size of the resulting network). Cross-platform languages can reduce the need to make this tradeoff, though potentially at the expense of high performance on any given node (due to run-time interpretation or lack of optimization for the particular platform).
Various middleware projects have created generic infrastructure, to allow diverse scientific and commercial projects to harness a particular associated grid, or for the purpose of setting up new grids. BOINC is a common one for academic projects seeking public volunteers; more are listed at the end of the article.
In fact, the middleware can be seen as a layer between the hardware and the software. On top of the middleware, a number of technical areas have to be considered, and these may or may not be middleware independent. Example areas include SLA management, Trust and Security, VO management, License Management, Portals and Data Management. These technical areas may be taken care of in a commercial solution, though the cutting edge of each area is often found within specific research projects examining the field.
CPU-scavenging, cycle-scavenging, cycle stealing, or shared computing creates a "grid" from the unused resources in a network of participants (whether worldwide or internal to an organization). Typically this technique uses desktop computer instruction cycles that would otherwise be wasted at night, during lunch, or even in the scattered seconds throughout the day when the computer is waiting for user input or slow devices.
Volunteer computing projects use the CPU scavenging model almost exclusively.
In practice, participating computers also donate some supporting amount of disk storage space, RAM, and network bandwidth, in addition to raw CPU power. Since nodes are likely to go "offline" from time to time, as their owners use their resources for their primary purpose, this model must be designed to handle such contingencies.
The term Grid computing originated in the early 1990s as a metaphor for making computer power as easy to access as an electric power grid in Ian Foster and Carl Kesselmans seminal work, "The Grid: Blueprint for a new computing infrastructure".
CPU scavenging and volunteer computing were popularized beginning in 1997 by distributed.net and later in 1999 by SETI@home to harness the power of networked PCs worldwide, in order to solve CPU-intensive research problems.
The ideas of the grid (including those from distributed computing, object oriented programming, web services and others) were brought together by Ian Foster, Carl Kesselman and Steve Tuecke, widely regarded as the "fathers of the grid." They led the effort to create the Globus Toolkit incorporating not just computation management but also storage management, security provisioning, data movement, monitoring and a toolkit for developing additional services based on the same infrastructure including agreement negotiation, notification mechanisms, trigger services and information aggregation. While the Globus Toolkit remains the defacto standard for building grid solutions, a number of other tools have been built that answer some subset of services needed to create an enterprise or global grid.
During 2007 the term cloud computing came into popularity, which is conceptually similar to the canonical Foster definition of grid computing (in terms of computing resources being consumed as electricity is from the power grid). Indeed grid computing is often (but not always) associated with the delivery of cloud computing systems.
Grids offer a way to solve Grand Challenge problems like protein folding, financial modeling, earthquake simulation, and climate/weather modeling. Grids offer a way of using the information technology resources optimally inside an organization. They also provide a means for offering information technology as a utility for commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.
Grid computing is presently being applied successfully by the National Science Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney, Bristol-Myers Squibb, Co., and American Express.
As of March 2008, Folding@home had achieved peaks of 1502 teraflops on over 270,000 machines.
Another well-known project is distributed.net, which was started in 1997 and has run a number of successful projects in its history.
Until April 27, 2007, United Devices operated the United Devices Cancer Research Project based on its Grid MP product, which cycle scavenges on volunteer PCs connected to the Internet. As of June 2005, the Grid MP ran on about 3,100,000 machines
The Enabling Grids for E-sciencE project, which is based in the European Union and includes sites in Asia and the United States, is a follow up project to the European DataGrid (EDG) and is arguably the largest computing grid on the planet. This, along with the LHC Computing Grid (LCG) have been developed to support the experiments using the CERN Large Hadron Collider. The LCG project is driven by CERN's need to handle huge amounts of data, where storage rates of several gigabytes per second (10 petabytes per year) are required. A list of active sites participating within LCG can be found online as can real time monitoring of the EGEE infrastructure. The relevant software and documentation is also publicly accessible.
Another well-known project is the World Community Grid The World Community Grid's mission is to create the largest public computing grid benefiting humanity. This work is built on the belief that technological innovation combined with visionary scientific research and large-scale volunteerism can change our world for the better. IBM Corporation, has donated the hardware, software, technical services and expertise to build the infrastructure for World Community Grid and provides free hosting, maintenance and support.
Grids can be categorized with a three stage model of departmental grids, enterprise grids and global grids. These correspond to a firm initially utilising resources within a single group i.e. an engineering department connecting desktop machines, clusters and equipment. This progresses to enterprise grids where non-technical staff's computing resources can be used for cycle-stealing and storage. A global grid is a connection of enterprise and departmental grids which can be used in a commercial or collaborative manner.
|Open Middleware Infrastructure Institute Europe (OMII-Europe)||Europe||May 2006||May 2008|
|Enabling Grids for E-sciencE (EGEE)||Europe||March 2004||March 2006|
|Enabling Grids for E-sciencE II (EGEE II)||Europe||April 2006||April 2008|
|D4Science (DIstributed colLaboratories Infrastructure on Grid ENabled Technology 4 Science)||Europe and Asia and the Pacific||January 2008||December 2009|
|E-science grid facility for Europe and Latin America (EELA-2)||Europe and Latin America||April 2008||March 2010|
|E-Infrastructure shared between Europe and Latin America (EELA)||Europe and Latin America||January 2006||December 2008|
|Business Experiments in GRID (BEinGRID)||Europe||June 2006||November 2009|
|BREIN||Europe||September 2006||August 2009|
|KnowARC||Europe||June 2006||August 2009|
|Nordic Data Grid Facility||Scandinavia and Finland||June 2006||December 2010|
|DataTAG||Europe and North America||January 2001||January 2003|
|European DataGrid (EDG)||Europe||March 2001||March 2004|
|BalticGrid||Europe (Baltic States)||November 2005||April 2008|
|EUFORIA (EU Fusion fOR Iter Applications)||Europe||January 2008||December 2010|
|World Community Grid||Global||November 2004||unknown|
|XtreemOS||Europe||June 2006||June 2010|
|GridEcon||Europe||June 2006||Arpil 2009|