Introduction
What is a grid?
Grid technologies represent a significant step forward in the effective use of network-connected resources, providing a
framework for sharing distributed resources while respecting the distinct administrative priorities and autonomy of the
resource owners. A grid can also help people discover and enable new ways of working together — providing a means for
resource owners to trade unused cycles for access to significantly more compute power when needed for short periods,
for example, or establishing a new organizational or cultural paradigm of focused investments in common infrastructure
that is made available for broad benefit and impact.
Arriving at a common definition of "a grid" today can be very difficult. Perhaps the most generally useful definition
is that a grid consists of shared heterogeneous computing and data resources networked across administrative boundaries.
Given such a definition, a grid can be thought of as both an access method and a platform, with grid middleware being
the critical software that enables grid operation and ease-of-use. For a grid to function effectively, it is assumed that
- hardware and software exist on each resource to support participation in a grid, and
- agreements and policies exist among grid participants to support and define resource sharing.
Standards to define common grid services and functionality are still under development. The promise of transparent
and ubiquitous resource sharing has excited and inspired a variety of views of a grid, often with considerable hype,
from within multiple sectors (academe, industry, government) and flavored by numerous perspectives.
Many products are available for implementing "a grid", or grid-like capabilities. In some cases, the focus is on
providing high performance capability, either through eased or increased access to existing high performance
computing (HPC) resources, or a new level of performance realized through the orchestration of existing resources.
In other cases, the focus is on using the network coupled with grid middleware to provide users or applications with
seamless access to distributed resources of varying types, often in the service of solving a single problem or inquiry.
With both standards and products under rapid development, product selection inevitably affects the definition of the
resulting grid — that is, any given grid is at least partially defined by the functionality, focus and features of the
product(s) that are used to implement it. Throughout this Cookbook, high level concepts and general examples will
consider a variety of "grid types" but specific examples and case studies necessarily reflect particular products and
approaches, with emphasis on those most commonly implemented today.
When grid technology is viewed as evolving into a generalized and globally shared infrastructure (a "grid of grids",
composed of campus grids, project grids, regional grids, institutional or organizational grids, etc.), the vision
is often referred to as "the Grid", still only a concept but similar in many ways to today's Internet, which evolved
from distributed IP networks loosely united to provide a globally-used capability.
Is it a grid or a cluster?
Clusters are often compared to, and confused with, grids. A cluster can be defined as a group of computers that are
coupled together through a common operating system, security infrastructure and configuration, and that are
used as a group to handle users' computing jobs. Clusters fall into a variety of categories, including the following.
- High performance computing (HPC) clusters provide a cost-effective capability that rivals or exceeds the
performance of large shared-memory multiprocessors for many applications. Such clusters typically consist of
thousands, tens of thousands, or hundreds of thousands of compute elements (i.e., processors or cores) and a high
performance network (e.g., Myrinet or InfiniBand) that is substantially more efficient than Ethernet.
- Beowulf clusters, composed of commodity-hardware compute nodes running Linux software with dedicated
interconnects (and similar architectures using other operating systems).
- "Cycle-scavenging" services (e.g., Condor pools), which aggregate and schedule access to compute cycles that would
otherwise go unused on individual systems that are not necessarily running the same operating system.
For the purposes of this Cookbook, a grid is assumed to consist of at least two such systems that connect across
administrative domains.
A computational grid emphasizes aggregate compute power and performance through its collective nodes. A data grid
emphasizes discovery, transfer, storage and management of data distributed across grid nodes.
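The working definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Resource` type, its fields, and the example domain names are hypothetical and do not come from any grid product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical description of one system offered to a grid."""
    name: str
    admin_domain: str  # organization that owns and administers the system
    kind: str          # e.g. "hpc-cluster", "condor-pool", "storage"


def spans_administrative_domains(resources):
    """Return True when the resources belong to at least two administrative domains."""
    domains = {r.admin_domain for r in resources}
    return len(domains) >= 2


# Illustrative systems only; the domain names are placeholders.
campus_cluster = Resource("hpc-a", "university-a.example", "hpc-cluster")
campus_pool = Resource("pool-a", "university-a.example", "condor-pool")
partner_cluster = Resource("hpc-b", "university-b.example", "hpc-cluster")

print(spans_administrative_domains([campus_cluster, campus_pool]))      # False
print(spans_administrative_domains([campus_cluster, partner_cluster]))  # True
```

Two systems run by the same owner do not meet the definition used in this Cookbook, however they are networked; the second call qualifies only because a second administrative domain is involved.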
What instruments, resources and services might you find on a grid?
The predominant impression, or sometimes de facto definition, of a grid is that it is a collection of computational
resources that can be combined to produce a greater HPC capability than any single resource can
provide on its own. In fact, many grids are focused on computation, at least initially, since the concepts and processes
for combining computational elements are the most mature and compute-intensive applications are more obviously positioned
to benefit from the multiplication of capability made possible by grid technology. A grid, however, can facilitate access
to a wide variety of resources, and the type and timing of resources to be added to any given grid depend on the intended
user community and application set. Resources other than compute resources may be more obvious or compelling for a
particular community to share, such as visualization tools, high-capacity storage, data services, or access to unique
or distributed instruments (e.g., telescopes, microscopes, sensors).
The actual process for adding a resource to a grid — or "grid-enabling" the resource — varies according to the type of
resource being added as well as the grid technology in use. Compute resources are often the focus of examples within
this Cookbook due to their prevalence and relatively straightforward (or at least common!) inclusion in a grid.
Processes to grid-enable other types of resources (e.g. data services, visualization, instruments) are less well known,
are likely to be more variable from grid product to grid product,
and may also be proprietary or highly dependent on the technical specifications of the particular device.
Some examples that illustrate the value and variety of making different resources available via a grid include:
- George E. Brown, Jr. Network for Earthquake Engineering Simulation (NEES) [1]
- From their Web site: "NEES is a shared national network of 15 experimental facilities, collaborative tools, a
centralized data repository, and earthquake simulation software, all linked by the ultra-high-speed Internet2 connections
of NEESgrid. Together, these resources provide the means for collaboration and discovery in the form of more advanced
research based on experimentation and computational simulations of the ways buildings, bridges, utility systems,
coastal regions, and geomaterials perform during seismic events ... NEES will revolutionize earthquake engineering
research and education. NEES research will enable engineers to develop better and more cost-effective ways of
mitigating earthquake damage through the innovative use of improved designs, materials, construction techniques,
and monitoring tools." The NEES Central portal provides a single launching point for access to a variety of facilities
(see NEEScentral web site [20]), including instruments such as geotechnical centrifuges, shake tables and tsunami wave basins.
- Laser Interferometer Gravitational-Wave Observatory (LIGO) [3]
- From their Web site: "The Laser Interferometer Gravitational-Wave Observatory (LIGO) is a facility dedicated to the
detection of cosmic gravitational waves and the harnessing of these waves for scientific research ... the LIGO Data Grid
is being developed with an initial focus on distributed data services — replication, movement, and management — versus
high-powered computation." The gravitational wave detectors produce large amounts of observational data, which scientists
working in this field analyze alongside expected or predicted data of similar scale.
- Earth System Grid (ESG) [4]
- From their Web site: "The primary goal of ESG is to address the formidable challenges associated with enabling
analysis of and knowledge development from global Earth System models. Through a combination of Grid technologies
and emerging community technology, distributed federations of supercomputers and large-scale data and analysis
servers will provide a seamless and powerful environment that enables the next generation of climate research." Both
data resources/services and high performance computational resources are necessary on this grid to meet a primary
project objective: "High resolution, long-duration simulations performed with advanced DOE SciDAC/NCAR climate models
will produce tens of petabytes of output. To be useful, this output must be made available to global change
impacts researchers nationwide, both at national laboratories and at universities, other research laboratories,
and other institutions."
- cancer Biomedical Informatics Grid (caBIG) [5]
- From their Web site: "To expedite the cancer research community's access to key bioinformatics tools, platforms and
data, the NCI is working in partnership with the Cancer Center community to deploy an integrating biomedical informatics
infrastructure: caBIG (cancer Biomedical Informatics Grid). caBIG is creating a common, extensible informatics
platform that integrates diverse data types and supports interoperable analytic tools in areas including clinical
trials management, tissue banks and pathology, integrative cancer research, architecture, and vocabularies and
common data elements." The current suite of software development toolkits, applications, database technologies, and
Web-based applications from caBIG are openly available from their Tools, Infrastructure, Datasets Web site [21],
as tools for the target research community but also as models and reusable components for meeting similar service
needs in other grid environments.
Two notable initiatives are also addressing, at a more general level, the question of how to connect and control
instruments within a grid environment:
- Grid-enabled Remote Instrumentation with Distributed Control and Computation (GRIDCC) [2]
- From their Web site: "Recent developments in Grid technologies have concentrated on providing batch access to
distributed computational and storage resources. GRIDCC will extend this to include access to and control of
distributed instrumentation ... The goal of the GRIDCC project is to build a widely distributed system that is able to
remotely control and monitor complex instrumentation."
- Instrument Middleware Project [6]
- From their Web site: "The Common Instrument Middleware Architecture (CIMA) project, supported by the National Science
Foundation Middleware Initiative, is aimed at 'Grid enabling' instruments as real-time data sources to improve
accessibility of instruments and to facilitate their integration into the Grid... The end product will be a consistent
and reusable framework for including shared instrument resources in geographically distributed Grids."
Both of the above initiatives are applying their emerging products and services in specific pilot
applications to verify the efficacy and extensibility of their architecture and approach. Between the two initiatives,
examples of grid-enabled instrumentation are being further developed in several diverse fields, including electrical
and telecommunication grids (those "other grids"!), particle physics, earth observation and geohazard monitoring,
meteorology, and x-ray crystallography.
Who can access grid resources?
Authentication (authN) and authorization (authZ) are used together on grids to enforce conditions of use for
resources as specified by the resource owner. This is recognized by Foster et al. in describing grid technology
as a "resource-sharing technology with software and services that let people access computing power, databases,
and other tools securely online across corporate, institutional, and geographic boundaries without sacrificing
local autonomy" [11]. A researcher in the higher-education community, for example, may be not only a computer user
on their campus's primary network but also a user of regional, national, or international resources within
grid-based projects. Each grid determines what process and proof are acceptable to identify a user (authentication),
and decides what that user is then authorized to access (authorization).
Authentication (authN) is the act of identifying an individual user through the presentation of some credential. It
does not include determining what resources the user can access, which is considered authorization. The process of
authentication verifies that a real-world entity (e.g. person, compute node, remote instrument, application process)
is who or what its identifier (e.g., username, certificate subject, etc.) claims it to be. In the process, the
authentication credentials are evaluated and verified as being from a trusted source and at a particular
level of assurance. Examples of credentials include a smartcard, response to a challenge question, password,
public-key certificate, photo ID, fingerprint, or other biometric [12] [13] [14].
Authentication is often discussed as part of identity management.
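As a rough illustration of the authentication step described above, the following Python sketch accepts a credential only when it verifies, comes from a trusted source, and meets a required level of assurance. Everything named here is an assumption made for the example: the `Credential` type, the `TRUSTED_ISSUERS` table, the issuer names and the numeric assurance levels are hypothetical, and the `valid` flag merely stands in for real verification (for instance, checking a certificate signature).

```python
from dataclasses import dataclass

# Hypothetical trust configuration for one grid: issuer name -> assurance level.
TRUSTED_ISSUERS = {"Example Campus CA": 2, "Example Regional CA": 3}


@dataclass(frozen=True)
class Credential:
    subject: str  # identifier the entity claims (e.g. username, certificate subject)
    issuer: str   # who vouches for the credential
    valid: bool   # stand-in for real verification of the credential itself


def authenticate(credential, required_assurance):
    """Return the authenticated subject, or None if the credential is rejected.

    This decides only who the entity is; it says nothing about what the
    entity may access, which is the separate authorization step.
    """
    if not credential.valid:
        return None
    assurance = TRUSTED_ISSUERS.get(credential.issuer)
    if assurance is None or assurance < required_assurance:
        return None
    return credential.subject


print(authenticate(Credential("alice", "Example Campus CA", True), 2))  # alice
print(authenticate(Credential("alice", "Example Campus CA", True), 3))  # None
```

The second call fails even though the credential is genuine, because the issuing source does not meet the level of assurance that the resource requires.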
Authorization (authZ) refers to the process of determining the eligibility of a properly authenticated entity to
perform the functions that it is requesting (access a grid-based application, service, or resource, for instance). The
term "authorization" may be applied to the right or permission that is granted, the issuing of the token that
proves a subject has that right, or to the token itself (e.g., a signed assertion). Signed assertions and other
authorization characteristics are stored for reference in a variety of ways: within a local file system, on an
external physical device (e.g. a smartcard), in a separate data system, or within system or enterprise-wide
directories [12] [13] [14].
The characteristics that are assessed to determine status or levels of authorization for
a given entity are often referred to as "attributes" of that entity.
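An attribute-based authorization decision of the kind described above might look like the following minimal sketch. It assumes the entity has already been authenticated; the attribute names, the policy layout and the `authorize` function are hypothetical and are not taken from any particular grid toolkit.

```python
def authorize(attributes, policy):
    """Return True if the entity's attributes satisfy every requirement in the policy.

    `policy` maps an attribute name to the set of values the resource owner accepts.
    A missing attribute is treated as not satisfying the requirement.
    """
    return all(attributes.get(name) in allowed for name, allowed in policy.items())


# Hypothetical policy set by the owner of a compute cluster.
cluster_policy = {
    "affiliation": {"faculty", "staff"},
    "project": {"climate-modeling"},
}

# Hypothetical attributes of two already-authenticated entities.
researcher = {"affiliation": "faculty", "project": "climate-modeling"}
student = {"affiliation": "student", "project": "climate-modeling"}

print(authorize(researcher, cluster_policy))  # True
print(authorize(student, cluster_policy))     # False
```

Keeping the policy as data owned by the resource owner reflects the point that each owner sets its own conditions of use, while the grid only evaluates them.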
Organizations contributing to a grid infrastructure develop policies for conditions of use of the grid resources and
use authentication and authorization tools to implement those policies. Several types of authentication and
authorization mechanisms have been developed or adopted for grids over time and are in active use today. There
is not (yet?) consensus on which technologies are or will prove to be most effective, particularly for grids to
scale to the level of global infrastructure, or for inter-departmental, inter-institutional, multi-project or
multi-purpose grids, in which resources are not governed under the same administrative domain. However, a variety of sound,
operational authN/Z approaches do exist. It is valuable to review several options when deciding on an
approach to meet immediate as well as future needs of a given grid deployment, keeping in mind that choosing a
particular toolkit may lock you into a particular authentication/authorization model.
Bibliography
[1] George E. Brown, Jr. Network for Earthquake Engineering Simulation (http://www.nees.org)
[2] Grid Enabled Remote Instrumentation with Distributed Control and Computation (GRIDCC) (http://www.gridcc.org/)
[3] Laser Interferometer Gravitational-Wave Observatory (LIGO) (http://www.ligo.caltech.edu)
[4] Earth System Grid (http://www.earthsystemgrid.org/)
[5] cancer Biomedical Informatics Grid (caBIG) (http://cabig.cancer.gov/index.asp)
[6] Instrument Middleware Project (http://www.instrumentmiddleware.org/metadot/index.pl)
[7] Grid Café (http://gridcafe.web.cern.ch/gridcafe/gridatwork/gridatwork.html)
[11] Foster, The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, 2002
[12] nmi-edit Glossary (http://www.nmi-edit.org/glossary/index.cfm)
[13] GFD Authorization Glossary (http://www.gridforum.org/documents/GFD.42.pdf)
[14] Internet2 Authentication WebISO (http://middleware.internet2.edu/core/authentication.html)
[17] SURA's NMI Case Study Series (http://www.sura.org/programs/nmi_testbed.html#NMI)
[18] Adiga, Henderson, Jokl, et al. "Building a Campus Grid: Concepts and Technologies" (September 2005) (http://www1.sura.org/3000/SURA-AuthNauthZ.pdf)
[19] Adiga, Barzee, Bolet, et al. "Authentication & Authorization in SURAgrid: Concepts and Technologies", (May 2005) (http://www1.sura.org/3000/BldgCampusGrids.pdf)
[20] NEEScentral website (https://central.nees.org/?action=DisplayFacilities)
[21] caBIG Tools, Infrastructure, Datasets (https://cabig.nci.nih.gov/inventory/)