What Grids Can Do For You
Payoffs and tradeoffs
The goal of grids is to enable and to simplify access to distributed resources. Based on the electric power grid as a
model, a strong concept behind the development of grid technology is to provide a basic computational infrastructure
that users could draw on for computation, visualization and data services. A person plugs in a toaster, radio or other
appliance, without worrying about where the power is coming from or how it gets to them. In an ideal world, grid
infrastructure would enable computational resources, data services, and even specialized instrumentation or sensors
to be "plugged into" the grid, with user interfaces similarly "plugged in" to provide access without users needing to
worry about many of the details as to where the devices, services or data reside. The challenge of grids is that the
resources involved are distributed across a wide area, are administered and controlled by a variety of individuals
and organizations, and adhere to a variety of usage policies and procedures. In addition, the performance
characteristics and benefits will vary in that some grids are used to facilitate access to HPC resources (supercomputers),
some bring together commodity computing capability, and all are dependent on the performance and reliability of the
system-level, local, and also wide area network interconnects that tie them together.
In this chapter, we consider the cost-benefit analysis in terms of the effort required to coordinate the use of a
heterogeneous set of resources that exist across administrative domains. That is, what makes such an extensive effort of
coordination and software development (i.e., middleware) worthwhile? What tradeoffs must an organization consider
when deciding whether to deploy or use resources on a grid? Here we discuss some of the issues in general terms,
with more detail further on in the cookbook.
Access to resources beyond those locally available
If a researcher were offered access to compute clusters, visualization engines, and a multitude of databases beyond
what was locally available, most would be at least cautiously interested. Commonly anticipated advantages from an
end-user perspective include:
- Improved model resolution resulting from access to greater compute power
- Increased size or number of calculations or applications that can be executed simultaneously
- Access to specialized visualization resources, allowing the rendering of complex scientific results in forms more
easily interpreted by researchers
- Access to large amounts of preprocessed and well organized data across high speed networks and the ability to
participate in and contribute to large, geographically dispersed research collaborations
Some difficulty arises, however, from the fact that resources on a grid are often not owned or controlled by a
single administrative domain. This can affect the "cost" of computing — in terms of ready access, ease-of-use or even
actual financial cost — beyond what may be initially obvious. Even so, grid computing arguably provides its greatest
benefit when aggregating resources across projects or organizations, enabling individuals within participating
organizations to share resources and knowledge at unprecedented levels. There are a variety of regional, national and
international-scale grid initiatives that provide shared access to specialized and general grid computing capabilities
in support of the research and education mission. Later in this section we will provide several examples of
existing grid initiatives providing a variety of services.
An alternate perspective on the inter-organizational sharing of resources comes from organizational management, who
may ask "Why should I provide others with access to machines that came at my institution's cost and in response to
specific needs and requests from my institution's users?" This question comes up time and again as institutions — or
even departments within an institution — contemplate adding significant resources to a grid that is beyond their local
domain. Accumulating resources locally may initially seem to be the most effective approach to meeting local needs;
however, the drive for increased capability and diversity within a growing community can rapidly outpace local
budget and resources for system acquisition and maintenance. Sharing resources through an inter-organizational grid
can be a more cost-effective way to meet wide-ranging and evolving local needs while increasing the capabilities available
to the community at large. In addition, sharing resources with other organizations can provide users with access to a
multiplicity of compute architectures and other types of resources not locally available, and, as importantly, to a
larger community of potential collaborators and relationships for both technological and scientific advancement.
A notable challenge in the sharing of resources across institutions is determining the identity of users from
different organizations so that local as well as grid-wide access and authorization policies can be applied. The
successful coordination of authentication and authorization mechanisms with identity management technologies is key.
For instance, Globus leverages Public Key Infrastructure (PKI) [1]
as a basis for its management of access to grid resources. PKI offers a framework for organizations to share and trust
assertions of identity through the exchange of digital certificates supported by public and private digital keys. If
one's organization already utilizes PKI for identity management and is joining a grid that is Globus-based, integration
at this level is fairly straightforward. If not, processes and technologies for mapping or converting organizational
identities into appropriate PKI-based credentials need to be established. While this may not be complex in all situations,
an organization must have sufficient IT resources and expertise to evaluate possible solutions, and, ideally,
integration and cooperation with those who manage and administer the organization's existing identity management system(s).
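To make the mapping step more concrete, here is a minimal sketch of resolving a certificate subject (distinguished name) to a local account. It assumes a table written in the style of a Globus grid-mapfile, where each entry is a quoted subject followed by a local user name. The subjects, account names and helper functions are hypothetical, and a real deployment would first validate the certificate chain, which is omitted here.

```python
import shlex

# Hypothetical entries in the style of a grid-mapfile:
# a quoted certificate subject (DN) followed by a local account name.
GRID_MAP_TEXT = '''
# comment lines and blank lines are ignored
"/C=US/O=Example University/OU=Physics/CN=Jane Doe" jdoe
"/C=US/O=Example Lab/CN=Sam Smith" ssmith
'''


def parse_grid_map(text):
    """Return a dict mapping certificate subject DN -> local account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # honors the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"malformed entry: {line!r}")
        dn, account = parts
        mapping[dn] = account
    return mapping


def local_account_for(dn, mapping):
    """Look up the local account for an already-authenticated DN, or None."""
    return mapping.get(dn)


grid_map = parse_grid_map(GRID_MAP_TEXT)
print(local_account_for("/C=US/O=Example Lab/CN=Sam Smith", grid_map))
print(local_account_for("/C=US/O=Unknown/CN=Nobody", grid_map))
```

An organization without PKI-based identities would need an additional step in front of this lookup that issues or converts credentials, which is where most of the integration effort described above tends to go.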
Performance and speedup
Computational resources, and specifically high performance systems or clusters, are often the first type
of resource one thinks of at the mention of a grid. High performance, high-end, "super" computing
has been around for a long time. It can be difficult for an organization to engage its diverse audience in
an effort to construct HPC infrastructure at a campus. It is easier to engage in these
discussions in the context of establishing a grid, especially since the grid offers the potential of making
compute resources available to a larger community as well as augmenting the resources available at its member institutions.
The tradeoff here is that the grid doesn't always provide a complete
solution. Cross platform schedulers, accounting, message passing paradigms,
and so forth are required. Ongoing work in both standards and product development is attempting to bridge
these gaps and much of the detail can now be hidden from the user through the use of web services and interfaces. Joining
a grid and accessing it through web services will be covered in significant detail later.
Collaboration
As noted earlier, groups within an individual institution may be too small to
justify or fund the type of resources they need and, in fact, they may only
need those resources from time to time. As sponsoring agencies began to fund
broader collaborations,
the idea of "communities" evolved. Communities generally come
in a number of categories such as "interest", "practice", "purpose" and
so forth. (See the Wikipedia article "Community of interest" [2] for more explanation.)
In our case, the people in these communities share interest, practice, purpose
[and so forth] in a particular field of science or engineering.
Grids help these communities build and share resources as well. The payoffs
are in sharing knowledge, building expertise together (in both their shared
area as well as in grid use), and enabling the community to build better
cases together for more resources. The tradeoff is the
increased complexity and management that grid use brings in order to use those
resources. In this cookbook we will attempt to bridge the gaps and smooth
out some of the complexity in the simplest terms possible.
Alignment with National Vision for 21st Century Discovery
In the National Science Foundation's recent report, "Cyberinfrastructure Vision for 21st Century Discovery", the term
cyberinfrastructure is defined as, "... computing systems, data, information resources, networking, digitally
enabled-sensors, instruments, virtual organizations, and observatories." From Arden Bement's introduction to this report:
"At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer
collaboration and new modes of education based upon broad and open access to leadership computing; data and information
resources; online instruments and observatories; and visualization and collaboration services. Cyberinfrastructure
enables distributed knowledge communities that collaborate and communicate across disciplines, distances and cultures.
These research and education communities extend beyond traditional brick-and-mortar facilities, becoming virtual
organizations that transcend geographic and institutional boundaries."
Clearly grid computing will have a central role in the development of the cyberinfrastructure capabilities envisioned
by the NSF. Understanding the basics of grid computing and working with collaborative teams of scientists and computing
professionals to use and help develop grid computing tools and techniques will be an increasingly important component
of a successful agency funding strategy.
Examples of Evolving Grid-based Services and Environments
Aggregating computational resources
A grid layer can make otherwise separate, distributed and different computational hardware appear as a single,
common resource to which the user can submit jobs in a standard way. For instance, users may submit a genome
alignment application via a grid portal and the job will run on any of several clusters, whether those clusters
are at one university or another, or whether the operating systems are different versions.
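The idea of a single submission point in front of several clusters can be sketched in a few lines. This is only an illustration under simple assumptions: the cluster names, the "most free cores" selection rule and the function names are hypothetical, and real grid middleware also handles authentication, data staging and queue policies.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str          # hypothetical site label
    os_version: str    # differs between sites; hidden from the user
    free_cores: int


def choose_cluster(clusters, cores_needed):
    """Pick the cluster with the most free cores that can fit the job."""
    candidates = [c for c in clusters if c.free_cores >= cores_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.free_cores)


def submit(job_name, cores_needed, clusters):
    """Reserve cores on the chosen cluster and report where the job went."""
    target = choose_cluster(clusters, cores_needed)
    if target is None:
        return f"{job_name}: queued (no cluster has {cores_needed} free cores)"
    target.free_cores -= cores_needed
    return f"{job_name}: dispatched to {target.name}"


clusters = [
    Cluster("univ-a", "linux-2.4", 16),
    Cluster("univ-b", "linux-2.6", 64),
]
first = submit("genome-align", 32, clusters)
print(first)
```

The user supplies only the job and its size; which university's cluster runs it, and under which operating system version, is decided by the grid layer.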
Several examples of projects that are developing frameworks and toolkits for aggregating resources
include:
- TeraGrid — From the TeraGrid website
[57]:
"TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to
create an integrated, persistent computational resource. Using high-performance network connections, the TeraGrid
integrates high-performance computers, data resources and tools, and high-end experimental facilities around the country.
Currently, TeraGrid resources include more than 250 teraflops of computing capability and more than 30 petabytes of
online and archival data storage, with rapid access and retrieval over high-performance networks. Researchers can also
access more than 100 discipline-specific databases. With this combination of resources, the TeraGrid is the world's
largest, most comprehensive distributed cyberinfrastructure for open scientific research."
TeraGrid is coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago, working in partnership
with the Resource Provider sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing
Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing
Center, University of Chicago/Argonne National Laboratory, and the National Center for Atmospheric Research.
- SURAgrid — From the SURAgrid website
[8],
"SURAgrid is a consortium of organizations collaborating and combining resources to help bring grid technology to the
level of seamless, shared infrastructure. The vision for SURAgrid is to orchestrate access to a rich set of distributed
capabilities in order to meet diverse users' needs. Capabilities to be cultivated include locally contributed resources,
project-specific tools and environments, highly specialized or HPC access, and gateways to national and international
cyberinfrastructure. SURAgrid resources currently include over 10 teraflops of pooled computing resources, accessed
through a common SURAgrid portal using a common authentication and authorization mechanism, the SURAgrid Bridge
Certificate Authority."
- Geodise — The Geodise project
[3],
aimed initially at Computational Fluid Dynamics (CFD) applications, has the
mission "To bring together and further the technologies
of Design Optimisation, CFD, GRID computation, Knowledge Management & Ontology
in a demonstration of solutions to a challenging industrial problem". Funded by the Engineering and Physical
Sciences Research Council (EPSRC) [4] in the United Kingdom (UK), Geodise involves multidisciplinary teams
working on a state-of-the-art design tool demonstrator. Intelligent design
tools will steer the user through setup, execution, post-processing, and optimization
activities. These tools are physically distributed, under the control of multiple
elements, to improve design processes that can require assimilation of terabytes
of distributed data.
- Elastic Compute Cloud — Brush up that Amazon account! They aren't just about books and
CDs anymore.
Amazon Web Services [9] now provides application and service developers with direct access to Amazon's
technology platform.
From their website, "Build on Amazon's suite of web services to enable and enhance your applications. We
innovate for you, so that you can innovate for your customers." Their
Solutions catalog
[10]
shows services such as E-Commerce, Simple Storage, and so
forth. Their Elastic
Compute Cloud
[11]
(Amazon EC2) service is "a web service that provides resizable compute capacity
in the cloud. It is designed to make web-scale computing easier for developers." Known
also as utility computing by other service providers, Amazon
EC2 presents a virtual computing environment that allows you to use web
service interfaces
to requisition machines for use, load them with your custom application
environment, manage your network's access permissions, and run your image
using as many or few systems as you desire. Pricing is per instance-hour
consumed, per GB of data transferred to or from Amazon, and per GB-month
of Amazon S3 (Simple Storage Service) storage used.
InfoWorld's [12] article "Amazon.com's rent-a-grid" [13] provides an interesting and compact summary of the service.
To quote it, "As the service's name suggests, though, if you need an elastic capability that can nimbly
grow or shrink, EC2 is the only game in town." The author is quick to point out, though, that 3Tera [14]
expects to release its AppLogic grid system [15] soon.
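The three pricing dimensions mentioned for Amazon EC2 (instance-hours, data transferred and storage per GB-month) lend themselves to a quick back-of-the-envelope estimate. The sketch below shows the arithmetic only. The rates are placeholders chosen for illustration, not Amazon's actual prices, and the function name is hypothetical.

```python
def estimate_monthly_bill(instance_hours, gb_transferred, gb_months_stored,
                          rate_per_instance_hour, rate_per_gb_transferred,
                          rate_per_gb_month):
    """Break a utility-computing bill into its three usage-based parts."""
    compute = instance_hours * rate_per_instance_hour
    transfer = gb_transferred * rate_per_gb_transferred
    storage = gb_months_stored * rate_per_gb_month
    return {
        "compute": compute,
        "transfer": transfer,
        "storage": storage,
        "total": compute + transfer + storage,
    }


# Placeholder workload: 10 instances running around the clock for 30 days.
bill = estimate_monthly_bill(
    instance_hours=10 * 24 * 30,
    gb_transferred=500,
    gb_months_stored=200,
    rate_per_instance_hour=0.10,    # placeholder rate, not a real price
    rate_per_gb_transferred=0.20,   # placeholder rate, not a real price
    rate_per_gb_month=0.15,         # placeholder rate, not a real price
)
for item, amount in bill.items():
    print(f"{item:>8}: {amount:8.2f}")
```

Running the same calculation with a provider's published rates, and comparing the total against the amortized cost of local hardware and staff, is one simple way to frame the tradeoff for an organization.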
Improved access for data-intensive applications
In an ideal world, a grid user may start up a data-intensive application and the grid will assemble the data streams
combining data from multiple, distributed sources, so that the user experiences fast responses and sees the data as a
logical whole. Several service components are needed to realize that vision, including data discovery, storage, possibly
replication and version control, and reliable data transfer. While still developing towards the ideal, current data grids
can manage access to data that may have been collected and stored at different locations, and provide controlled, secure
access for communities as well as individuals. A grid workflow can be developed to manage data integration transparently
for the user, or handle data access such that an application can process the data with improved throughput.
Applications in fields such as high energy physics (HEP), life sciences, and climate and weather modeling not only use
but also generate massive amounts of data. These compute intensive applications can realize great benefit from
access to an expanded pool of computational and data storage and management resources brought together using grid
technology. In this section we will concentrate on the data side of that puzzle.
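As a small illustration of the data discovery and replication components, the sketch below picks a source for each requested file from a replica catalog. Everything in it is hypothetical: the logical file names, the example.org hosts, the bandwidth figures and the "fastest replica wins" rule. Production data grids use dedicated catalog and transfer services for this.

```python
# Hypothetical replica catalog: logical file name -> known physical copies,
# each with an estimated transfer rate to the requesting user.
REPLICA_CATALOG = {
    "lfn:survey/images-001.dat": [
        {"site": "site-a",
         "url": "gsiftp://site-a.example.org/store/images-001.dat",
         "mb_per_s": 12.0},
        {"site": "site-b",
         "url": "gsiftp://site-b.example.org/store/images-001.dat",
         "mb_per_s": 85.0},
    ],
    "lfn:survey/catalog.db": [
        {"site": "site-c",
         "url": "gsiftp://site-c.example.org/store/catalog.db",
         "mb_per_s": 30.0},
    ],
}


def best_replica(logical_name, catalog):
    """Return the replica with the highest estimated transfer rate."""
    replicas = catalog.get(logical_name)
    if not replicas:
        raise LookupError(f"no replica registered for {logical_name}")
    return max(replicas, key=lambda r: r["mb_per_s"])


def plan_transfers(logical_names, catalog):
    """Map each logical name to the physical URL it should be read from."""
    return {name: best_replica(name, catalog)["url"] for name in logical_names}


plan = plan_transfers(
    ["lfn:survey/images-001.dat", "lfn:survey/catalog.db"], REPLICA_CATALOG
)
for name, url in plan.items():
    print(name, "->", url)
```

The application asks only for logical names; where each copy physically lives is resolved on its behalf, which is the sense in which the user sees the data as a logical whole.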
- The International Virtual Data Grid Laboratory (iVDgL) [16]
was a global data grid that served forefront experiments in physics and astrophysics. Its resources were comprised of
heterogeneous computing and storage. Networking resources spanned the U.S., Europe, Asia and South America, thus
providing a unique laboratory that tested and validated Grid technologies at international and global
scales. The iVDgL was operated as a single system for the purposes of interdisciplinary
experimentation in Grid-enabled, data-intensive scientific computing. Its
goal was to drive the development, and transition to everyday production
use, of petabyte-scale virtual data applications.
Applications that made use of the iVDgL include:
-
Compact Muon Solenoid (CMS)
[17]
—
an experiment at the Large Hadron Collider (LHC) [18] at CERN [19] in Geneva, Switzerland. U.S. CMS [20] is
a collaboration of U.S. scientists participating in CMS. This collaboration includes scientists at universities
and Fermi National Accelerator Laboratory (FNAL) [21]. As their website states, "The
CMS experiment is designed to study the collisions of protons at a center of
mass energy of 14 TeV. The physics program includes the study of electroweak
symmetry breaking, investigating the properties of the top quark, a search
for new heavy gauge bosons, probing quark and lepton substructure, looking
for supersymmetry and exploring other new phenomena." [U.S. CMS Overview [22]]
-
A Toroidal LHC ApparatuS (ATLAS)
[23]
—
another experiment at the LHC, ATLAS is also designed to detect particles created by the proton-proton
collisions. According to U.S. ATLAS, "the main goal for ATLAS is to look for a particle dubbed Higgs,
which may be the source of mass for all matter. Findings may also offer insight into new physics theories as
well as a better understanding of the origin of the universe." [U.S. ATLAS [24]]
U.S. ATLAS includes scientists at universities and Brookhaven National Laboratory (BNL) [25].
-
The Sloan Digital Sky Survey (SDSS)
[26]
—
when completed, SDSS will provide detailed optical images covering more than
a quarter of the sky, and a 3-dimensional map of about a million galaxies and
quasars. The SDSS is managed by the Astrophysical Research Consortium for its
participating institutions, including universities, museums, and laboratories.
The SDSS data server, SkyServer [27], holds two primary databases: BESTDR1 and TARGDR1. An
identical schema is used for both, but BESTDR1 has been processed
with the "best available software" for handling noise and is
therefore somewhat bigger. Combined, the databases occupy more than 800 GB of storage
and hold over 3.4 billion rows (records) [28].
SDSS is now up to Data Release 5 [29].
iVDgL sites in Europe and the U.S. were linked by a multi-gigabit per second transatlantic
link funded by the European DataTAG project [30].

Figure WGD-3. iVDgL Project map.
(Interesting fact discovered while drafting this summary: "A TeV is a
unit of energy used in particle physics. 1 TeV is about the energy of motion
of a flying mosquito.
What makes the LHC so extraordinary is that it squeezes energy into a space
about a million million times smaller than a mosquito." [31])
- The EU-DataGrid Project [32], funded by the European Union, had as its purpose
"to build the next generation computing infrastructure providing intensive computation
and analysis of shared large-scale databases, from hundreds
of TeraBytes to PetaBytes, across widely distributed scientific communities."
A collaboration of about twenty European research institutes, DataGrid fulfilled
its objectives in March of 2004 and moved on to become the EGEE (Enabling Grids
for E-sciencE) [33].
The DataGrid project focused on three application areas:
- High Energy Physics — As has iVDgL, DataGrid set the stage for handling
the huge amounts of data produced by the LHC. A multi-tiered, hierarchical
computing model has been adopted to share data and computing efforts among
multiple institutions. The Tier-0 center is located at CERN and is linked
by high speed networks to approximately ten major Tier-1 data processing
centers. These fan out the data to a large number of smaller centers known as Tier-2s.
- Biology and Medical Image Processing — The DataGrid project's biology testbed
provided the platform for new algorithms on data mining, databases,
code management, graphical interface tools and facilitated sharing of
genomic and medical imaging databases for the benefit of international cooperation
and health care.
- Earth Observations — The European Space Agency missions involve the download,
from space to ground, of about 100 Gigabytes of raw images per day. Dedicated
ground infrastructures have been set up to handle the data produced by instruments
onboard the satellites.
DataGrid demonstrated an improved
way to access and process large volumes of data stored in distributed European-wide
archives.
See the DataGrid Project
Description
[34]
for more information.
- Looking at it from another perspective, projects like OGSA-DAI
[35]
develop middleware to assist with access and integration of data from separate sources
via the
grid. Directly from their website, "OGSA-DAI is motivated by the need to:
- Allow different types of data resources — including relational, XML and files
— to be exposed onto Grids.
- Provide a way of querying, updating, transforming and delivering data via web
services.
- Provide access to data in a consistent, data resource-independent way.
- Allow metadata about data, and the data resources in which this data is stored,
to be accessed.
- Support the integration of data from various data resources.
- Provide web services that can be combined to provide higher-level web services
that support data federation and distributed query processing.
- To contribute to a future in which scientists move away from technical
issues such as handling data location, data structure, data transfer and
integration
and instead focus on application-specific data analysis and processing."
Many grid projects are using OGSA-DAI, including:
- LEAD
[36]
— Linked Environments for Atmospheric Discovery
- caGrid
[37]
— the Cancer Biomedical Informatics Grid
- AstroGrid
[38]
— a project to build an infrastructure for the Virtual Observatory
(VObs)
- BRIDGES
[39]
— Biomedical Research Informatics Delivered by Grid Enabled Services
- eDiaMoND
[40]
— a Grid for X-Ray Mammography
- GeneGrid
[41]
— exploiting existing micro array and sequencing technologies and the large
volumes of data generated through screening
services to develop specialist tissue specific datasets relevant to
the particular type of disease being studied
- and more
[42].
Federation of shared resources toward global services
A particularly important aspect of the grid is that of support for
"virtual organizations," or VOs. When the high-energy physics
community began collaborating on large-scale physics problems,
researchers from many different and widely separated organizations
needed to work together. The problem domain was so vast that
researchers at any one site needed the expertise from researchers at
other sites in order to make progress. A project might represent
dozens, hundreds or thousands of scientists collaborating together.
The concept of the "virtual organization" recognized that such
project groups would convene from various organizations and need to
work together as if they were, in fact, from a single organization.
In fact, VOs may be very dynamic and ad hoc, coming together for very
specific purposes, working together for fixed time periods, adding
and losing members over time.
Grid middleware can support sharing of resources using a federated approach, where participating organizations
retain control over their local resources and services but also share these resources in a way that becomes
globally scalable.
For example, an institution would authenticate users locally for access to institutionally-controlled resources
but leverage grid security infrastructure to enable those same users to access external grid resources. Additionally,
users that are identified as members of a particular project, or VO, could be authorized to use resources in
a way that has been pre-approved for members of that group.
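The federated pattern just described separates two questions: who the user is (answered by the home institution) and what the user may do (answered by VO membership plus the resource owner's policy). A minimal sketch of the second question follows. The VO names, user identifiers and resource names are hypothetical, and the user is assumed to have been authenticated already.

```python
# Hypothetical VO membership lists, maintained by each project group.
VO_MEMBERS = {
    "chem-vo": {"alice@univ-a.example", "bob@univ-b.example"},
    "astro-vo": {"carol@univ-c.example"},
}

# Hypothetical policy set by each resource owner: which VOs may use the resource.
RESOURCE_POLICY = {
    "cluster.univ-b.example": {"chem-vo"},
    "archive.univ-c.example": {"astro-vo", "chem-vo"},
}


def authorize(user, resource, vo_members, resource_policy):
    """Allow access if the user belongs to any VO the resource owner approved."""
    allowed_vos = resource_policy.get(resource, set())
    return any(user in vo_members.get(vo, set()) for vo in allowed_vos)


print(authorize("alice@univ-a.example", "cluster.univ-b.example",
                VO_MEMBERS, RESOURCE_POLICY))
print(authorize("carol@univ-c.example", "cluster.univ-b.example",
                VO_MEMBERS, RESOURCE_POLICY))
```

Note that the resource owner never lists individual outside users. Control stays local (the policy table), while membership changes in a VO take effect without the owner editing anything.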
- Funded by the National Science Foundation, the Computational Chemistry Grid [43] (CCG) has developed
a Java client to facilitate access
to a controlled set of applications, HPC and storage resources for
use by the computational chemistry community. Project partners
include the Center for Computational Sciences at the University of
Kentucky, the Center for Computation & Technology at Louisiana State
University, the National Center for Supercomputing Applications
(NCSA), Texas Advanced Computing Center (UT Austin) and the Ohio
Supercomputer Center. From their Web site: "The 'Computational
Chemistry Grid' (CCG) is a virtual organization that provides access
to high performance computing resources for computational chemistry
with distributed support and services, intuitive interfaces and
measurable quality of service." Access is granted through an approval
process, with allocations "available to US academic and government
research staff and to non-US academic researchers." Three types of
project allocations are available: research, community research and
instructional. Research allocations are intended to support large,
often multi-year scientific research projects. Community allocations
are shorter term and intended to be used towards development of a
larger research effort. Instructional allocations can be used to
support academic instruction in the field.
- The cancer Biomedical Informatics Grid [44] (caBIG) is a virtual organization of "over 800 people from
approximately 50 NCI-designated Cancer Centers and other
organizations" in a "voluntary network or grid...to enable the
sharing of data and tools, creating a World Wide Web of cancer
research." Development of the project is taking place under the
leadership of the National Cancer Institute's Center for
Bioinformatics and has the primary goal of "[speeding] the delivery
of innovative approaches for the prevention and treatment of cancer".
However, the concepts and technologies involved are also being
developed with an eye towards reuse and adaptability outside of the
cancer research community. Releases of software and components are
publicly available on the project's community web site. A separate informational
web site is available for those who are not intending to use services or tools but who are interested
in knowing more about the initiative: http://cabig.cancer.gov
[45].
- The Open Science Grid (OSG) [46] is an outgrowth of three notable physics
projects — the DOE-funded Particle Physics Data Grid (www.ppdg.net),
and the NSF-funded Grid Physics Network (GriPhyN, www.griphyn.org)
and the International Virtual Data Grid Laboratory (iVDGL,
www.ivdgl.org). Collaborators leading and within these projects
became interested in the benefits of grid technology for disciplines
beyond physics and began to develop their grid middleware and related
services with an eye towards broader use. Today, the concept of a
"virtual organization" is central to the conceptual as well as
operational functioning of OSG, and there are well over two-dozen VOs
participating, representing a variety of scientific fields.
Organizations that contribute resources to OSG retain control of
those resources but enable use by project groups through access
management tools that have been designed around the VO concept. From
their Web site: "A Virtual Organization (VO) is a collection of
people (VO members), computing/storage resources (sites) and services
(e.g., databases). In OSG, we typically use the term VO to refer to
the collection of people, and the terms Site, Computing Element (CE),
and/or Storage Element (SE) to refer to the resources owned and
operated by a VO." As an organization itself, OSG is also focused on
establishing interoperability with other grids, such as TeraGrid and
international, regional and campus grids.
Harnessing unused cycles
Grids can enable an organization to capture the incredible amount of computing that exists in idle PCs and workstations.
Users can use grid services to submit applications as if to a single resource — the grid manages submission to
various computers, monitoring of status, and collection of the results.
Various tools, both open source and proprietary, exist to help an organization with this sort of grid-enabled service.
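The submit, monitor and collect cycle can be imitated on a single machine. In the sketch below, a thread pool stands in for a set of idle PCs. This is an analogy under stated assumptions rather than a grid client: the work-unit names and the toy computation are hypothetical, and a real cycle-harvesting system also has to cope with machines disappearing mid-job.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_work_unit(unit_id, data):
    """Toy computation standing in for one independent piece of a larger job."""
    return unit_id, sum(x * x for x in data)


def run_on_pool(work_units, max_workers=4):
    """Submit every work unit, watch for completions, and gather the results."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(run_work_unit, unit_id, data): unit_id
            for unit_id, data in work_units.items()
        }
        for future in as_completed(futures):  # completion order is not fixed
            unit_id, value = future.result()
            results[unit_id] = value
    return results


work_units = {"u1": [1, 2, 3], "u2": [4, 5, 6], "u3": [7, 8, 9]}
results = run_on_pool(work_units)
print(sorted(results.items()))
```

The caller sees one function call and one result set, which mirrors how a user submits to the grid "as if to a single resource".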
- Probably the most famous example is the cycle-sharing application SETI@home [46].
SETI@home was proposed in 1995 and launched in 1999.
As their website states "SETI (Search for Extraterrestrial Intelligence) is a scientific area whose goal is to detect
intelligent life outside Earth. One approach, known as radio SETI, uses radio telescopes to listen for narrow-bandwidth
radio signals from space. Such signals are not known to occur naturally, so a detection would provide evidence of
extraterrestrial technology." SETI@home has developed a large community around their project and they include various statistics
about their participants on their website.
Today SETI@home uses software called BOINC [47].
BOINC has the expanded mission to use the idle time on your computer (Windows, Mac, or Linux) to cure diseases,
study global warming, discover pulsars, and do many other types of scientific research. You can use the BOINC software to
create your own project. Worldwide projects, such as the
World Community Grid
[48], use BOINC. As
their mission states "World Community Grid's mission is to create the world's largest public computing grid to
tackle projects that benefit humanity.
Our work has developed the technical infrastructure that serves as the grid's foundation for scientific research.
Our success depends upon individuals collectively contributing their unused computer time to change the world for the better.
World Community Grid is making technology available only to public and not-for-profit organizations to use in
humanitarian research that might otherwise not be completed due to the high cost of the computer infrastructure
required in the absence of a public grid. As part of our commitment to advancing human welfare, all results will be
in the public domain and made public to the global research community."
- Another well-known project is University of Wisconsin-Madison's
Condor
[49].
Condor is often used to manage clusters of dedicated processors, but it also has unique mechanisms that enable effective
harnessing of wasted CPU power from otherwise idle desktop workstations.
BOINC and Condor take very different approaches to the access and management of unused cycles. BOINC functions by
enabling thousands or even millions of users to trust a small set of programs to run on their computer, typically
leveraging the aggregate compute capacity towards the resolution of an overarching problem or inquiry. Condor harnesses
unused cycles to run unspecified applications. This requires a deeper level of trust and so is likely to involve a
smaller set of trusted computers. The benefit is the potential to run a much greater variety of applications, which
significantly increases the utility of Condor as a high-throughput computing system.
Condor can
- be configured to identify idle machines under various criteria
- checkpoint and migrate jobs when those machines are no longer available
- work in shared or non-shared file environment (that is, it can migrate files or retrieve from source as needed)
Condor also provides the job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management.
So Condor provides seamless access to a combination of distributed computers.
- United Devices [50] offers a number of commercial HPC products. Relevant to this discussion is Grid MP™ [51],
an infrastructure solution for implementing and managing complex enterprise grids. Grid MP deployments can range from
single-cluster management implementations to large-scale, multi-resource grids. According to United Devices, the Grid MP
system has scaled to hundreds of thousands of CPUs and hundreds of thousands of jobs, and can scale to thousands of users.
Grid MP was built from the ground up to have a comprehensive security architecture
that includes transparent data encryption, secure authentication, digital signatures
and tamper detection. A framework for rapid application integration is
also included, based on open web services and standards. The interface
provides controlled access to all aspects of the grid system.
The system is designed for self-management via a web-based console, allowing
administrative access from anywhere.
Grid MP devices and users can be grouped with maximum flexibility. An administrator
can set up priority allocation and provisioning policies.
High-speed optical networking, network-aware applications
As noted in the "Networks, switches and interconnects for grids"
section of this Cookbook, "...networks are the virtual bus for the
virtual grid computer and are central to the efficient, effective
operation of grids." As grids evolve, they are beginning to use high
bandwidth optical networks to interconnect grid nodes, increasing the
speed and efficiency possible between input/output, CPUs, storage and
other elements of the computational process. We are also seeing the
advent of "smart" applications — those that are able to actively (or
even proactively!) evaluate network conditions and react with dynamic
adjustments to ensure successful operation. Both of these trends can
improve performance and throughput as perceived by the users of grid
applications today, and they also hold great promise for the
future. Some people feel that, to truly realize the potential of grid
technology, applications, middleware and network services must
interact much more frequently, intelligently and seamlessly than they
do today, to produce an adaptive capability much more akin to using a
single computer than distributing a problem across multiple systems.
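As a rough illustration of what a "smart", network-aware application might do, the sketch below keeps a smoothed estimate of the throughput observed to each remote site and directs the next transfer to the site that currently looks best. It is a minimal sketch under stated assumptions, not the method of any project described here: the `LinkMonitor` class, the site names and the smoothing factor `alpha` are all hypothetical, and a real application would obtain its measurements from a monitoring service rather than from hand-fed numbers.

```python
class LinkMonitor:
    """Tracks an exponentially weighted moving average of throughput per site."""

    def __init__(self, alpha: float = 0.5):
        # alpha weights the newest sample; (1 - alpha) weights the history.
        self.alpha = alpha
        self.estimate = {}

    def observe(self, site: str, mbps: float) -> None:
        """Fold a new throughput measurement (in Mbps) into the site's estimate."""
        old = self.estimate.get(site)
        if old is None:
            self.estimate[site] = mbps
        else:
            self.estimate[site] = self.alpha * mbps + (1 - self.alpha) * old

    def best_site(self) -> str:
        """Return the site with the highest current throughput estimate."""
        return max(self.estimate, key=self.estimate.get)


if __name__ == "__main__":
    monitor = LinkMonitor(alpha=0.5)
    monitor.observe("site-a", 100.0)
    monitor.observe("site-b", 80.0)
    print(monitor.best_site())        # site-a looks faster so far
    monitor.observe("site-a", 20.0)   # congestion appears on the path to site-a
    print(monitor.best_site())        # the application reacts by switching sites
```

Smoothing the measurements keeps the application from reacting to a single bad sample while still letting it adapt when conditions on a path genuinely change.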
Several concepts mentioned in the "Networks, switches and
interconnects for grids" section (virtual and dynamic circuits, advanced
monitoring, end-to-end performance, QoS) form a foundation for
further development in this area. In addition to the several project
examples provided in the "Networks..." section, the following projects are
exploring innovations relevant to the advancement of grid
technology:
- The focus of the Enlightened Computing [52] (Highly-dynamic Applications Driving Adaptive Grid Resources)
project is "...on developing dynamic, adaptive, coordinated and optimized use of
networks connecting geographically distributed high-end computing
resources and scientific instrumentation. A critical feedback-loop
consists of resource monitoring for discovery, performance, and SLA
compliance, and feed back to co-schedulers for coordinated adaptive
resource allocation and coscheduling... For this project we have
assembled a global alliance of partners to develop, test, and
disseminate advanced software and underlying technologies which
provide generic applications with the ability to be aware of their
network, Grid environment and capabilities, and to make dynamic,
adaptive and optimized use of networks connecting various high end
resources. We will develop advanced software and Grid middleware to
provide the vertical integration starting from the application down
to the optical control plane."
- From the OptIPuter [53] website: "The OptIPuter, so named for its use of Optical
networking, Internet Protocol, computer storage, processing and
visualization technologies, is an envisioned infrastructure that will
tightly couple computational resources over parallel optical networks
using the IP communication mechanism. The OptIPuter exploits a new
world in which the central architectural element is optical
networking, not computers — creating "supernetworks". This paradigm
shift requires large-scale applications-driven, system experiments
and a broad multidisciplinary team to understand and develop
innovative solutions for a "LambdaGrid" world. The goal of this new
architecture is to enable scientists who are generating terabytes and
petabytes of data to interactively visualize, analyze, and correlate
their data from multiple storage sites connected to optical
networks."
- From the CANARIE*4 [54] website (and the concept of customer-empowered networks [55]),
"CA*net 4 will, as did its predecessor CA*net 3,
interconnect the provincial research networks [of Canada], and
through them universities, research centers, government research
laboratories, schools, and other eligible sites, both with each other
and with international peer networks. Through a series of
point-to-point optical wavelengths, most of which are provisioned at
OC-192 (10 Gbps) speeds, CA*net 4 will yield a total initial network
capacity of between four and eight times that of CA*net 3...CA*net 4
will embody the concept of a "customer-empowered network" which will
place dynamic allocation of network resources in the hands of end
users and permit a much greater ability for users to innovate in the
development of network-based applications. These applications, based
upon the increasing use of computers and networks as the platform for
research in many fields, are essential for the national and
international collaboration, data access and analysis, distributed
computing, and remote control of instrumentation required by
researchers."
A Future View of "the Grid"
In an article in Scientific American [56], Ian Foster describes just how ubiquitous and transparent grids might be in the future.
"By linking digital processors, storage systems and software on a global scale, grid technology is poised to transform
computing from an individual and corporate activity into a general utility" — a utility similar to water distribution and
electrical power systems in both its value and the invisibility of the system itself to the consumer. Today's
researchers, information technology staff and commercial vendors are transforming grid technology in such a way that
what are presently exclusive high performance computing and data services may one day be widely available via a
pervasive, daily (and perhaps somewhat mundane) utility.
It was barely 100 years ago that the average citizen could only fantasize about fully wired houses (what did
"fully wired" mean a century ago?) with ubiquitous, "always on" electric power. It is perhaps not too fanciful to
imagine how academia, industry or even individuals might have utilitarian access in the future to what are today
expensive, complex high performance computing resources. Such a grid of computing and data services could have
widespread and socially valuable effects on the world. Given the rapidity with which grid technology is maturing and
being deployed, it is possible to imagine scenarios in which entire communities benefit from grid activities in both
ordinary and extraordinary circumstances.
The following scenario, set in 2012 in the southeastern United States, imagines how a ubiquitous
"grid of grids" (or "the Grid") would serve as part of the technical infrastructure supporting community health science
and services. In this scenario, entire user application communities are able to realize the benefits of the Grid
infrastructure. The Grid is envisioned as supporting multiple, general grid functions that include computation,
data management, collaboration services and knowledge discovery. In this scenario, these functions specifically support:
- Pre-hospital data analysis
- Bioinformatics
- Medical records data mining and
- Bio-medical simulations
News Release
September 12, 2012
Houston, Texas
Regional Grid Helps Heal Houston.
The aftermath of last week's category-4 tropical storm Hale has disrupted local services and displaced
several hundred thousand citizens this week. While not reaching the devastation of 2005's
category-5 storm Katrina, the city and surrounding area are severely impacted
by wind, rain and flooding from the storm.
Luckily the Katrina aftermath is not being replayed, in part because core Grid
infrastructure allows vital services to continue seamlessly operating using other
compute and data nodes on the broad grid-based cyberinfrastructure
that now spans the southeastern United States. The regional
Grid cyberinfrastructure has a significant impact on the health care delivery
systems in this city today. Though power outages from Hale have shut down
many local computing facilities, the city's major hospitals
are only minimally affected since they can use the Grid to access computing capabilities from sites across the
southeast. Emergency first responders remain highly effective, receiving significant support from physicians in
other states. Using grid-based telemedicine technologies for remote assessment of critical vital signs, local
emergency medical teams work directly with remote physicians in determining medical triage decisions for the
best medical care. Meanwhile, the scheduling and coordination of our city's patient care, involving the complex
coordination of providers, equipment and facilities to match individual treatment requirements, uses a dynamic
priority-based scheduler over the Grid. Using artificial intelligence, the scheduler helps manage and prioritize
patient access to health care, expedites their treatment, and optimizes allocation of critical health care system
resources. The complex algorithms to determine patient care decisions automatically find and run on the best
available computing resources distributed across the southeast's regional grid,
ensuring that patient wait times are kept to a minimum.
Patient outcomes from Hale-related injuries are being vastly improved, benefiting from early patient evaluations
(pre-hospital data analysis) that medical first responders are able to upload directly to the grid from accident
scenes. These evaluations are providing immediate, expansive physiologic readings on large numbers of trauma
patients and helping ground-based medical first responders arrange air transports for the most critical patients.
At trauma centers, the predictive ability of patient data is much more clinically relevant through the use of
grid-enabled data mining, neural networks, and decision tree analysis during the first 24 hours of admission. These
grid-based systems feed physiologic databases with more useful, patient-specific outcome data than the
mere survival data typically used only a few years ago. Medical personnel are able to select the best treatment option.
Improved clinical outcomes, based on identifying predictive
input markers, are derived by running sophisticated algorithms against
the extensive medical health records data grid. Now a key part of the health
care system, medical records data mining is conducted on a rich set of records redundantly stored
across the secure grid infrastructure — so Houston's records remain available even though the local systems are temporarily off-line.
Using optical, point-to-point networks, these distributed medical records are accessible from highly secure
databases that have been deployed across the regional grid. Moreover, medical records data is the foundation
of an extensive and readily accessible knowledge base. For example, a large collection of radiological data is
available along with relevant patient history, clinical and histological information, for retrieval and comparative
interpretation using computer assisted diagnostic (CADx) systems and other visualization tools. Further, Houston's
medical records (with all person-specific information removed) are included with
other valuable health status demographics that are used by Problem Knowledge Coupler (PKC) systems. Such systems,
valuable as an alternative teaching tool for diagnostic skill development, also
are providing improved diagnostics for patients during the Hale aftermath. The PKC systems use grid-accessible medical data from thousands
of prior medical cases to suggest recommended procedures and to extrapolate best
practices.
Advanced bioinformatics and bio-medical simulation
components of the southeastern Grid are also providing further benefits for
Hale storm victims. In the first week after Hale, a rash began afflicting
many of our city's
residents. While initially confined to the Houston area, the illness soon spread
to the neighboring Gulf Coast. Rumors about the 11th anniversary of the 9/11
attacks and a possible release of toxins by terrorists started to
spread and threatened to complicate the area's storm relief efforts. Fortunately,
a local medical research facility with a bioinformatics program worked with
a team of biologists from other universities in the region
and the Centers for Disease Control and Prevention in Atlanta. The team used
dozens of the Grid's distributed
computational resources to search many genomic and proteomic databases in parallel
to identify the specific
agent causing the rash.
With the identification of a probable
agent, the teams are applying biomedical simulation techniques across many
Grid resources to analyze models of how the disease vectors propagate the
agent involved. The simulations
are using a cognitive reasoning system with an advanced conceptual modeling approach
for nuclear, biological and chemical (NBC) threat assessment, predictive
analysis, and decision-making. These models are showing medical
teams how to stem the agent's spread and, indeed, these same models are enabling
additional health care system
personnel to receive preventative training.
While the storm's impact on Houston and the surrounding area is definitely being felt, the overall experience
has been significantly less difficult and traumatic due to the presence of a sophisticated grid across the southeast.
The grid brings the southeast's extensive computation, data, simulation and collaboration resources together
under a shared infrastructure that is serving emergency responders, medical teams
and distributed health care systems to provide effective, patient-specific care
that is so vital to minimizing long-term consequences to people and the region.
Of course, this is a hypothetical scenario, yet the future reality may well be more surprising
than anything imagined above. Grid infrastructure is maturing and represents a significant sea change in
how computation, simulation, bioinformatics, collaboration and knowledge are supported. The ability to access
resources anywhere, at any time, with the ability to survive interruptions from local conditions, is an important
benefit offered by grids as part of a global cyberinfrastructure. Building that imagined infrastructure will
certainly depend on the contributions being made now in grid implementations and deployments.
Bibliography
[1] Public Key Infrastructure (http://tinyurl.com/39kx4a)
[2] Community of interest (http://en.wikipedia.org/wiki/Community_of_interest)
[3] Geodise project (http://www.geodise.org/)
[4] Engineering and Physical Sciences Research Council (http://www.epsrc.ac.uk/default.htm)
[5] The Geodise Toolboxes, A User's Guide (http://www.geodise.org/documentation/html/index.htm)
[6] The Geodise Project: Making the Grid Usable Through Matlab (http://www.gridtoday.com/grid/343938.html)
[7] Grid Today (http://www.gridtoday.com/gridtoday.html)
[8] SURAgrid (http://www.sura.org/programs/sura_grid.html)
[9] Amazon Web Services (http://tinyurl.com/2sbgmv)
[10] [Amazon's] Solutions catalog (http://solutions.amazonwebservices.com/connect/index.jspa)
[11] [Amazon's] Elastic Compute Cloud (http://www.amazon.com/gp/browse.html?node=201590011)
[12] Infoworld (http://www.infoworld.com/)
[13] Amazon.com's rent-a-grid (http://www.infoworld.com/article/06/08/30/36OPstrategic_1.html)
[14] 3Tera (http://www.3tera.com/index.html)
[15] AppLogic grid system (http://www.infoworld.com/4449)
[16] International Virtual Data Grid Laboratory (http://www.ivdgl.org/)
[17] Compact Muon Solenoid (CMS) (http://cms.cern.ch/)
[18] Large Hadron Collider (LHC) (http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/CERNFuture/WhatLHC/WhatLHC-en.html)
[19] CERN (http://public.web.cern.ch/Public/Welcome.html)
[20] U. S. CMS (http://www.uscms.org/)
[21] Fermi National Accelerator Laboratory (http://www.fnal.gov/)
[22] U. S. CMS Overview (http://www.uscms.org/Public/overview.html)
[23] A Toroidal LHC ApparatuS (ATLAS) (http://atlas.web.cern.ch/Atlas/index.html)
[24] U. S. ATLAS (http://www.usatlas.bnl.gov/)
[25] Brookhaven National Laboratory (BNL) (http://www.bnl.gov/world/)
[26] Sloan Digital Sky Survey (SDSS) (http://www.sdss.org/)
[27] SkyServer (http://cas.sdss.org/dr5/en/)
[28] SDSS Databases (http://cas.sdss.org/dr5/en/sdss/data/data.asp#databases)
[29] SDSS Data Release 5 (http://cas.sdss.org/dr5/en/sdss/release/)
[30] DataTAG (http://datatag.web.cern.ch/datatag/)
[31] TeV in layman's terms (http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/CERNFuture/WhatLHC/WhatLHC-en.html)
[32] EU-DataGrid Project (http://web.datagrid.cnr.it/servlet/page?_pageid=1407&_dad=portal30&_schema=PORTAL30&_mode=3)
[33] Enabling Grids for E-sciencE (EGEE) (http://www.eu-egee.org/)
[34] DataGrid Project Description (http://web.datagrid.cnr.it/servlet/page?_pageid=873,879&_dad=portal30&_schema=PORTAL30&_mode=3)
[35] OGSA-DAI (http://www.ogsadai.org.uk/index.php)
[36] LEAD (http://www.lead.ou.edu/)
[37] caGrid (http://cabig.nci.nih.gov/)
[38] AstroGrid (http://www.astrogrid.org/)
[39] BRIDGES (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
[40] eDiaMoND (http://www.ediamond.ox.ac.uk/)
[41] GeneGrid (http://www.qub.ac.uk/escience/projects/genegrid)
[42] more OGSA-DAI grid projects (http://www.ogsadai.org.uk/about/projects.php)
[43] Computational Chemistry Grid (https://www.gridchem.org)
[44] cancer Biomedical Informatics Grid (https://cabig.nci.nih.gov)
[45] caBIG (http://cabig.cancer.gov)
[46] SETI@home (http://setiathome.berkeley.edu/)
[47] BOINC (http://boinc.berkeley.edu/)
[48] World Community Grid (http://www.worldcommunitygrid.org/)
[49] Condor (http://www.cs.wisc.edu/condor/)
[50] United Devices (http://www.ud.com/)
[51] Grid-MP ™ (http://www.ud.com/products/gridmp.php)
[52] Enlightened Computing (http://enlightenedcomputing.org)
[53] OptIPuter (http://www.optiputer.net)
[54] CANARIE*4 (http://www.canarie.ca/advnet)
[55] CANARIE*4 customer-empowered networks (http://www.canarie.ca/advnet/cen.html)
[56] Foster, Ian, "The Grid: Computing without Bounds", Scientific American, April 2003.
[57] Teragrid (http://www.teragrid.org)