 What Grids Can Do For You

  


Payoffs and tradeoffs

The goal of grids is to enable and to simplify access to distributed resources. Based on the electric power grid as a model, a strong concept behind the development of grid technology is to provide a basic computational infrastructure that users could draw on for computation, visualization and data services. A person plugs in a toaster, radio or other appliance, without worrying about where the power is coming from or how it gets to them. In an ideal world, grid infrastructure would enable computational resources, data services, and even specialized instrumentation or sensors to be "plugged into" the grid, with user interfaces similarly "plugged in" to provide access without users needing to worry about many of the details as to where the devices, services or data reside. The challenge of grids is that the resources involved are distributed across a wide area, are administered and controlled by a variety of individuals and organizations, and adhere to a variety of usage policies and procedures. In addition, the performance characteristics and benefits will vary in that some grids are used to facilitate access to HPC resources (supercomputers), some bring together commodity computing capability, and all are dependent on the performance and reliability of the system-level, local, and also wide area network interconnects that tie them together.

In this chapter, we consider the costs and benefits of coordinating the use of a heterogeneous set of resources that span administrative domains. That is, what makes such an extensive effort of coordination and software development (i.e., middleware) worthwhile? What tradeoffs must an organization consider when deciding whether to deploy or use resources on a grid? We discuss some of these issues here in general terms, with more detail further on in the cookbook.

Access to resources beyond those locally available

If researchers were offered access to compute clusters, visualization engines, and a multitude of databases beyond what was locally available, most would be at least cautiously interested. Commonly anticipated advantages from an end-user perspective include:

  • Improved model resolution resulting from access to greater compute power
  • Increased size or number of calculations or applications that can be executed simultaneously
  • Access to specialized visualization resources, allowing the rendering of complex scientific results in forms more easily interpreted by researchers
  • Access to large amounts of preprocessed and well-organized data across high-speed networks
  • The ability to participate in and contribute to large, geographically dispersed research collaborations

Some difficulty arises, however, from the fact that resources on a grid are often not owned or controlled by a single administrative domain. This can affect the "cost" of computing — in terms of ready access, ease of use or even actual financial cost — beyond what may be initially obvious. Even so, grid computing arguably provides its greatest benefit when aggregating resources across projects or organizations, enabling individuals within participating organizations to share resources and knowledge at unprecedented levels. A variety of regional, national and international-scale grid initiatives provide shared access to specialized and general grid computing capabilities in support of the research and education mission. Later in this section we will provide several examples of existing grid initiatives providing a variety of services.

An alternate perspective on the inter-organizational sharing of resources comes from organizational management, who may ask, "Why should I provide others with access to machines that came at my institution's cost and in response to specific needs and requests from my institution's users?" This question comes up time and again as institutions — or even departments within an institution — contemplate adding significant resources to a grid that extends beyond their local domain. Accumulating resources locally may initially seem to be the most effective approach to meeting local needs; however, the drive for increased capability and diversity within a growing community can rapidly outpace local budget and resources for system acquisition and maintenance. Sharing resources through an inter-organizational grid can be a more cost-effective way to meet wide-ranging and evolving local needs while increasing the capabilities available to the community at large. In addition, sharing resources with other organizations can provide users with access to a multiplicity of compute architectures and other types of resources not locally available, and, just as importantly, to a larger community of potential collaborators and relationships for both technological and scientific advancement.

A notable challenge in the sharing of resources across institutions is determining the identity of users from different organizations so that local as well as grid-wide access and authorization policies can be applied. The successful coordination of authentication and authorization mechanisms with identity management technologies is key. For instance, Globus leverages Public Key Infrastructure (PKI) [1] as a basis for its management of access to grid resources. PKI offers a framework for organizations to share and trust assertions of identity through the exchange of digital certificates supported by public and private digital keys. If one's organization already utilizes PKI for identity management and is joining a grid that is Globus-based, integration at this level is fairly straightforward. If not, processes and technologies for mapping or converting organizational identities into appropriate PKI-based credentials need to be established. While this may not be complex in all situations, an organization must have sufficient IT resources and expertise to evaluate possible solutions, and, ideally, integration and cooperation with those who manage and administer the organization's existing identity management system(s).
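The step of mapping certificate-based identities onto local accounts can be pictured with a small sketch. It assumes a mapping file in the style of a Globus grid-mapfile (a quoted certificate subject followed by a local account name). The subjects, accounts, and function names below are invented for illustration, and the sketch deliberately leaves out certificate validation, which a real deployment would perform before any lookup.

```python
import shlex

# Hypothetical entries in the style of a Globus grid-mapfile: a quoted
# certificate subject (distinguished name) followed by a local account.
# All names here are invented for illustration.
GRID_MAP_TEXT = '''
# subject DN                                      local account
"/C=US/O=Example University/CN=Alice Researcher" alice
"/C=US/O=Partner Institute/CN=Bob Scientist"     guest01
'''


def parse_grid_map(text):
    """Return a dict mapping certificate subject DNs to local account names."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honors the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"malformed entry: {line!r}")
        subject_dn, local_account = parts
        mapping[subject_dn] = local_account
    return mapping


def local_account_for(subject_dn, mapping):
    """Look up the local account for an already-authenticated subject, or None."""
    return mapping.get(subject_dn)


grid_map = parse_grid_map(GRID_MAP_TEXT)
print(local_account_for("/C=US/O=Partner Institute/CN=Bob Scientist", grid_map))
```

An organization that does not already use PKI would need an additional step, not shown here, that issues or derives such a certificate subject from its existing identity system.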

Performance and speedup

Computational resources, and specifically high performance systems or clusters, are often the first type of resource one thinks of at the mention of a grid. High performance, high-end, "super" computing has been around for a long time. It can be difficult for an organization to engage its diverse audience in an effort to construct HPC infrastructure at a campus. It is easier to engage in these discussions in the context of establishing a grid, especially since the grid offers the potential of making compute resources available to a larger community as well as augmenting the resources available at its member institutions.

The tradeoff here is that the grid doesn't always provide a complete solution. Cross-platform schedulers, accounting, message passing paradigms, and so forth are required. Ongoing work in both standards and product development is attempting to bridge these gaps, and much of the detail can now be hidden from the user through the use of web services and interfaces. Joining a grid and accessing it through web services will be covered in significant detail later.

Collaboration

As noted earlier, groups within an individual institution may be too small to justify or fund the type of resources they need and, in fact, they may only need those resources from time to time. As sponsoring agencies began to fund broader collaborations, the idea of "communities" evolved. Communities generally come in a number of categories such as "interest", "practice", "purpose" and so forth. (See Wikipedia "Community of interest" [2] for more explanation.) In our case, the people in these communities share interest, practice, purpose [and so forth] in a particular field of science or engineering. Grids help these communities build and share resources as well. The payoffs are in sharing knowledge, building expertise together (in both their shared area as well as in grid use), and enabling the community to build better cases together for more resources. The tradeoff is the increased complexity and management that grid use brings in order to use those resources. In this cookbook we will attempt to bridge the gaps and smooth out some of the complexity in the most simple terms possible.

Alignment with National Vision for 21st Century Discovery

In the National Science Foundation's recent report, "Cyberinfrastructure Vision for 21st Century Discovery", the term cyberinfrastructure is defined as, "... computing systems, data, information resources, networking, digitally enabled-sensors, instruments, virtual organizations, and observatories." From Arden Bement's introduction to this report:

"At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer collaboration and new modes of education based upon broad and open access to leadership computing; data and information resources; online instruments and observatories; and visualization and collaboration services. Cyberinfrastructure enables distributed knowledge communities that collaborate and communicate across disciplines, distances and cultures. These research and education communities extend beyond traditional brick-and-mortar facilities, becoming virtual organizations that transcend geographic and institutional boundaries."

Clearly grid computing will have a central role in the development of the cyberinfrastructure capabilities envisioned by the NSF. Understanding the basics of grid computing and working with collaborative teams of scientists and computing professionals to use and help develop grid computing tools and techniques will be an increasingly important component of a successful agency funding strategy.


Examples of Evolving Grid-based Services and Environments


Aggregating computational resources

A grid layer can make otherwise separate, distributed and different computational hardware appear as a single, common resource to which the user can submit jobs in a standard way. For instance, users may submit a genome alignment application via a grid portal and the job will run on any of several clusters, whether those clusters are at one university or another, or whether the operating systems are different versions.
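As a rough illustration of the idea, the sketch below shows one way a grid layer might choose a cluster for a submitted job. It is a toy model under stated assumptions, not how any particular middleware works: the `Cluster` and `Job` types, the cluster names, and the "most free cores" rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    site: str
    free_cores: int
    operating_system: str


@dataclass
class Job:
    application: str
    cores_needed: int


def choose_cluster(job, clusters):
    """Pick the cluster with the most free cores that can fit the job.

    The user never names a cluster; site and operating system differences
    are hidden behind this single selection step.
    """
    candidates = [c for c in clusters if c.free_cores >= job.cores_needed]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.free_cores)


# Hypothetical clusters at two universities running different OS versions.
clusters = [
    Cluster("cluster-a", "University One", free_cores=16, operating_system="Linux 2.4"),
    Cluster("cluster-b", "University Two", free_cores=64, operating_system="Linux 2.6"),
]
job = Job(application="genome-alignment", cores_needed=32)
target = choose_cluster(job, clusters)
print(target.name if target else "no cluster available")
```

Real schedulers weigh many more factors (queue length, data locality, usage policy), but the user-facing contract is the same: one submission interface, many possible execution sites.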

Several examples of projects that are developing frameworks and toolkits for aggregating resources include:

  • TeraGrid — From the TeraGrid website [57]: "TeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource. Using high-performance network connections, the TeraGrid integrates high-performance computers, data resources and tools, and high-end experimental facilities around the country. Currently, TeraGrid resources include more than 250 teraflops of computing capability and more than 30 petabytes of online and archival data storage, with rapid access and retrieval over high-performance networks. Researchers can also access more than 100 discipline-specific databases. With this combination of resources, the TeraGrid is the world's largest, most comprehensive distributed cyberinfrastructure for open scientific research."

    TeraGrid is coordinated through the Grid Infrastructure Group (GIG) at the University of Chicago, working in partnership with the Resource Provider sites: Indiana University, Oak Ridge National Laboratory, National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, Purdue University, San Diego Supercomputer Center, Texas Advanced Computing Center, University of Chicago/Argonne National Laboratory, and the National Center for Atmospheric Research.

  • SURAgrid — From the SURAgrid website [8], "SURAgrid is a consortium of organizations collaborating and combining resources to help bring grid technology to the level of seamless, shared infrastructure. The vision for SURAgrid is to orchestrate access to a rich set of distributed capabilities in order to meet diverse users' needs. Capabilities to be cultivated include locally contributed resources, project-specific tools and environments, highly specialized or HPC access, and gateways to national and international cyberinfrastructure. SURAgrid resources currently include over 10 teraflops of pooled computing resources, accessed through a common SURAgrid portal using a common authentication and authorization mechanism, the SURAgrid Bridge Certificate Authority."

  • Geodise — The Geodise project [3], aimed initially at Computational Fluid Dynamics (CFD) applications, has the mission "To bring together and further the technologies of Design Optimisation, CFD, GRID computation, Knowledge Management & Ontology in a demonstration of solutions to a challenging industrial problem". Funded by the Engineering and Physical Sciences Research Council (EPSRC) [4] in the United Kingdom (UK), Geodise involves multidisciplinary teams working on a state of the art design tool demonstrator. Intelligent design tools will steer the user through set up, execution, post-processing, and optimization activities. These tools are physically distributed, under the control of multiple elements, to improve design processes that can require assimilation of terabytes of distributed data.

  • Elastic Compute Cloud — Brush up that Amazon account! They aren't just about books and CDs anymore.

    Amazon Web Services [9] now provides application and service developers with direct access to Amazon's technology platform. From their website, "Build on Amazon's suite of web services to enable and enhance your applications. We innovate for you, so that you can innovate for your customers." Their Solutions catalog [10] shows services such as E-Commerce, Simple Storage, and so forth. Their Elastic Compute Cloud [11] (Amazon EC2) service is "a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers." Offering what other service providers call utility computing, Amazon EC2 presents a virtual computing environment that allows you to use web service interfaces to requisition machines for use, load them with your custom application environment, manage your network's access permissions, and run your image using as many or few systems as you desire. Pricing is per instance-hour consumed, per GB of data transferred to/from Amazon, and per GB-month of Amazon S3 (Simple Storage Service) storage used.

    InfoWorld's [12] article Amazon.com's rent-a-grid [13] provides an interesting and compact summary of the service. To quote them, "As the service's name suggests, though, if you need an elastic capability that can nimbly grow or shrink, EC2 is the only game in town." The author quickly points out that 3Tera [14] is coming out with their AppLogic grid system [15] soon though.
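The three-part pricing model described above for Amazon EC2 (instance-hours, data transferred, and GB-months of storage) lends itself to a simple back-of-the-envelope estimate. The sketch below only shows the arithmetic. The rates are placeholder values chosen for illustration, not Amazon's actual prices, and the function name is hypothetical.

```python
def estimate_monthly_cost(instance_hours, gb_transferred, gb_month_stored,
                          rate_per_instance_hour, rate_per_gb_transferred,
                          rate_per_gb_month):
    """Sum the three usage-based charges described in the text."""
    compute = instance_hours * rate_per_instance_hour
    transfer = gb_transferred * rate_per_gb_transferred
    storage = gb_month_stored * rate_per_gb_month
    return compute + transfer + storage


# Placeholder rates in dollars (assumptions, not published prices).
RATE_PER_INSTANCE_HOUR = 0.10
RATE_PER_GB_TRANSFERRED = 0.20
RATE_PER_GB_MONTH = 0.15

# Ten instances running around the clock for a 30-day month.
instance_hours = 10 * 24 * 30

total = estimate_monthly_cost(
    instance_hours=instance_hours,
    gb_transferred=500,
    gb_month_stored=200,
    rate_per_instance_hour=RATE_PER_INSTANCE_HOUR,
    rate_per_gb_transferred=RATE_PER_GB_TRANSFERRED,
    rate_per_gb_month=RATE_PER_GB_MONTH,
)
print(f"Estimated monthly cost: ${total:.2f}")
```

Because every term scales with usage, the same calculation with fewer instance-hours shows how cost shrinks when capacity is released, which is the "elastic" property the InfoWorld quote refers to.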


Improved access for data-intensive applications

In an ideal world, a grid user may start up a data-intensive application and the grid will assemble the data streams, combining data from multiple, distributed sources, so that the user experiences fast responses and sees the data as a logical whole. Several service components are needed to realize that vision, including data discovery, storage, possibly replication and version control, and reliable data transfer. While still developing towards the ideal, current data grids can manage access to data that may have been collected and stored at different locations, and provide controlled, secure access for communities as well as individuals. A grid workflow can be developed to manage data integration transparently for the user, or handle data access such that an application can process the data with improved throughput.

Applications in fields such as high energy physics (HEP), life sciences, and climate and weather modeling not only use but also generate massive amounts of data. These compute intensive applications can realize great benefit from access to an expanded pool of computational and data storage and management resources brought together using grid technology. In this section we will concentrate on the data side of that puzzle.

  • The International Virtual Data Grid Laboratory (iVDgL) [16] was a global data grid that served forefront experiments in physics and astrophysics. Its resources were comprised of heterogeneous computing and storage. Networking resources spanned the U.S., Europe, Asia and South America, thus providing a unique laboratory that tested and validated Grid technologies at international and global scales. The iVDgL was operated as a single system for the purposes of interdisciplinary experimentation in Grid-enabled, data-intensive scientific computing. Its goal was to drive the development, and transition to every day production use, of Petabyte-scale virtual data applications.

    Applications that made use of the iVDgL include:

    • Compact Muon Solenoid (CMS) [17] — an experiment at the Large Hadron Collider (LHC) [18] at CERN [19] in Geneva, Switzerland. U.S. CMS [20] is a collaboration of U.S. scientists participating in CMS. This collaboration includes scientists at universities and Fermi National Accelerator Laboratory (FNAL) [21]. As their website states, "The CMS experiment is designed to study the collisions of protons at a center of mass energy of 14 TeV. The physics program includes the study of electroweak symmetry breaking, investigating the properties of the top quark, a search for new heavy gauge bosons, probing quark and lepton substructure, looking for supersymmetry and exploring other new phenomena." [U.S. CMS Overview [22]]
    • A Toroidal LHC ApparatuS (ATLAS) [23] — another experiment at the LHC, ATLAS is also designed to detect particles created by the proton-proton collisions: "the main goal for ATLAS is to look for a particle dubbed Higgs, which may be the source of mass for all matter. Findings may also offer insight into new physics theories as well as a better understanding of the origin of the universe." [U.S. ATLAS [24]] U.S. ATLAS includes scientists at universities and Brookhaven National Laboratory (BNL) [25].
    • The Sloan Digital Sky Survey (SDSS) [26] — when completed, SDSS will provide detailed optical images covering more than a quarter of the sky, and a 3-dimensional map of about a million galaxies and quasars. The SDSS is managed by the Astrophysical Research Consortium for its participating institutions, including universities, museums, and laboratories. The SDSS data server, SkyServer [27], holds two primary databases: BESTDR1 and TARGDR1. An identical schema is used for both, but BESTDR1 has been processed with the "best available software" for handling noise and is therefore somewhat bigger. Combined, the databases occupy over 800 GB of storage and hold over 3.4 billion rows (records) [28]. SDSS is now up to Data Release 5 [29].

    iVDgL sites in Europe and the U.S. were linked by a multi-gigabit per second transatlantic link funded by the European DataTAG project [30].


    Figure WGD-3. iVDgL Project map.

    (Interesting fact discovered while drafting this summary: "A TeV is a unit of energy used in particle physics. 1 TeV is about the energy of motion of a flying mosquito. What makes the LHC so extraordinary is that it squeezes energy into a space about a million million times smaller than a mosquito." [31])


  • The EU-DataGrid Project [32], funded by the European Union, had as its purpose "to build the next generation computing infrastructure providing intensive computation and analysis of shared large-scale databases, from hundreds of TeraBytes to PetaBytes, across widely distributed scientific communities." A collaboration of about twenty European research institutes, DataGrid fulfilled its objectives in March of 2004 and moved on to become EGEE (Enabling Grids for E-sciencE) [33].

    The DataGrid project focused on three application areas:

    • High Energy Physics — As did iVDgL, DataGrid set the stage for handling the huge amounts of data produced by the LHC. A multi-tiered, hierarchical computing model has been adopted to share data and computing efforts among multiple institutions. The Tier-0 center is located at CERN and is linked by high speed networks to approximately ten major Tier-1 data processing centers. These fan out the data to a large number of smaller centers known as Tier-2s.
    • Biology and Medical Image Processing — The DataGrid project's biology testbed provided the platform for new algorithms on data mining, databases, code management, and graphical interface tools, and facilitated sharing of genomic and medical imaging databases for the benefit of international cooperation and health care.
    • Earth Observations — The European Space Agency missions involve the download, from space to ground, of about 100 Gigabytes of raw images per day. Dedicated ground infrastructures have been set up to handle the data produced by instruments onboard the satellites.
      DataGrid demonstrated an improved way to access and process large volumes of data stored in distributed European-wide archives.
    See the DataGrid Project Description [34] for more information.

  • Looking at it from another perspective, projects like OGSA-DAI [35] develop middleware to assist with access and integration of data from separate sources via the grid. Directly from their website, "OGSA-DAI is motivated by the need to:
    • Allow different types of data resources — including relational, XML and files — to be exposed onto Grids.
    • Provide a way of querying, updating, transforming and delivering data via web services.
    • Provide access to data in a consistent, data resource-independent way.
    • Allow metadata about data, and the data resources in which this data is stored, to be accessed.
    • Support the integration of data from various data resources.
    • Provide web services that can be combined to provide higher-level web services that support data federation and distributed query processing.
    • To contribute to a future in which scientists move away from technical issues such as handling data location, data structure, data transfer and integration and instead focus on application-specific data analysis and processing."

    Many grid projects are using OGSA-DAI, including:

    • LEAD [36] — Linked Environments for Atmospheric Discovery
    • caGrid [37] — the Cancer Biomedical Informatics Grid
    • AstroGrid [38] — a project to build an infrastructure for the Virtual Observatory (VObs)
    • BRIDGES [39] — Biomedical Research Informatics Delivered by Grid Enabled Services
    • eDiaMoND [40] — a Grid for X-Ray Mammography
    • GeneGrid [41] — exploiting existing micro array and sequencing technologies and the large volumes of data generated through screening services to develop specialist tissue-specific datasets relevant to the particular type of disease being studied
    • and more [42].


Federation of shared resources toward global services

A particularly important aspect of the grid is its support for "virtual organizations," or VOs. When the high-energy physics community began collaborating on large-scale physics problems, researchers from many different and widely separated organizations needed to work together. The problem domain was so vast that researchers at any one site needed the expertise of researchers at other sites in order to make progress. A project might represent dozens, hundreds or thousands of collaborating scientists. The concept of the "virtual organization" recognized that such project groups would convene from various organizations and need to work together as if they were, in fact, from a single organization. VOs may also be very dynamic and ad hoc, coming together for very specific purposes, working together for fixed time periods, and adding and losing members over time.

Grid middleware can support sharing of resources using a federated approach, where participating organizations retain control over their local resources and services but also share these resources in a way that becomes globally scalable. For example, an institution would authenticate users locally for access to institutionally-controlled resources but leverage grid security infrastructure to enable those same users to access external grid resources. Additionally, users that are identified as members of a particular project, or VO, could be authorized to use resources in a way that has been pre-approved for members of that group.
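The federated pattern just described (authenticate at home, authorize by VO membership) can be sketched in a few lines. This is a simplified model under stated assumptions: the users, VO names, resource names, and function names are all hypothetical, and the local login check is a stand-in for whatever mechanism an institution actually uses.

```python
# Hypothetical data for illustration; none of these names come from a real grid.
LOCAL_USERS = {"alice": "Example University", "bob": "Partner Institute"}
VO_MEMBERS = {"chem-vo": {"alice"}, "physics-vo": {"alice", "bob"}}
# Each resource owner decides which VOs are pre-approved to use the resource.
RESOURCE_POLICY = {"remote-cluster": {"physics-vo"}, "chem-archive": {"chem-vo"}}


def authenticate_locally(username):
    """Stand-in for the home institution's own login check."""
    return username in LOCAL_USERS


def authorized(username, resource):
    """Allow access only if the user passes local authentication and
    belongs to at least one VO that the resource owner has pre-approved."""
    if not authenticate_locally(username):
        return False
    approved_vos = RESOURCE_POLICY.get(resource, set())
    return any(username in VO_MEMBERS.get(vo, set()) for vo in approved_vos)


print(authorized("bob", "remote-cluster"))   # bob is in physics-vo
print(authorized("bob", "chem-archive"))     # bob is not in chem-vo
```

Note that the resource policy refers only to VOs, never to individuals, so a VO can add or lose members without each resource owner having to update its own rules.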

  • Funded by the National Science Foundation, the Computational Chemistry Grid (CCG) [43] has developed a Java client to facilitate access to a controlled set of applications, HPC and storage resources for use by the computational chemistry community. Project partners include the Center for Computational Sciences at the University of Kentucky, the Center for Computation & Technology at Louisiana State University, the National Center for Supercomputing Applications (NCSA), Texas Advanced Computing Center (UT Austin) and the Ohio Supercomputer Center. From their Web site: "The 'Computational Chemistry Grid' (CCG) is a virtual organization that provides access to high performance computing resources for computational chemistry with distributed support and services, intuitive interfaces and measurable quality of service." Access is granted through an approval process, with allocations "available to US academic and government research staff and to non-US academic researchers." Three types of project allocations are available: research, community research and instructional. Research allocations are intended to support large, often multi-year scientific research projects. Community allocations are shorter term and intended to be used towards development of a larger research effort. Instructional allocations can be used to support academic instruction in the field.

  • The cancer Biomedical Informatics Grid (caBIG) [44] is a virtual organization of "over 800 people from approximately 50 NCI-designated Cancer Centers and other organizations" in a "voluntary network or grid...to enable the sharing of data and tools, creating a World Wide Web of cancer research." Development of the project is taking place under the leadership of the National Cancer Institute's Center for Bioinformatics and has the primary goal of "[speeding] the delivery of innovative approaches for the prevention and treatment of cancer". However, the concepts and technologies involved are also being developed with an eye towards reuse and adaptability outside of the cancer research community. Releases of software and components are publicly available on the project's community web site. A separate informational web site is available for those who are not intending to use services or tools but who are interested in knowing more about the initiative: http://cabig.cancer.gov [45].

  • The Open Science Grid (OSG) [46] is an outgrowth of three notable physics projects — the DOE-funded Particle Physics Data Grid (www.ppdg.net) and the NSF-funded Grid Physics Network (GriPhyN, www.griphyn.org) and International Virtual Data Grid Laboratory (iVDGL, www.ivdgl.org). Collaborators leading and within these projects became interested in the benefits of grid technology for disciplines beyond physics and began to develop their grid middleware and related services with an eye towards broader use. Today, the concept of a "virtual organization" is central to the conceptual as well as operational functioning of OSG, and there are well over two dozen VOs participating, representing a variety of scientific fields. Organizations that contribute resources to OSG retain control of those resources but enable use by project groups through access management tools that have been designed around the VO concept. From their Web site: "A Virtual Organization (VO) is a collection of people (VO members), computing/storage resources (sites) and services (e.g., databases). In OSG, we typically use the term VO to refer to the collection of people, and the terms Site, Computing Element (CE), and/or Storage Element (SE) to refer to the resources owned and operated by a VO." As an organization itself, OSG is also focused on establishing interoperability with other grids, such as TeraGrid and international, regional and campus grids.


Harnessing unused cycles

Grids can enable an organization to capture the incredible amount of computing that exists in idle PCs and workstations. Users can use grid services to submit applications as if to a single resource — the grid manages submission to various computers, monitoring of status, and collection of the results.

Various tools, both open source and proprietary, exist to help an organization with this sort of grid-enabled service.

  • Probably the most famous cycle-sharing application is SETI@home [46]. SETI@home was proposed in 1995 and launched in 1999. As their website states, "SETI (Search for Extraterrestrial Intelligence) is a scientific area whose goal is to detect intelligent life outside Earth. One approach, known as radio SETI, uses radio telescopes to listen for narrow-bandwidth radio signals from space. Such signals are not known to occur naturally, so a detection would provide evidence of extraterrestrial technology." SETI@home has developed a large community around the project, and its website includes various statistics about participants.

    Today SETI@home uses software called BOINC [47]. BOINC has the expanded mission to use the idle time on your computer (Windows, Mac, or Linux) to cure diseases, study global warming, discover pulsars, and do many other types of scientific research. You can use the BOINC software to create your own project. Worldwide projects, such as the World Community Grid [48], use BOINC. As their mission states "World Community Grid's mission is to create the world's largest public computing grid to tackle projects that benefit humanity. Our work has developed the technical infrastructure that serves as the grid's foundation for scientific research. Our success depends upon individuals collectively contributing their unused computer time to change the world for the better. World Community Grid is making technology available only to public and not-for-profit organizations to use in humanitarian research that might otherwise not be completed due to the high cost of the computer infrastructure required in the absence of a public grid. As part of our commitment to advancing human welfare, all results will be in the public domain and made public to the global research community."

  • Another well-known project is University of Wisconsin-Madison's Condor [49]. Condor is often used to manage clusters of dedicated processors, but it also has unique mechanisms that enable effective harnessing of wasted CPU power from otherwise idle desktop workstations.

    BOINC and Condor take very different approaches to the access and management of unused cycles. BOINC functions by enabling thousands or even millions of users to trust a small set of programs to run on their computers, typically leveraging the aggregate compute capacity towards the resolution of an overarching problem or inquiry. Condor harnesses unused cycles to run unspecified applications. This requires a deeper level of trust and so is likely to involve a smaller set of trusted computers. The benefit is the potential to run a much greater variety of applications, which significantly increases the utility of Condor as a high-throughput computing system.

    Condor can

    • be configured to identify idle machines under various criteria
    • checkpoint and migrate jobs when those machines are no longer available
    • work in shared or non-shared file environments (that is, it can migrate files or retrieve them from the source as needed)

    Condor also provides the job queueing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. So Condor provides seamless access to a combination of distributed computers.

  • United Devices [50] offers a number of commercial HPC products. Relevant to this discussion is Grid MP™ [51], an infrastructure solution for implementing and managing complex enterprise grids. Grid MP deployments can range from single-cluster management implementations to large-scale multi-resource grids. According to United Devices, Grid MP has scaled to hundreds of thousands of CPUs and hundreds of thousands of jobs, and can scale to thousands of users.

    Grid MP was built from the ground up to have a comprehensive security architecture that includes transparent data encryption, secure authentication, digital signatures and tamper detection. A framework for rapid application integration is also included, based on open web services and standards. The interface provides controlled access to all aspects of the grid system. The system is designed for self-management via a web-based console, allowing administrative access from anywhere. Grid MP devices and users can be grouped with maximum flexibility. An administrator can set up priority allocation and provisioning policies.
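Returning to the cycle-harvesting behavior described for Condor above (identify idle machines, checkpoint a job, and migrate it when its host is reclaimed), the idea can be illustrated with a small Python sketch. This is a toy model only, not Condor's configuration syntax or API: the `Machine`, `Job`, `is_idle` and `schedule` names, the 15-minute idle threshold, and the load limit are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Machine:
    name: str
    keyboard_idle_s: float  # seconds since the owner last touched the machine
    load_avg: float         # recent CPU load caused by the owner's own work


@dataclass
class Job:
    name: str
    total_steps: int
    done_steps: int = 0     # checkpointed progress; survives a migration

    @property
    def finished(self) -> bool:
        return self.done_steps >= self.total_steps


def is_idle(m: Machine, min_idle_s: float = 900, max_load: float = 0.3) -> bool:
    """Hypothetical idleness policy: no recent input and a low load."""
    return m.keyboard_idle_s >= min_idle_s and m.load_avg <= max_load


def schedule(job: Job, snapshots: List[List[Machine]],
             steps_per_slice: int = 10) -> List[Optional[str]]:
    """Run the job one time slice at a time on whichever machine is idle.

    Each entry in `snapshots` is the state of all machines during one slice.
    Progress is stored on the job itself (the "checkpoint"), so when the
    previous host is reclaimed the job resumes elsewhere without starting over.
    Returns the host name used in each slice, or None if the job had to wait.
    """
    placement: List[Optional[str]] = []
    for machines in snapshots:
        if job.finished:
            break
        idle = [m for m in machines if is_idle(m)]
        if not idle:
            placement.append(None)  # nothing available: the job waits
            continue
        job.done_steps = min(job.total_steps, job.done_steps + steps_per_slice)
        placement.append(idle[0].name)
    return placement


if __name__ == "__main__":
    job = Job("simulation", total_steps=30)
    snapshots = [
        [Machine("A", 1200, 0.1), Machine("B", 5, 0.9)],    # A is idle
        [Machine("A", 2, 0.8), Machine("B", 1800, 0.05)],   # A's owner returns
        [Machine("A", 2, 0.8), Machine("B", 10, 0.7)],      # both are busy
        [Machine("A", 3600, 0.0), Machine("B", 10, 0.7)],   # A is idle again
    ]
    print(schedule(job, snapshots))  # ['A', 'B', None, 'A']
```

In this model the job moves from machine A to machine B when A's owner returns, waits while both machines are busy, and then finishes on A, without ever losing the work it had already completed. A real system must also handle file transfer, security, and fair sharing among many jobs, which this sketch leaves out.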


High-speed optical networking, network-aware applications

As noted in the "Networks, switches and interconnects for grids" section of this Cookbook, "...networks are the virtual bus for the virtual grid computer and are central to the efficient, effective operation of grids." As grids evolve, they are beginning to use high-bandwidth optical networks to interconnect grid nodes, increasing the speed and efficiency possible between input/output, CPUs, storage and other elements of the computational process. We are also seeing the advent of "smart" applications: those that are able to actively (or even proactively!) evaluate network conditions and react with dynamic adjustments to ensure successful operation. Both of these trends can improve performance and throughput as perceived by the users of grid applications today; however, they also hold great promise for the future. Some people feel that, to truly realize the potential of grid technology, applications, middleware and network services must interact much more frequently, intelligently and seamlessly than they do today, to produce an adaptive capability much more akin to using a single computer than to distributing a problem across multiple systems. Several concepts mentioned in the "Networks, switches and interconnects for grids" section (virtual and dynamic circuits, advanced monitoring, end-to-end performance, QoS) form a foundation for further development in this area. In addition to the several project examples provided in the "Networks..." section, the following projects are exploring innovations relevant to the advancement of grid technology:

  • The focus of the Enlightened Computing [52] (Highly-dynamic Applications Driving Adaptive Grid Resources) project is "...on developing dynamic, adaptive, coordinated and optimized use of networks connecting geographically distributed high-end computing resources and scientific instrumentation. A critical feedback-loop consists of resource monitoring for discovery, performance, and SLA compliance, and feed back to co-schedulers for coordinated adaptive resource allocation and coscheduling... For this project we have assembled a global alliance of partners to develop, test, and disseminate advanced software and underlying technologies which provide generic applications with the ability to be aware of their network, Grid environment and capabilities, and to make dynamic, adaptive and optimized use of networks connecting various high end resources. We will develop advanced software and Grid middleware to provide the vertical integration starting from the application down to the optical control plane."

  • From the Optiputer [53] website: "The OptIPuter, so named for its use of Optical networking, Internet Protocol, computer storage, processing and visualization technologies, is an envisioned infrastructure that will tightly couple computational resources over parallel optical networks using the IP communication mechanism. The OptIPuter exploits a new world in which the central architectural element is optical networking, not computers — creating "supernetworks". This paradigm shift requires large-scale applications-driven, system experiments and a broad multidisciplinary team to understand and develop innovative solutions for a "LambdaGrid" world. The goal of this new architecture is to enable scientists who are generating terabytes and petabytes of data to interactively visualize, analyze, and correlate their data from multiple storage sites connected to optical networks."

  • From the CANARIE*4 [54] website (and the concept of customer-empowered networks [55]), "CA*net 4 will, as did its predecessor CA*net 3, interconnect the provincial research networks [of Canada], and through them universities, research centers, government research laboratories, schools, and other eligible sites, both with each other and with international peer networks. Through a series of point-to-point optical wavelengths, most of which are provisioned at OC-192 (10 Gbps) speeds, CA*net 4 will yield a total initial network capacity of between four and eight times that of CA*net 3...CA*net 4 will embody the concept of a "customer-empowered network" which will place dynamic allocation of network resources in the hands of end users and permit a much greater ability for users to innovate in the development of network-based applications. These applications, based upon the increasing use of computers and networks as the platform for research in many fields, are essential for the national and international collaboration, data access and analysis, distributed computing, and remote control of instrumentation required by researchers."
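To make the idea of a network-aware application from earlier in this section more concrete, the following Python sketch shows two small pieces of reasoning such an application might perform: estimating the best-case time to move a data set over a link of known speed, and adjusting the number of parallel transfer streams in response to measured throughput. Both function names, the 20% tolerance band, and the stream limit are assumptions made for illustration; they are not taken from any of the projects described above.

```python
def transfer_time_s(size_bytes: float, link_gbps: float,
                    efficiency: float = 1.0) -> float:
    """Best-case seconds to move size_bytes over a link of link_gbps.

    Uses decimal units (1 Gbps = 1e9 bits per second). `efficiency` is an
    assumed fraction of the link the application actually achieves.
    """
    return size_bytes * 8 / (link_gbps * 1e9 * efficiency)


def adjust_streams(current: int, measured_gbps: float, target_gbps: float,
                   max_streams: int = 16) -> int:
    """Hypothetical adaptation policy for parallel transfer streams.

    Add a stream when throughput is more than 20% below the target,
    release one when it is more than 20% above, otherwise leave it alone.
    """
    if measured_gbps < 0.8 * target_gbps:
        return min(max_streams, current + 1)
    if measured_gbps > 1.2 * target_gbps:
        return max(1, current - 1)
    return current


if __name__ == "__main__":
    # One terabyte over a single 10 Gbps wavelength: 800 seconds at best.
    print(transfer_time_s(1e12, 10))
    # One petabyte over the same link: 800,000 seconds, or more than 9 days.
    print(transfer_time_s(1e15, 10) / 86400)
    # Throughput has dropped to 3 Gbps against a 5 Gbps target: add a stream.
    print(adjust_streams(4, measured_gbps=3.0, target_gbps=5.0))  # 5
```

The arithmetic shows why the projects above care about both raw optical capacity and adaptive behavior: even on a dedicated 10 Gbps wavelength, petabyte-scale data sets take days to move, so an application that notices and reacts to degraded throughput can save a great deal of time.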


A Future View of "the Grid"

In an article in Scientific American [56], Ian Foster describes just how ubiquitous and transparent grids might be in the future. "By linking digital processors, storage systems and software on a global scale, grid technology is poised to transform computing from an individual and corporate activity into a general utility", a utility similar to water distribution and electrical power systems in both its value and the invisibility of the system itself to the consumer. Today's researchers, information technology staff and commercial vendors are transforming grid technology in such a way that what are presently exclusive high performance computing and data services may one day be widely available via a pervasive, daily (and perhaps somewhat mundane) utility.

It was barely 100 years ago that the average citizen could only fantasize about fully wired houses (what did "fully wired" mean a century ago?) with ubiquitous, "always on" electric power. It is perhaps not too fanciful to imagine how academia, industry or even individuals might in the future have utility-style access to what are today expensive, complex high performance computing resources. Such a grid of computing and data services could have widespread and socially valuable effects on the world. Given the rapidity with which grid technology is maturing and being deployed, it is possible to imagine scenarios in which entire communities benefit from grid activities in both ordinary and extraordinary circumstances.

The following scenario, set in 2012 in the southeastern United States, imagines how a ubiquitous "grid of grids" (or "the Grid") would serve as part of the technical infrastructure supporting community health science and services. In this scenario, entire user application communities are able to realize the benefits of the Grid infrastructure. The Grid is envisioned as supporting multiple, general grid functions that include computation, data management, collaboration services and knowledge discovery. In this scenario, these functions specifically support:

  • Pre-hospital data analysis
  • Bioinformatics
  • Medical records data mining
  • Bio-medical simulations

News Release
September 12, 2012
Houston, Texas
Regional Grid Helps Heal Houston

The aftermath of last week's category 4 tropical storm Hale has disrupted local services and displaced several hundred thousand citizens this week. While Hale did not reach the devastation of 2005's category 5 storm Katrina, the city and surrounding area are severely impacted by wind, rain and flooding from the storm. Luckily the Katrina aftermath is not being replayed, in part because core Grid infrastructure allows vital services to continue operating seamlessly using other compute and data nodes on the broad grid-based cyberinfrastructure that now spans the southeastern United States.

The regional Grid cyberinfrastructure has a significant impact on the health care delivery systems in this city today. Though power outages from Hale have shut down many local computing facilities, the city's major hospitals are only minimally affected since they can use the Grid to access computing capabilities from sites across the southeast. Emergency first responders remain highly effective, receiving significant support from physicians in other states. Using grid-based telemedicine technologies for remote assessment of critical vital signs, local emergency medical teams work directly with remote physicians in making triage decisions for the best medical care. Meanwhile, the scheduling of our city's patient care, which involves the complex coordination of providers, equipment and facilities to match individual treatment requirements, uses a dynamic priority-based scheduler over the Grid. Using artificial intelligence, the scheduler helps manage and prioritize patient access to health care, expedites treatment, and optimizes allocation of critical health care system resources. The complex algorithms that support patient care decisions automatically find and run on the best available computing resources distributed across the southeast's regional grid, ensuring that patient wait times are kept to a minimum.

Patient outcomes from Hale-related injuries are being vastly improved, benefiting from early patient evaluations (pre-hospital data analysis) that medical first responders are able to upload directly to the grid from accident scenes. These evaluations are providing immediate, expansive physiologic readings on large numbers of trauma patients and helping ground-based medical first responders arrange air transports for the most critical patients. At trauma centers, the predictive ability of patient data is much more clinically relevant through the use of grid-enabled data mining, neural networks, and decision tree analysis during the first 24 hours of admission. These grid-based systems feed physiologic databases with more useful, patient-specific outcome data than the mere survival data typically used only a few years ago. Medical personnel are able to select the best treatment option.

Improved clinical outcomes, based on identifying predictive input markers, are derived by running sophisticated algorithms against the extensive medical health records data grid. Now a key part of the health care system, medical records data mining is conducted on a rich set of records redundantly stored across the secure grid infrastructure, so Houston's records remain available even though the local systems are temporarily off-line. Using optical, point-to-point networks, these distributed medical records are accessible from highly secure databases that have been deployed across the regional grid. Moreover, medical records data is the foundation of an extensive and readily accessible knowledge base. For example, a large collection of radiological data is available along with relevant patient history, clinical and histological information, for retrieval and comparative interpretation using computer assisted diagnostic (CADx) systems and other visualization tools. Further, Houston's medical records (with all person-specific information removed) are included with other valuable health status demographics that are used by Problem Knowledge Coupler (PKC) systems. Such systems, valuable as an alternative teaching tool for diagnostic skill development, also are providing improved diagnostics for patients during the Hale aftermath. The PKC systems use grid-accessible medical data from thousands of prior medical cases to suggest recommended procedures and to extrapolate best practices.

Advanced bioinformatics and bio-medical simulation components of the southeastern Grid are also providing further benefits for Hale storm victims. In the first week after Hale, a rash began afflicting many of our city's residents. While initially confined to the Houston area, the illness soon spread to the neighboring Gulf Coast. Rumors about the 11th anniversary of the 9/11 attacks and a possible release of toxins by terrorists started to spread and threatened to complicate the area's storm relief efforts. Fortunately, a local medical research facility with a bioinformatics program worked with a team of biologists from other universities in the region and the Centers for Disease Control and Prevention in Atlanta. The team used dozens of the Grid's distributed computational resources to search many genomic and proteomic databases in parallel to identify the specific agent causing the rash.

With the identification of a probable agent, the teams are applying biomedical simulation techniques across many Grid resources to analyze models of how the disease vectors propagate the agent involved. The simulations are using a cognitive reasoning system with an advanced conceptual modeling approach for nuclear, biological and chemical (NBC) threat assessment, predictive analysis, and decision-making. These models are showing medical teams how to stem the agent's spread and, indeed, these same models are enabling additional health care system personnel to receive preventative training.

While the storm's impact on Houston and the surrounding area is definitely being felt, the overall experience has been significantly less difficult and traumatic due to the presence of a sophisticated grid across the southeast. The grid brings the southeast's extensive computation, data, simulation and collaboration resources together under a shared infrastructure that is serving emergency responders, medical teams and distributed health care systems to provide effective, patient-specific care that is so vital to minimizing long-term consequences to people and the region.

Of course, this is a hypothetical scenario, yet the future reality may well be even more surprising than what is imagined above. Grid infrastructure is maturing and represents a significant sea change in how computation, simulation, bioinformatics, collaboration and knowledge are supported. The ability to access resources anywhere at any time, with the ability to survive interruptions from local conditions, is an important benefit offered by grids as part of a global cyberinfrastructure. Building that imagined infrastructure will certainly depend on the contributions being made now in grid implementations and deployments.


Bibliography

[1] Public Key Infrastructure (http://tinyurl.com/39kx4a)
[2] Community of interest (http://en.wikipedia.org/wiki/Community_of_interest)
[3] Geodise project (http://www.geodise.org/)
[4] Engineering and Physical Sciences Research Council (http://www.epsrc.ac.uk/default.htm)
[5] The Geodise Toolboxes, A User's Guide (http://www.geodise.org/documentation/html/index.htm)
[6] The Geodise Project: Making the Grid Usable Through Matlab (http://www.gridtoday.com/grid/343938.html)
[7] Grid Today (http://www.gridtoday.com/gridtoday.html)
[8] SURAgrid (http://www.sura.org/programs/sura_grid.html)
[9] Amazon Web Services (http://tinyurl.com/2sbgmv)
[10] [Amazon's] Solutions catalog (http://solutions.amazonwebservices.com/connect/index.jspa)
[11] [Amazon's] Elastic Compute Cloud (http://www.amazon.com/gp/browse.html?node=201590011)
[12] Infoworld (http://www.infoworld.com/)
[13] Amazon.com's rent-a-grid (http://www.infoworld.com/article/06/08/30/36OPstrategic_1.html)
[14] 3Tera (http://www.3tera.com/index.html)
[15] AppLogic grid system (http://www.infoworld.com/4449)
[16] International Virtual Data Grid Laboratory (http://www.ivdgl.org/)
[17] Compact Muon Solenoid (CMS) (http://cms.cern.ch/)
[18] Large Hadron Collider (LHC) (http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/CERNFuture/WhatLHC/WhatLHC-en.html)
[19] CERN (http://public.web.cern.ch/Public/Welcome.html)
[20] U. S. CMS (http://www.uscms.org/)
[21] Fermi National Accelerator Laboratory (http://www.fnal.gov/)
[22] U. S. CMS Overview (http://www.uscms.org/Public/overview.html)
[23] A Toroidal LHC ApparatuS (ATLAS) (http://atlas.web.cern.ch/Atlas/index.html)
[24] U. S. ATLAS (http://www.usatlas.bnl.gov/)
[25] Brookhaven National Laboratory (BNL) (http://www.bnl.gov/world/)
[26] Sloan Digital Sky Survey (SDSS) (http://www.sdss.org/)
[27] SkyServer (http://cas.sdss.org/dr5/en/)
[28] SDSS Databases (http://cas.sdss.org/dr5/en/sdss/data/data.asp#databases)
[29] SDSS Data Release 5 (http://cas.sdss.org/dr5/en/sdss/release/)
[30] DataTAG (http://datatag.web.cern.ch/datatag/)
[31] TeV in layman's terms (http://public.web.cern.ch/Public/Content/Chapters/AboutCERN/CERNFuture/WhatLHC/WhatLHC-en.html)
[32] EU-DataGrid Project (http://web.datagrid.cnr.it/servlet/page?_pageid=1407&_dad=portal30&_schema=PORTAL30&_mode=3)
[33] Enabling Grids for E-sciencE (EGEE) (http://www.eu-egee.org/)
[34] DataGrid Project Description (http://web.datagrid.cnr.it/servlet/page?_pageid=873,879&_dad=portal30&_schema=PORTAL30&_mode=3)
[35] OGSA-DAI (http://www.ogsadai.org.uk/index.php)
[36] LEAD (http://www.lead.ou.edu/)
[37] caGrid (http://cabig.nci.nih.gov/)
[38] AstroGrid (http://www.astrogrid.org/)
[39] BRIDGES (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
[40] eDiaMoND (http://www.ediamond.ox.ac.uk/)
[41] GeneGrid (http://www.qub.ac.uk/escience/projects/genegrid)
[42] more OGSA-DAI grid projects (http://www.ogsadai.org.uk/about/projects.php)
[43] Computational Chemistry Grid (https://www.gridchem.org)
[44] cancer Biomedical Informatics Grid (https://cabig.nci.nih.gov)
[45] caBIG (http://cabig.cancer.gov)
[46] SETI@home (http://setiathome.berkeley.edu/)
[47] BOINC (http://boinc.berkeley.edu/)
[48] World Community Grid (http://www.worldcommunitygrid.org/)
[49] Condor (http://www.cs.wisc.edu/condor/)
[50] United Devices (http://www.ud.com/)
[51] Grid-MP ™ (http://www.ud.com/products/gridmp.php)
[52] Enlightened Computing (http://enlightenedcomputing.org)
[53] Optiputer (http://www.optiputer.net)
[54] CANARIE*4 (http://www.canarie.ca/advnet)
[55] CANARIE*4 customer-empowered networks (http://www.canarie.ca/advnet/cen.html)
[56] Foster, Ian, "The Grid: Computing without Bounds", Scientific American, April 2003.
[57] Teragrid (http://www.teragrid.org)

© 2006-8, Southeastern Universities Research Association
Sponsored by SURA, TATRC (No. W81XWH-06-1-0419), OSG, and iVDGL
Updated September, 2007