 Grid Case Studies

  


Grid Applications


SCOOP Storm Surge Model

Collaborators

Lavanya Ramakrishnan, Renaissance Computing Institute
Brian O. Blanton, Renaissance Computing Institute
Howard M. Lander, Renaissance Computing Institute
Richard A. Luettich, Jr, UNC Chapel Hill Institute of Marine Sciences
Daniel A. Reed, Renaissance Computing Institute
Steven R. Thorpe, MCNC

Summary

Large-scale ocean and meteorological models are increasingly run on Grid resources and in high performance computing environments. This creates a need for an integrated system that can handle real-time data feeds, schedule and execute a set of model runs, manage the model input and output data, and make results and status available to a larger audience. Here, we describe the distributed software infrastructure that we have built to run a storm surge model in a Grid environment. Our solution builds on existing standard grid and portal technologies, including the Globus toolkit [2] and the Open Grid Computing Environment [4] (OGCE), and on lessons learned from grid computing efforts in other science domains. Because of the sensitivity of the application, we implement specific techniques for resource management and increased fault tolerance.

This framework was developed as a component of the Southeastern Universities Research Association's (SURA) Southeastern Coastal Ocean Observing and Prediction [15] (SCOOP) program. The SCOOP program is a distributed project that includes the Gulf of Maine Ocean Observing System, Bedford Institute of Oceanography, Louisiana State University, Texas A&M, University of Miami, University of Alabama in Huntsville, University of North Carolina, University of Florida and Virginia Institute of Marine Science. SCOOP is creating an open-access grid environment for the southeastern coastal zone to help integrate regional coastal observing and modeling systems.

For full model details and more complete grid component descriptions, see SCOOP Storm Surge Model.

Technology Components

The front end of the system is a portal that provides the interface through which users interact with the ocean observing and modeling system. The real-time data for the ensemble forecast arrives through Unidata's Local Data Manager [10] (LDM), an event-driven data distribution system that selects, captures, manages and distributes meteorological data products. Once all the data for a given ensemble member has been received, available grid resources are discovered using a simple resource selection algorithm. After the files are staged, the model run is executed and the output data is staged back to the originating site. The final result of the surge computations is inserted back into the SCOOP LDM stream for subsequent analysis and visualization by other SCOOP partners [15a]. Specifically, our architecture has the following Grid components:

  • An Application Coordinator that acts as a central component that orchestrates the data and job management actions and interacts with the Globus services.
  • A resource monitoring and notification framework that is used to collect monitoring data and track data flow status in the system.
  • A resource selection API that queries grid resources to determine the best resources available to run each of the jobs.
  • An application preparation component that prepares the application bundle that needs to be used on a remote resource.
  • A front-end portal that allows users to conduct retrospective analysis, access historical data from previous model runs and observe the status of daily forecast runs.

Data and Control Flow of the NC SCOOP System

Before we describe each of the components used in the framework in detail, we briefly describe its control flow. The ADCIRC storm surge model can be run in two modes. The “forecast” mode is triggered by the real-time arrival of wind data from different sites through the Local Data Manager [10]. In the “hindcast” mode, the modeler can use either a portal or a shell interface to launch jobs that investigate prior data sets (post-hurricane). The figure shows the architectural components and the control flow for the NC SCOOP system:

  1. In the forecast mode the wind data arrives at the LDM node (Step 1.F. in figure). In our current setup, the system receives wind files from University of Florida and Texas A&M. Alternatively, a scientist might log into the portal and choose the corresponding data to re-run a model (Step 1.H. in figure).
  2. In the hindcast run, the application coordinator locates relevant files using the SCOOP catalog at UAH[17] and retrieves them from the SCOOP archives located at TAMU and LSU[12]. In the forecast runs, once the wind data arrives, the application coordinator checks to see if the hotstart files are available locally or are available at the remote archive. If they are not available and not being generated currently (through a model run), a run is launched to generate the corresponding hotstart files to initialize the model for the current forecast cycle.
  3. Once the model is ready to run (i.e. all the data is available), the application coordinator will use the resource selection component to select the best resource for this model run.
  4. The resource selection component queries the status at each site and ranks the resources, accounting for queue delays and network connectivity between the resources.
  5. The application coordinator then calls an application specific component that prepares an application package that can be shipped to remote resources. The application package is customized with specific properties for the application on a particular resource and includes the binary, the input files and other initialization files required for the model run.
  6. The self-extracting application package is transferred to the remote resource and the job is launched using standard grid mechanisms.
  7. Once the application coordinator receives the “job finished” status message, it retrieves the output files from the remote sites.
  8. The results are then available through the portal (Step 8.H in figure). Additionally, in forecast mode, we push the data back through LDM (Step 8.F in figure), where it is archived and visualized by other SCOOP partners downstream.
  9. The application coordinator publishes status messages at each of the above steps to a centralized messaging broker. Interested components such as the portal can subscribe to relevant messages to receive real-time status notification of the job run.
  10. In addition, resource status information is collected across all the sites. It can be observed through the portal and can also be used for more sophisticated resource selection algorithms.
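The resource selection in steps 3 and 4 can be sketched as a ranking by estimated turnaround time. This is only an illustration under stated assumptions, not the SCOOP implementation: the `SiteStatus` fields, the site names, and the cost model (queue delay plus package transfer time plus run time) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Hypothetical snapshot of one grid site, as a monitoring query might return it."""
    name: str
    queue_delay_s: float    # estimated wait before the job starts
    bandwidth_mb_s: float   # measured throughput from the coordinator to the site


def estimated_turnaround(site, package_mb, run_time_s):
    """Queue delay + time to ship the application package + model run time."""
    transfer_s = package_mb / site.bandwidth_mb_s
    return site.queue_delay_s + transfer_s + run_time_s


def rank_sites(sites, package_mb, run_time_s):
    """Return sites ordered from best (shortest estimated turnaround) to worst."""
    return sorted(sites, key=lambda s: estimated_turnaround(s, package_mb, run_time_s))


sites = [
    SiteStatus("site-a", queue_delay_s=600.0, bandwidth_mb_s=50.0),
    SiteStatus("site-b", queue_delay_s=60.0, bandwidth_mb_s=5.0),
    SiteStatus("site-c", queue_delay_s=1800.0, bandwidth_mb_s=100.0),
]
ranking = rank_sites(sites, package_mb=500.0, run_time_s=1200.0)
print([s.name for s in ranking])
```

In this example the site with the slowest network still ranks first because its short queue outweighs the longer transfer, which is the kind of trade-off a ranking over both queue delay and connectivity is meant to capture.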


Figure CS-1. Architectural components and the control flow for the NC SCOOP system.

Contact

scoop-support@renci.org, Renaissance Computing Institute.

Acknowledgements

This framework was developed as a component of Southeastern Universities Research Association's (SURA) Southeastern Coastal Ocean Observing and Prediction (SCOOP) program [15]. The SCOOP program is a distributed project that includes numerous research partners [15a]. Funding for SCOOP has been provided by the Office of Naval Research, Award N00014-04-1-0721 and by the National Oceanic and Atmospheric Administration's NOAA Ocean Service, Award NA04NOS4730254. Full acknowledgements are provided in the detailed version of this paper, available in the Related Links section of this Cookbook.


Open Science Grid

Collaborators

The Open Science Grid consortium consists of around 23 member organizations and several partners. An up-to-date list can be found on the OSG Council [21] web page. The participants are called Virtual Organizations [22], or VOs, where a VO is a collection of people (VO members), computing and storage resources (sites) and services (e.g., databases). Technical Activity [23] groups round out the organization through liaison, service and development activities.

Introduction and Overview

Scientists from many different fields use the Open Science Grid to advance their research. The OSG Consortium includes members from particle and nuclear physics, astrophysics, bioinformatics, gravitational-wave science and computer science collaborations. Consortium members contribute to the development of the OSG and benefit from advances in grid technology. Applications in other areas of science, such as mathematics, medical imaging and nanotechnology, benefit from the OSG through its partnership with local and regional grids or their communities' use of the Virtual Data Toolkit software stack.

The following chart shows running applications as well as the current load on the OSG over a one week period. The subsequent sections in this case study will look a little further into several of these applications.

Figure CS-2. Current running applications and load on the Open Science Grid. Plot provided by MonALISA [24].



 

CMS: The Compact Muon Solenoid

Figure CS-3. Simulated decay of Higgs boson in the future CMS experiment at CERN. (Credit: CERN)

Collaborators, Organizations

The USCMS Collaboration consists of various US universities and Fermi National Accelerator Laboratory (FNAL). The Collaboration works closely with the CMS Collaboration at CERN to accomplish the missions of the experiment. Major funding of this program is provided by The US Department of Energy (DOE) and the National Science Foundation (NSF).

See US CMS Institutions and Members [25] for details.

Summary/Description

From the U.S. CMS website [26]:

"The CMS experiment is designed to study the collisions of protons at a center of mass energy of 14 TeV. The physics program includes the study of electroweak symmetry breaking, investigating the properties of the top quark, a search for new heavy gauge bosons, probing quark and lepton substructure, looking for supersymmetry and exploring other new phenomena."

The USCMS Software and Computing [27] project provides the computing and software resources needed to enable US scientists to participate in CMS activities.

According to the CERN Architectural Blueprint RTAG [28] (October, 2002) the configuration and control of Grid-based operation should be encapsulated in components and services intended for these purposes. Apart from these components and services, grid-based operation should be largely transparent to other components and services, application software, and users. Grid middleware constitutes optional libraries at the foundation level of the software structure. Services at the basic framework level encapsulate and employ middleware to offer distributed capability to service users while insulating them from the underlying middleware. For the USCMS, the OSG provides the necessary Grid middleware components (that are also made to be interoperable with the LCG/EGEE components.)

Data and Control Flow

The CMS experiment employs a tiered computing model. Tier0 is at CERN in Switzerland. FNAL is one of seven Tier1s, and universities in the US and Brazil are the Tier2s. Experimental data is produced at the Tier0 and replicated at the Tier1s. Tier2s are responsible for hosting data that is of interest to regional users, who analyze it through the OSG gatekeepers at those Tier2s. Monte Carlo simulated events (MC events) are produced at Tier2s and Tier1s and are transferred to the regional Tier1 (FNAL in the case of USCMS) or to the Tier0. Thus, the model for the CMS experiment calls for data to be passed from the CMS detector at CERN in Switzerland to a series of large computing sites around the world, with MC events flowing in the opposite direction.

The CMS Tier-2 centers in the United States and around the world have more work yet to do on their network infrastructure before they're ready to accept the large data rates expected when the experiment starts running — up to 100 megabytes per second. The eventual goal for the computing sites during 2007 is to sustain the use of more than 50% of their network capacity for an entire day. For example, for the Purdue-UCSD network link that would mean sustaining transfers at approximately four gigabits per second for one day [29].
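To put the quoted rates in perspective, the short calculation below converts them into daily data volumes. It is a sketch that assumes decimal units (1 TB = 1,000,000 MB) and a perfectly sustained rate; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def tb_per_day_from_gbps(gigabits_per_second):
    """Daily volume in terabytes for a link sustained at the given rate in Gbit/s."""
    gigabytes_per_second = gigabits_per_second / 8  # 8 bits per byte
    return gigabytes_per_second * SECONDS_PER_DAY / 1000


def tb_per_day_from_mb_per_s(megabytes_per_second):
    """Daily volume in terabytes for a stream sustained at the given rate in MB/s."""
    return megabytes_per_second * SECONDS_PER_DAY / 1_000_000


link_volume_tb = tb_per_day_from_gbps(4.0)          # the Purdue-UCSD example
stream_volume_tb = tb_per_day_from_mb_per_s(100.0)  # the 100 MB/s rate quoted above
print(f"{link_volume_tb:.1f} TB/day, {stream_volume_tb:.2f} TB/day")
```

Sustaining four gigabits per second for one day therefore corresponds to moving roughly 43 terabytes, while a 100 megabytes per second stream amounts to under 9 terabytes per day.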

Data storage responsibilities are shared between OSG, the VO, and the site. For example, OSG defines the storage types, the APIs, and the information schema for finding storage. The VO manages the data transfers and the catalogues. The site chooses the storage type and amount, and implements publication of storage information according to the OSG rules (more specifically, the Glue schema). The following image is an example of CMS data transfer across several days in early 2007.


Figure CS-4. CMS data transfer at OSG sites. [30]

Likewise, job submission responsibilities are shared by OSG, the VO, and the site. OSG defines the interface to the batch system and the information schema, and provides the middleware that implements them. The VO manages the job submissions and workflows, through either the Condor-G job submission tools or the workload management systems developed by grid projects such as EGEE/LCG. The site chooses which batch system to use but configures that system's interface in accordance with OSG rules.

The workflow can be described as:

  • The VO administrators, called the software deployment team, install the application software. Users have read-only access from batch slots.
  • Data is produced at CERN. MC events are produced by the MC production teams at OSG or EGEE/LCG sites.
  • Data movement is carried out by a system called PhEDEx. CERN controls the rate of data movement, and sites or authorized personnel subscribe to the data they need through PhEDEx. The VO administrator moves MC events produced at the site to the upper Tiers via gftp. Users have read-only access from batch slots.
  • Users submit their jobs via Condor-G. The jobs run in batch slots, writing output to local disks. The jobs copy their output from the local disks to the data area via gftp.
  • Users collect their output from the site(s) via gftp for follow-up analysis.
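The job submission step in the workflow above can be illustrated with a Condor-G submit description. The sketch below is not taken from the CMS tooling: the gatekeeper host, job manager name, and file names are hypothetical placeholders, and the `gt2` grid type refers to the pre-web-services Globus GRAM interface in use at the time, so the exact keywords should be checked against the Condor documentation for the installed version.

```python
def condor_g_submit_description(gatekeeper, jobmanager, executable, job_name):
    """Render a minimal Condor-G submit description as text.

    All argument values are supplied by the caller; nothing here contacts a grid site.
    """
    lines = [
        "universe = grid",
        # Route the job through the site's gatekeeper to its local batch system.
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        f"output = {job_name}.out",
        f"error = {job_name}.err",
        f"log = {job_name}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical Tier2 gatekeeper and analysis script, for illustration only.
description = condor_g_submit_description(
    gatekeeper="gatekeeper.tier2.example.edu",
    jobmanager="condor",
    executable="run_analysis.sh",
    job_name="analysis_001",
)
print(description)
```

The resulting text would typically be saved to a file and handed to `condor_submit`; copying the output back to the data area is a separate step, as described in the list above.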

Contact

US CMS Organization, Institution, and Member Contacts [31]



 

SDSS: Sloan Digital Sky Survey

Figure CS-5. SDSS Image of the Week

Collaborators, Organizations

The SDSS collaboration includes 150 scientists at 25 institutions [32]. An advisory council [33] represents the institutions and advises the ARC Board of Governors on matters relating to the projects.

Summary/Description

The Sloan Digital Sky Survey (SDSS) is focused on producing a detailed optical image and 3-dimensional map covering a significant portion of the sky. With the amount of data that must be stored and managed, and the compute power required to produce the rich, integrated visual results, the project is a clear example of a scientific milestone that is dependent on advancements in distributed, collaborative high performance computing.

From the SDSS website: [34]

"The SDSS uses a dedicated, 2.5-meter telescope on Apache Point, NM, equipped with two powerful special-purpose instruments. The 120-megapixel camera can image 1.5 square degrees of sky at a time, about eight times the area of the full moon. A pair of spectrographs fed by optical fibers can measure spectra of (and hence distances to) more than 600 galaxies and quasars in a single observation. A custom-designed set of software pipelines keeps pace with the enormous data flow from the telescope.

The SDSS completed its first phase of operations "SDSS-I" in June, 2005. Over the course of five years, SDSS-I imaged more than 8,000 square degrees of the sky in five bandpasses, detecting nearly 200 million celestial objects, and it measured spectra of more than 675,000 galaxies, 90,000 quasars, and 185,000 stars. These data have supported studies ranging from asteroids and nearby stars to the large scale structure of the Universe.

The SDSS has entered a new phase, SDSS-II, continuing through June, 2008. With a consortium that now includes 25 institutions around the globe, SDSS-II will carry out three distinct surveys — the Sloan Legacy Survey, SEGUE, and the Sloan Supernova Survey — to address fundamental questions about the nature of the Universe, the origin of galaxies and quasars, and the formation and evolution of our own Galaxy, the Milky Way."

For more background information on mapping the universe and on new discoveries, see About US [35] at the SDSS web site.

Contact

The SDSS business manager and institutional representatives are listed on the SDSS Contact US [36] web page.

Acknowledgements

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England.

 


 

ATLAS

Figure CS-6. The ATLAS Detector

Collaborators, Organizations

The ATLAS collaboration consists of various boards, institutions, committees, and working groups. Over 1,850 individuals at roughly 175 institutions across 37 countries work together. See The ATLAS Organization [36] for more details. A very interesting discussion on how the collaboration works can be found at How ATLAS Collaborates [37].

Summary/Description

One of the discoveries eagerly anticipated by particle physicists working on the world's next particle collider is that of supersymmetry, a predicted lost symmetry of nature. Physicists from the University of Wisconsin-Madison are using Open Science Grid resources to show that there is a good possibility of discovering supersymmetry with data collected during the first few months of the collider's operation, if the new symmetry exists in nature.

Supersymmetry, often called SUSY, predicts the existence of superpartner particles, or sparticles, for every known fundamental particle. Recent experiments have suggested that most of the matter in our universe is not made of familiar atoms, but of some new sort of dark matter. Discovering a hidden world of sparticles may shed light on the nature of this dark matter, connecting observations performed at earth-based accelerators with those performed by astrophysicists and cosmologists.

Data and Control Flow

Accurately simulating the search for supersymmetry required physicists to create a gateway to three different grid environments from their desks at CERN. They used the Virtual Data Toolkit, an ensemble of middleware tools distributed and maintained with the collaboration of OSG members, to create an access point to resources from the Open Science Grid, the LHC Computing Grid and the University of Wisconsin-Madison's Condor pool.

"The most difficult part was to make a grid which is interoperable, such that the requirements of all existing grid flavors could be included," they explained. "This was done by modifying the current VDT, and consuming more than 215 CPU years in less than two months using resources from the OSG and Madison's Condor Pool."

With so many computing resources at their disposal, they simulated for the first time an accurate background for SUSY searches. Comparing the simulated signals for several types of SUSY against the simulated background shows that physicists might be able to discover the long-sought sparticles with the first ATLAS experimental data.

See Simulating Supersymmetry with ATLAS [38] for the complete article.

Contact

See the ATLAS Experiment home page [39].

Acknowledgements


Figure CS-7. ATLAS Collaboration Map.


SURAgrid Applications

Simulation-Optimization for Threat Management in Urban Water Systems

Collaborators

Sarat Sreepathi and Mahinthakumar, NCSU
Von Laszewski and Hategan, University of Chicago
Uber and Feng, University of Cincinnati
Harrison, University of South Carolina

Summary/Description

Contamination threat management is a very real and practical concern for any population utilizing a shared drinking water distribution system. Several components are involved including real-time characterization of the source and extent of the contamination, identification of control strategies, and design of incremental data sampling schedules. This requires dynamic integration of time-varying measurements of flow, pressure and contaminant concentration with analytical modules including models to simulate the state of the system, statistical methods for adaptive sampling, and optimization methods to search for efficient control strategies. The goal of this multi-disciplinary research project (NSF-funded from Jan 2006 to Dec 2008) is to develop a cyberinfrastructure system that will both adapt to and control changing needs in data, models, computer resources and management choices facilitated by a dynamic workflow design.

The application incorporates dynamic water-usage data, in real time, into a simulation-optimization process to inform decision making in threat management situations. The work is highly compute-intensive and requires multi-level parallel processing on computer clusters and high-performance computing architectures such as SURAgrid. The optimization component uses algorithms based on evolutionary computation, and the simulation component uses EPANET, a water distribution simulation code originally released by USEPA. Simulation-Optimization with EPANET is part of a multidisciplinary, three-year NSF-funded DDDAS (Dynamic Data-Driven Application Systems) research project to develop a cyberinfrastructure system that will both adapt to and control changing needs in data, models, computer resources and management choices facilitated by a dynamic workflow design. Project partners: North Carolina State University, University of Chicago, University of Cincinnati, and University of South Carolina.


Figure CS-8. Graphical Monitoring Interface

The analytical modules used to simulate realistic water distribution systems are composed of thousands to millions of simulation instances driven by optimization search algorithms. They are highly compute-intensive and require multi-level parallel processing on computer clusters. While data often drive the analytical modules, the data needed to improve the accuracy and certainty of the solutions generated by these modules change dynamically as a contamination event unfolds. Since such time-sensitive threat events require real-time responses, the computational needs must also be adaptively matched with available resources. Grid environments composed of independent or loosely coupled computer clusters (e.g., the TeraGrid, SURAgrid) are ideal for this application because the simulation instances can be easily clustered (or bundled) into semi-independent sets, which often require synchronization at various stages. These sets can be executed effectively in such environments through an intelligent allocation and monitoring mechanism, which is currently being implemented as a middleware feature.
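The bundling of simulation instances into semi-independent sets can be sketched as a proportional split across clusters. This is a hypothetical allocation rule for illustration, not the project's allocation mechanism; the cluster names and CPU counts are made up.

```python
def bundle_instances(num_instances, cluster_cpus):
    """Split instance ids 0..num_instances-1 into one bundle per cluster.

    Bundle sizes are proportional to each cluster's CPU count; instances left
    over after rounding down go, one each, to the largest clusters.
    """
    total_cpus = sum(cluster_cpus.values())
    names = sorted(cluster_cpus)  # fixed order keeps the result deterministic
    sizes = {n: num_instances * cluster_cpus[n] // total_cpus for n in names}
    leftover = num_instances - sum(sizes.values())
    for name in sorted(names, key=lambda n: -cluster_cpus[n])[:leftover]:
        sizes[name] += 1

    bundles, start = {}, 0
    for name in names:
        bundles[name] = range(start, start + sizes[name])
        start += sizes[name]
    return bundles


bundles = bundle_instances(1000, {"cluster-a": 64, "cluster-b": 32, "cluster-c": 16})
for name, ids in bundles.items():
    print(name, len(ids))
```

Each bundle can then be submitted as a single job to its cluster, with results gathered at whatever synchronization points the optimization algorithm requires.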

SURAgrid Deployment

The integrated simulation-optimization system developed through this project is intended to be used by the project team members during the two-year development phase of this project. Team members include application engineers at North Carolina State University (NCSU) and the University of Cincinnati, optimization methodology developers (NCSU and the University of South Carolina), and computer scientists (NCSU and the University of Chicago). The application engineers will test and analyze various water distribution contamination problem scenarios using realistic networks. The methodology developers will investigate various optimization search algorithms for source characterization, demand uncertainty and sensor sampling design.

The computer scientists will undertake the grid implementation, integration of various components, and performance testing in different grid environments and computer clusters, including SURAgrid. The team is using SURAgrid as an “on-ramp” to the TeraGrid. Citing specific SURAgrid benefits such as compute resource heterogeneity and low overhead to participate, the team plans to ready the application for porting to the TeraGrid by uncovering and addressing potential programming and workflow issues on SURAgrid.

Grid Workflow

To run jobs on SURAgrid, the NCSU user applies for an affiliate user certificate issued by the SURAgrid site Georgia State University (GSU), which operates a Certificate Authority (CA) that has been cross-certified with the SURAgrid Bridge CA (BCA). Cross-certification enables SURAgrid resource sites to trust the certificate presented by the NCSU user. Once the SURAgrid User Administrator at GSU has also created a SURAgrid account for the NCSU user, the user essentially has single-sign-on access to SURAgrid resources at cross-certified SURAgrid sites.

After authenticating to the SURAgrid resource, the user invokes the optimization method on the client workstation. This starts the middleware, which communicates directly with the specific SURAgrid resource (authenticated through ssh keys) for job submission and intermediate file movement. Currently the application must be pre-staged by the user, but this functionality will be integrated into the middleware. The middleware, which uses public key cryptography, will provide a seamless, Python-based application interface for staging initial data and executables, data movement, job submission, and real-time visualization of application progress. The interface uses passwordless ssh commands to create the directory structure needed to run the jobs and handles all data movement required by the application. It launches the jobs at each site through that site's batch commands. The middleware can minimize queue time by querying the resource at a given site to determine how large a resource request to make. Most of the middleware functionality has been implemented at least at a rudimentary level, and efforts are now focused on better integration and sophistication.

In addition to the middleware interface described above, the application consists of two major components: one for optimization and one for simulation. The optimization component presently used on SURAgrid is called JEC (Java Evolutionary Computation toolkit). This is the client side, which drives the simulation component by calling the middleware interface. Evolutionary algorithms call multiple instances of simulations (typically hundreds) at each generation (or iteration) and require synchronization at each generation, because the simulation results have to be processed before the next generation begins. Everything on the server side (middleware, simulation component, and the grid resources) is transparent to the client.
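The per-generation synchronization can be sketched with a small generational loop. This is a minimal Python illustration, not JEC: `simulate` is a stand-in objective rather than an EPANET run, the selection and mutation rules are assumptions, and a local thread pool takes the place of the grid middleware.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(candidate):
    """Stand-in for one simulation instance; lower scores are better."""
    return sum((x - 0.5) ** 2 for x in candidate)


def evolve(pop_size=20, genes=3, generations=15, seed=0):
    """Run a simple evolutionary search and return the best score per generation."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genes)] for _ in range(pop_size)]
    history = []
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(generations):
            # Synchronization point: every simulation of this generation must
            # finish before selection can start.
            scores = list(pool.map(simulate, population))
            history.append(min(scores))
            ranked = [c for _, c in sorted(zip(scores, population), key=lambda p: p[0])]
            parents = ranked[: pop_size // 2]  # keep the better half unchanged
            children = [
                [g + rng.gauss(0, 0.05) for g in rng.choice(parents)]
                for _ in range(pop_size - len(parents))
            ]
            population = parents + children
    return history


history = evolve()
print(f"best score: first generation {history[0]:.4f}, last generation {history[-1]:.4f}")
```

Because the better half of each generation is carried over unchanged, the best score never gets worse from one generation to the next; the cost is that the slowest simulation in a generation determines when the next one can begin.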

The simulation component is an MPI C wrapper written around EPANET. It bundles multiple simulations (typically hundreds) and executes them simultaneously on a single cluster through coarse-grained MPI-based parallelism. The wrapper saves a considerable amount of processing time by not duplicating I/O and the parts of the simulations that are common to all simulation instances. It is also persistent: once an EPANET job is launched, it does not need to exit until all simulation instances have been completed across all generations of an evolutionary algorithm (i.e., once the simulation outputs are written for a given generation, it can remain in a wait state until the next set of evaluations arrives from the middleware). The output files are moved back to the client workstation as the simulation progresses on the resource side. A Python/Tk real-time visualization tool developed by NCSU then enables visualization of the progress of the algorithm on the water distribution network. The visualization tool also creates PNG files of various stages of the output.
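The persistent behavior of the wrapper can be illustrated with a single-process analogy. The actual wrapper is written in C with MPI; the sketch below only mirrors its control flow, and `load_network`, `run_instance`, and the queue-based hand-off are hypothetical stand-ins.

```python
import queue
import threading

load_count = 0  # counts how often the shared setup work is performed


def load_network():
    """Stand-in for the I/O and setup that are common to all simulation instances."""
    global load_count
    load_count += 1
    return {"base_demand": 10.0}


def run_instance(network, scale):
    """Stand-in for one simulation instance that reuses the shared network data."""
    return network["base_demand"] * scale


def persistent_wrapper(tasks, results):
    network = load_network()      # shared work is done once, not once per instance
    while True:
        batch = tasks.get()       # wait state until the next generation arrives
        if batch is None:         # sentinel: all generations are complete
            break
        results.put([run_instance(network, s) for s in batch])


tasks, results = queue.Queue(), queue.Queue()
worker = threading.Thread(target=persistent_wrapper, args=(tasks, results))
worker.start()

outputs = []
for generation in ([1.0, 2.0], [0.5, 1.5], [3.0]):
    tasks.put(generation)
    outputs.append(results.get())  # collect this generation before sending the next
tasks.put(None)
worker.join()
print(outputs, "setup runs:", load_count)
```

The job is started once and serves three generations, so the shared setup runs a single time regardless of how many batches arrive.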

Acknowledgements

Simulation-Optimization with EPANET is part of a multidisciplinary, three-year NSF-funded DDDAS (Dynamic Data-Driven Application Systems) research project to develop a cyberinfrastructure system that will both adapt to and control changing needs in data, models, computer resources and management choices facilitated by a dynamic workflow design.

 


 

Multiple Genome Alignment on the Grid

Collaborators

Georgia State University
SURA

Summary/Description

This application takes a number of genome sequences as input and produces an aligned sequence based on their structure by using a pairwise alignment algorithm. The algorithm is memory efficient and is parallelized so that the workload is well distributed across a grid. When run on grids like SURAgrid, carefully designed, grid-enabled algorithms of this kind give bioinformatics users performance comparable to cluster environments along with added flexibility and scalability.

Biological sequence alignment is used to determine the nature of the biological relationship among organisms, for example, in finding evolutionary information, determining the causes and cures of diseases, and gathering information about a new protein. Multiple genome sequence alignment (where several genome sequences are aligned rather than only two) is very important for the analysis of genome and protein structures, particularly for showing relationships among the structures being aligned. A significant challenge for researchers is the computational cost of aligning multiple (more than three) sequences of very large size. Building on Georgia State University's (GSU) core research initiatives in life sciences, and particularly in protein structure analysis, Dr. Yi Pan, currently Chair of Computer Science at GSU, and his graduate student Nova Ahmed made a significant contribution in this area by deploying a parallelized multiple sequence alignment application in a grid environment, thus improving computer processing of the large sequence lengths typical of genomic and proteomic science.
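To give a sense of what a memory-efficient pairwise alignment computation looks like, the sketch below scores a global alignment while keeping only two rows of the dynamic programming table. It is the classic Needleman-Wunsch recurrence, shown for illustration only and not the algorithm developed by Pan and Ahmed; the scoring values are arbitrary assumptions.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b using O(len(b)) memory."""
    # Row 0: aligning an empty prefix of a against prefixes of b costs only gaps.
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # gap in b
                current[j - 1] + gap,            # gap in a
            ))
        previous = current  # only the latest row is needed for the next step
    return previous[-1]


print(alignment_score("GATTACA", "GATTACA"))
print(alignment_score("ACGT", "AGT"))
```

Storing two rows instead of the full table reduces memory from the product of the sequence lengths to the length of one sequence, which matters for genome-scale inputs. Recovering the alignment itself, rather than only its score, requires additional techniques.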

SURAgrid Deployment

Although the parallel algorithm requires inter-processor communication to compute multiple aligned sequences, it actually reduces overall computation by independently solving and then merging a set of tasks. The new algorithm, which was initially designed for a shared memory architecture where it is helpful to reduce the memory requirement, did indeed improve performance during its initial runs. The resulting algorithm and its parallelization are also well suited to grid environments such as SURAgrid, which benefit this type of distributed, computationally intensive work. Ahmed's tests of grid-enabled clusters showed performance comparable to that of non-grid-enabled clusters (there was negligible overhead from the grid layer services) and a significant improvement over older shared memory-type systems. Pan and Ahmed's algorithm can provide very scalable, cost-effective computational performance for grid environments, where job submission and scheduling can be easier because users don't need an account on every node and can submit multiple jobs at one time.


Figure CS-9. Parallel load distribution among processors for multiple sequence alignment

There were several iterations of testing for both the code and Georgia State and SURAgrid’s access management infrastructure components. The end result of the collaboration is that Georgia State users run the multiple genome alignment application through the integration of their personal identity verification into Georgia State’s campus identity management environment, which is then leveraged to provide external access to all SURAgrid resources.

To create a local grid certificate, the user sends a request from their official campus email and is issued a grid certificate based on their unique CampusID. The ACS Certificate Authority (CA), which ACS created and cross-certified with the SURAgrid Bridge CA (BCA), provides the local user’s passport to SURAgrid resources. The cross-certification process enables a SURAgrid resource to trust the Georgia State local certificate being presented by the user. The user experience is further simplified by Georgia State’s use of the SURAgrid user account system, which essentially provides single-sign-on access to SURAgrid resources at cross-certified SURAgrid sites. The account management system overlays the cross-certification process and empowers the SURAgrid User Administrator from Georgia State to easily issue SURAgrid user accounts. The user’s Georgia State-issued certificate is used to invoke the Globus Toolkit, which, on behalf of the algorithm application, manages the grid services necessary to submit the application’s jobs to various SURAgrid resources.
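The trust relationship described above can be pictured as a path search through certificate authorities. The following Python sketch is a toy model only: it performs no cryptography, and the CA names and the `is_trusted` helper are hypothetical. It shows how a resource that trusts the bridge CA can accept a certificate issued by a campus CA once the two are cross-certified.

```python
from collections import deque


def is_trusted(issuer, trust_anchors, cross_certs):
    """Return True if cross-certifications link a trust anchor to the issuer.

    cross_certs maps a CA name to the set of CA names it has certified.
    """
    seen = set(trust_anchors)
    queue = deque(trust_anchors)
    while queue:
        ca = queue.popleft()
        if ca == issuer:
            return True
        for nxt in cross_certs.get(ca, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical names; a real deployment validates signed X.509 chains instead.
CROSS_CERTS = {
    "SURAgrid-BCA": {"GeorgiaState-CA", "OtherCampus-CA"},
}

if __name__ == "__main__":
    anchors = ["SURAgrid-BCA"]  # what a SURAgrid resource trusts directly
    print(is_trusted("GeorgiaState-CA", anchors, CROSS_CERTS))
    print(is_trusted("Unknown-CA", anchors, CROSS_CERTS))
```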

Conclusion

As Georgia State continues to deploy the grid technology, policies and processes of their campus grid, they expect that the multiple genome alignment code will continue to be used to test and perfect the grid. Considering that it also provides a memory-efficient, pair-wise alignment for large biological sequences in an optimal way, the application is an invaluable asset to Georgia State and to others interested in improved sequence alignment using SURAgrid resources.

Acknowledgements

Nova Ahmed, Ph.D. student CS, Georgia Tech
Victor Bolet, Analyst Programmer Intermediate, Advanced Campus Services Georgia State
Dharam Damani, MS student CS, Georgia State University
Nicole Geiger, Analyst Programmer Associate, Advanced Campus Services Georgia State
Yi Pan, Professor, Chair Computer Science, Georgia State


Grid Deployments


Texas Tech TechGrid

Collaborators

Texas Tech University

Summary/Description

The mission of the Texas Tech grid project, TechGrid, is to integrate the numerous and diverse computational, visualization, storage, data, and spare lab desktop resources of Texas Tech University into a comprehensive campus cyber infrastructure for research and education. The integration of these vast resources into TechGrid will enable resource access and sharing on an unprecedented scale, while new Web-based and command-line interfaces will facilitate new models for utilization and coordination. The goals of rapid deployment, adoption, and evolution of TechGrid will enable it to serve as a research and teaching computing infrastructure, while also providing a platform for grid computing R&D. TechGrid will thus present a unique campus environment for knowledge discovery and education.

About TechGrid

The Texas Tech University grid, TechGrid, developed and deployed in 2002, is a comprehensive cyber infrastructure project to bring a distributed-knowledge environment to Texas Tech research and education. TechGrid consists of 600 Windows and Linux PCs donated from various parts of campus to share spare computational cycles while the donated resources are not being used. The grid software used to integrate these compute resources is Condor, a grid middleware package developed by the University of Wisconsin. During the past five years, TechGrid has helped facilitate the massive computing needs of research projects involving computational chemistry, bioinformatics, biology, physics, mathematics, engineering, and business statistical analysis. Additionally, TechGrid has been instrumental in teaching distributed and grid computing in the Texas Tech Advanced Technology Learning Center, the Texas Tech Teaching Learning and Technology Center, the Texas Tech Jerry Rawls College of Business, and the Texas Tech Computer Science department, as well as the Texas Tech Mathematics and Statistics department.

The goal of the TechGrid project is to enable significant advances in scientific discovery and to foster innovative educational programs. TechGrid will integrate and simplify the usage of the diverse computational, storage, visualization, and some data resources of Texas Tech to facilitate new, powerful paradigms for research and education. The project will serve as a model for other campuses wishing to develop an integrated cyber infrastructure for research and education.

Middleware

The grid distributes a compute job among compute nodes within the grid using grid middleware as the means to facilitate distributed computing.  The name of the grid middleware is Condor.

 What is Condor?

From the University of Wisconsin Condor site [68]:

Condor is a specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion.

While providing functionality similar to that of a more traditional batch queuing system, Condor's novel architecture allows it to succeed in areas where traditional scheduling systems fail. Condor can be used to manage a cluster of dedicated compute nodes (such as a "Beowulf" cluster). In addition, unique mechanisms enable Condor to effectively harness wasted CPU power from otherwise idle desktop workstations. For instance, Condor can be configured to only use desktop machines where the keyboard and mouse are idle. Should Condor detect that a machine is no longer available (such as a key press detected), in many circumstances Condor is able to transparently produce a checkpoint and migrate a job to a different machine which would otherwise be idle. Condor does not require a shared file system across machines — if no shared file system is available, Condor can transfer the job's data files on behalf of the user, or Condor may be able to transparently redirect all the job's I/O requests back to the submit machine. As a result, Condor can be used to seamlessly combine all of an organization's computational power into one resource.
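The idle-desktop behavior described in the quotation can be illustrated with a toy Python model. This is not Condor code and uses none of Condor's interfaces; the `Machine` and `Job` classes, the work units, and the `step` function are assumptions made for illustration. A job runs only on an idle machine, and when the owner returns, its progress is kept (standing in for a checkpoint) and it resumes on another idle machine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    name: str
    work_left: int               # remaining work units; doubles as the checkpoint
    host: Optional[str] = None   # machine currently running the job, if any


@dataclass
class Machine:
    name: str
    idle: bool = True            # True while keyboard and mouse are idle


def step(job, machines):
    """Advance the job by one time step, migrating if its host became busy."""
    if job.work_left <= 0:
        return                               # already finished
    by_name = {m.name: m for m in machines}
    if job.host is not None and not by_name[job.host].idle:
        job.host = None                      # owner is back: vacate, keep progress
    if job.host is None:
        free = [m for m in machines if m.idle]
        if not free:
            return                           # nothing idle; job waits in the queue
        job.host = free[0].name
    job.work_left -= 1                       # one unit of work done on the host
    if job.work_left == 0:
        job.host = None                      # finished: release the machine


if __name__ == "__main__":
    pcs = [Machine("lab-pc-1"), Machine("lab-pc-2")]
    job = Job("simulation", work_left=3)
    step(job, pcs)
    print(job)
    pcs[0].idle = False                      # a key press on lab-pc-1
    step(job, pcs)
    print(job)
```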

Definitions, Components, and Software tools

Definitions

1. Grid Zone: is a department or lab associated with a campus department that has volunteered resources to be used by the grid.

2. Grid Zone Administrator: a person who is responsible for the grid zone in their individual departments.

3. Campus Grid Administrator: a person who is responsible for the maintenance, upkeep, and operation of the grid, HPCC grid research, grid training, and interfacing with the general computing user base to supply grid based and High Performance Computing support and services to the Texas Tech campus community.

4. Grid Node: is an individual computer within a Grid Zone that contributes compute cycles to the grid.

5. Grid Attribute: individual settings such as permissions, performance, or scheduling mechanism that can be controlled by the Grid Administrator.

6. Bootstrap Server: is the central grid server responsible for controlling grid functions and job management.

Components


Figure CS-9. Job distribution on TechGrid.

Applications

Applications on the TechGrid include:

The Proth [40] code was provided by Dr. Chris Monico and grid-enabled to run on TechGrid. The code used several thousand CPU hours to look for prime numbers from sieved candidates.

The Partial Differential Equation [41] grid project of Dr. Sandro Manservisi was grid-enabled and used 1200 CPU hours.

The grid-enabled Multivariate Minimization project was completed and published at Global Grid Forum 8. Title: Multivariate Minimization Using Grid Computing, by K. Kulish, J. Perez, P. Smith [42].

A Matlab executable was grid-enabled to simulate the lifespan of catfish for a PhD Thesis by Dr. Eric Albers [43].

Installation of and experimentation with an SRB (Storage Resource Broker) data grid [44] was completed.

The San Diego Supercomputer Center's supercomputing library of space movies was accessed.

In cooperation with the Architecture department, a 3-D Studio Max graphics rendering grid was created [45]. Denny Mingus and Glenn Hill were the main contacts.

In collaboration with the Biology department, a grid-based BLAST [46] was explored. Basic grid BLAST jobs were possible; however, a means to move data was still required to handle large BLAST datasets. Dr. Natalya Klueva and Dr. Randy Allen were the contacts for this project.

In collaboration with the Rawls College of Business, a SAS-based compute grid [52] was created. The grid was designed and deployed in a three-week period. Dr. Peter Westfall is the major contact for this project.

A physics space simulation "Neighbors" for a physics graduate thesis [53] was grid-enabled. The purpose was to simulate the effects of tumbling debris on a spacecraft upon reentry into the Earth's atmosphere.  Several thousand simulations were processed.

Texas Tech HPCC and the University of Virginia joined Data Grid to test the Internet2 connectivity between universities. Results were published in the ACM Journal of Computing.

In the USDA Grid Bioinformatics Project [54], TechGrid helped Dr. Scot Dowd with the administration of BLAST jobs to analyze the pig genome using TechGrid and Rocks clustering. This was a collaborative effort between Texas Tech and the USDA.

ENDYNE is a grid implementation of the electron nuclear dynamics theory: a coherent-states chemistry. ENDYNE is a TTU grid project that involves TTU computational chemists and TTU HPCC staff developing a grid-based method of calculating a coherent-states simulation that uses classical theoretical models and quantum mechanics to simulate the relationships between chemical atomic interactions.

Snapshots of a head-on collision of a proton and a hydrogen molecule at three different times.

Snapshots of a collision of a proton splitting the bond of a hydrogen molecule at three different points of the trajectory.

Researchers use the "R" programming language/framework [55] to process R macros on the grid to calculate mathematical models as well as genomic bioinformatics data.

3-D plot from R analysis.

TechGrid Status

TechGrid's compute nodes are located in the Advanced Technology Learning Center (ATLC), the High Performance Computing Center (HPCC) at Reese Center, the Computer Science department, the Business Building, the North Computing Center, and the Math Building.  Currently, TechGrid is made up of 600+ compute nodes spanning several domains and three operating systems.

Figure CS-10. The campus-wide grid is distributed across the TTU campus.

Contact

Jerry Perez, Texas Tech University.

URL: http://www.hpcc.ttu.edu/techgrid.html [56]


White Rose Grid

White Rose Grid, WRG

Collaborators, Organizations

The White Rose Consortium in Yorkshire, England: The universities of Leeds, Sheffield, and York


Figure CS-11. The White Rose Grid.

Summary/Description

The White Rose Grid (WRG) e-Science Centre brings together those researchers from the Yorkshire region who are engaged in e-Science activities and through these in the development of Grid technology. The initiative focuses on building, expanding and exploiting the emerging IT infrastructure, the Grid, which employs many components to create a collaborative environment for research computing in the region.

The White Rose Grid (WRG) at Leeds also hosts one of the four core nodes of the National Grid Service (NGS), which offers a production quality grid service for use by UK academia. (The other nodes are at CCLRC-RAL, Oxford, and Manchester.)

Components and Software/Toolkits

The White Rose Grid comprises five large compute nodes of which three are located at the University of Leeds, one at the University of Sheffield and one at the University of York. It offers a heterogeneous computing environment based on Sun Microsystems [57] multiprocessor computers, and Intel Xeon and AMD Opteron based systems built by Streamline Computing [58]. These nodes are interconnected by the network managed by YHMAN.

  • The Leeds Grid Node 1 is a constellation of shared-memory systems based on Sun Fire 6800 and V880 systems configured with UltraSPARC III Cu 900MHz processors and large physical memory (32GB).
  • The Leeds Grid Node 2 comprises two Linux clusters based on 2.2 & 2.4 GHz Intel Xeon processors interconnected with Myrinet 2000 networks, and in total delivering 292 CPUs.
  • The Leeds Grid Node 3 comprises Sun Microsystems’ Sun Fire V40z and V20z servers with dual-core AMD Opteron processors supplied by Esteem Systems and integrated by Streamline Computing. Seven of these (V40z) comprise four 2.2 GHz dual-core processors configured with 192 GB memory. Eighty-seven V20z servers are interconnected with a Myrinet network; each of these comprises two 2.0 GHz dual-core processors sharing in total 0.7 TB of distributed memory across 348 processor cores. The system runs the Linux (64-bit SuSE) operating system.
  • The Leeds Nodes are connected to 12 TB SAN storage and two EMC Centera disk-based archiving systems set up to provide 12TB of archive space to users. Sun HPC ClusterTools, Sun Forte Developer software and Sun Grid Engine Enterprise Edition are installed on all systems.
  • The 160-processor WRG Sheffield node has been supplied by Sun Microsystems and integrated by Streamline Computing. Eighty of its 2.4GHz AMD Opteron processors are in 4-way nodes with 16GB main memory coupled by a Myrinet network; the remaining eighty are in 2-way nodes with 4GB main memory.
  • At Sheffield there is also a Tier-2 GridPP node supporting the particle physics grid. This system is configured with 160 processors in 2-way nodes, and it runs 64-bit Scientific Linux, which is Red Hat based.
  • The York Node includes two Beowulf-type clusters: one (a 24-machine cluster, each machine providing two 2.4GHz dual-core processors and 8 GB memory) offering in total 96 processor cores, 192 GB memory and 4.8 TB local scratch space; and the other comprising 3 large-memory nodes, each consisting of four 2.4 GHz dual-core processors (8 cores per machine) and 8GB memory, in total delivering 24 processor cores configured with 96GB memory and 0.9 local scratch space. All these nodes are connected to a 10GB/s InfiniPath network for fast file access. In addition, the cluster nodes are able to use this network for very low latency <2m MPI applications. Over 9TB of backed-up storage is provided for users on SATA drive arrays, along with a 1 TB networked scratch space on f/c arrays.

WRG systems support applications written in FORTRAN, C, and C++, implementing parallelism through MPI or OpenMP. A couple of the Sun Fire V880s serve the open source Grid Portal, which interoperates with Globus middleware and Sun Grid Engine Enterprise Edition.

Furthermore, at the University of Leeds there is also the Virtual Environments Laboratory which comprises a T.A.N. 3D Holobench, SGI Onyx2 with interactive devices and projectors. Also a recently acquired visualisation node is available at Leeds for WRG researchers.

See the White Rose Grid Compute Node [59] description for more information.

Applications

The following applications include current and past projects. See the White Rose Activities [60] page for more projects and more information on each project.

CARMEN is a 4-year EPSRC funded e-Science Pilot Project involving 11 Universities and 19 Investigators. It aims to use grid technologies to enable experimenters in neurophysiology to archive their datasets in a structure, making them widely accessible for computational modelers and algorithm developers to exploit. The project will provide integrated and coordinated services for the neuroscience data, enabling neuronal signal detection, sorting and analysis, as well as visualisation and modeling. Furthermore it will enable direct near real-time analysis of streamed experimental data, providing information to distributed teams of specialists that will allow difficult experiments to be optimised.

COLAB is a joint research project of the Universities of Leeds (UK) and Beihang in Beijing (China) co-led by Profs J Xu (Leeds) and J Huai (Beihang), and managed by the EPSRC White Rose Grid e‑Science Centre established between Universities of Leeds, York and Sheffield. The project relates to the CROWN (China Research environment Over Wide-area Network) grid middleware system originally developed at Beihang University. Two sub-groups research the areas of Fault and Attack Tolerance, and Fault Injection-based Evaluation. Amongst other topics they investigate the provision of topologically aware fault and intrusion tolerance in grid systems as well as the provision of revised fault models for grid applications.

Grid-FIT (Grid-Fault Injection Technology) is a fault injector that utilizes network level fault injection to assess grid systems. Grid-FIT has been implemented specifically to test SOAP based web services systems and Globus systems.

Integrative Biology addresses two key problems in medicine today: the causes of cardiac failure and cancer tumours. Scientists are developing multi-scale models (from cells to whole organs) to help understand these problems. The size and complexity of the models demand significant compute power, and so this project brings together scientists and Grid computing experts. The project is being led by the University of Oxford and involves partners across the world, including the USA and New Zealand. Our contribution is in the area of computational steering and visualization, and is led by Professor Ken Brodlie and Dr James Handley.

The MoSeS (Modeling and Simulation for e-social Science) project is undertaken by the National Centre for e-Social Science node at the University of Leeds. The objective of this project is to develop representation of the entire UK population as individuals and households, together with a package of modeling tools which allows specific research and policy questions to be addressed.

The Scientific e-Communities Architecture (SeCA) project focuses on the design and evaluation of a novel Collaborative e-Science Architecture and its application, in the first instance to combustion chemistry. The project exploits Peer-to-Peer (P2P) technologies for supporting this scientific community model and a grid-based workgroup architecture for providing access to large computation and data resources. There are a number of challenges in realising the vision, for example, effective P2P resource discovery.

DAME (Distributed Aircraft Maintenance Environment), led by Prof Austin of York, was a major (£3.5m) e-Science project, which has developed a generic test-bed for distributed diagnostics. The application demonstrator built within the project offers a distributed maintenance environment motivated by the needs of Rolls Royce and its information system partner, Data Systems and Solutions.

The e-Demand project was supported by the Leeds and Durham Grid consortium, which includes experts from both academia and industry. The project has developed a demand-led and service-centric architecture for building complex but dependable and secure Grid applications based on the notion of ultra-late binding, dynamically bound service components, combined with atomic actions as a powerful control abstraction.

GEMSS (Grid-enabled Medical Simulation Services) is funded by the EU FP5 programme and is concerned with creating an environment in which computationally demanding tools native to the Health-Care sector can be made available to a wide spectrum of users. The goal is to provide a transparently accessible health computing resource suited to solving problems of large magnitude, with the end user having no awareness of the Grid computing platform(s). The project will evaluate the viability of this approach through several sample applications, including maxillo-facial surgery planning, neuro-surgery support, medical image reconstruction, radiosurgery planning and lung/cardiovascular simulations; the latter two have their base in Sheffield (Medical Physics).

GOSPEL, led by Professor M Berzins of Leeds University, and carried out in collaboration with Shell Research, has brought together advanced visualization, problem-solving environments, and computational techniques to create a Grid based workbench for the computational modeling of lubricants.

This ESRC demonstrator and the follow-on HYDRA2 project, both led by Dr M Birkin and Prof P M Dew from the University of Leeds, have demonstrated the use of grid technologies in support of the decision-making process in health care planning. A disparate set of data sources as well as a decision support module and visualization have been integrated to present the results.

myGrid will design, develop and demonstrate higher level functionalities over an existing Grid infrastructure that support scientists in making use of complex distributed resources. The project will develop a virtual laboratory workbench that will serve the life sciences community.

Future Plans

Their future plans include determining ways to continue to fund grid computing across the universities, including the challenge that each school uses a different funding model. They are also looking at more relationship opportunities.

Contact

See Contact Details [61] for more information.

Acknowledgements

The White Rose Grid project operates under the auspices of the White Rose University Consortium, which is an affiliation of the three Yorkshire Universities of Leeds, York and Sheffield. This is a collaborative venture between the White Rose Universities and our IT partners: Esteem Systems, Sun Microsystems, and Streamline Computing.

The Yorkshire and Humber Development Agency, Yorkshire Forward, is enabling us to expand our activities into the region and engage research universities and companies in e-Science.

The project has also received funding from the UK e-Science Core Programme, Esteem Systems, and the White Rose Universities.


Grid in New York State

Collaborators, Organizations

This grid is led by Dr. Miller's Cyberinfrastructure Laboratory. Current collaborating institutions include Columbia University, the Hauptman-Woodward Medical Research Institute, Marist College, Niagara University, SUNY-Buffalo, SUNY-Geneseo, University of Rochester, and Syracuse University.

Summary/Description

The Cyberinfrastructure Laboratory [xx] designed and deployed a Buffalo-based grid (ACDC-Grid) and a Western New York Grid (WNY Grid) before branching out to create a Grid involving institutions throughout New York State. This statewide Grid [xx] includes resources from a variety of institutions and is available in a simple and seamless fashion to users worldwide. This statewide Grid contains a heterogeneous set of resources and utilizes general-purpose IP networks [62, 63, 64, 65]. A major feature of this grid is that it integrates a computational grid (compute clusters that have the ability to cooperate in serving the user) with a data grid (storage devices that are similarly available to the user) so that the user may deploy computationally intensive applications that read or write large volumes of data files in a very simple fashion. In particular, this statewide Grid was designed so that the user does not need to know where data files are physically stored or where the application is physically deployed, while providing the user with easy access to their files in terms of uploading, downloading, editing, viewing, and so on.

The core infrastructure for this Grid encompassing institutions throughout New York State includes the installation of standard grid middleware and the use of an active Web portal for deploying applications. Several key packages were used in the implementation of NYS Grid and other packages have been identified in order to allow for the anticipated expansion of the system. The Globus Toolkit provides APIs and tools using the Java SDK to simplify the development of OGSI-compliant services and clients. It supplies database services and Monitoring & Discovery System index services implemented in Java, GRAM service implemented in C with a Java wrapper, GridFTP services implemented in C, and a full set of Globus Toolkit components. The recently proposed Web Service-Resource Framework provides the concepts and interfaces developed by the OGSI specification exploiting the Web services architecture.

This statewide Grid represents the next Grid in an evolution from an experimental Buffalo-based grid that involved a variety of independently run organizations at SUNY-Buffalo, as well as other local institutions, including Buffalo State College, the Hauptman-Woodward Medical Research Institute, and Canisius College to a persistent and hardened heterogeneous Western New York Grid that includes Niagara University, Geneseo State College, the Hauptman-Woodward Medical Research Institute, and SUNY-Buffalo. This Grid that includes institutions throughout New York State provides a variety of applications in order to support the users at the affiliated institutions, other users in New York State, as well as users from Open Science Grid.

Middleware Efforts

The New York State Portal [46, 47, 48, 49], which was derived from the ACDC-Grid Portal, provides access to a dozen or so compute-intensive software packages, large data storage devices, and the ability to submit applications to a variety of grids containing tens of thousands of processors. Our Grid Portal integrates several software packages and toolkits in order to produce a robust system that can be used to host a wide variety of scientific and engineering applications. Specifically, our portal is constructed using the Apache HTTP server, HTML, Java and PHP scripting, PHPMyAdmin, MDS/GRIS/GIIS from the Globus Toolkit, OpenLDAP, WSDL, and related open source software that interfaces with a MySQL database.

Our Grid Portal provides a single-point of access to our statewide Grid for those users who want to concentrate on their disciplinary research and scholarship and do not want to be burdened with low-level details of utilizing a Grid. Applications are typically ported to the Grid Portal through our Grid-Enabling Application Templates, which provide developers with a template for porting a fairly traditional science or engineering application to our Grid-based Web Portal. This approach provides the developer with access to various databases, APIs, PHP scripts, HTML files, shell scripts, and so on, in order to provide a common platform to port applications and for users to efficiently utilize such applications. The generic template for developing an application provides a well-defined standard scientific application workflow for a Grid application. This workflow includes a variety of functions that include data grid interactions, intermediate processing, job specification, job submission, collection of results, run-time status, and so forth. The template provides a flexible methodology that promotes efficient porting and utilization of scientific routines. It also provides a systematic approach for allowing users to take advantage of sophisticated applications by storing critical application and user information in a MySQL database. Most applications have been ported to our Grid Portal within 1-2 weeks.
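The template workflow described above can be sketched in Python as a base class that fixes the order of the stages and leaves the application-specific ones to a subclass. This is an illustration only: the actual Grid-Enabling Application Templates are built from PHP scripts, HTML files, and shell scripts, and every class and method name below is hypothetical.

```python
from abc import ABC, abstractmethod


class GridApplicationTemplate(ABC):
    """Hypothetical template: subclasses fill in the application-specific steps."""

    def run(self, inputs):
        """Execute the workflow stages in a fixed order and record status."""
        self.status = []
        staged = self._stage("stage_data", self.stage_data, inputs)
        prepared = self._stage("preprocess", self.preprocess, staged)
        spec = self._stage("job_specification", self.job_specification, prepared)
        handle = self._stage("submit", self.submit, spec)
        return self._stage("collect_results", self.collect_results, handle)

    def _stage(self, name, func, arg):
        result = func(arg)
        self.status.append(name)      # run-time status visible to the user
        return result

    def stage_data(self, inputs):     # data grid interaction (default: pass through)
        return inputs

    def preprocess(self, staged):     # intermediate processing (default: pass through)
        return staged

    @abstractmethod
    def job_specification(self, prepared): ...

    @abstractmethod
    def submit(self, spec): ...

    @abstractmethod
    def collect_results(self, handle): ...


class WordCount(GridApplicationTemplate):
    """Stand-in application that runs locally instead of on a grid."""

    def job_specification(self, prepared):
        return {"executable": "wc", "text": prepared}

    def submit(self, spec):
        return len(spec["text"].split())   # pretend this ran remotely

    def collect_results(self, handle):
        return {"words": handle}


if __name__ == "__main__":
    app = WordCount()
    print(app.run("porting an application to the portal"))
    print(app.status)
```

Keeping the stage order in the base class is what lets each ported application share the same workflow while supplying only its own job specification, submission, and result handling.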

Our lightweight Grid Monitoring software [66] is used to monitor resources from a variety of Grids, including the statewide Grid, Western New York Grid, Open Science Grid, Open Science Grid Testbed, and TeraGrid, to name a few. With production Grids still in their infancy, the ability to efficiently and effectively monitor a grid is important for users and administrators. Our Grid Monitoring System runs a variety of scripts continually, stores information in a MySQL database, and displays the information in an easy to digest and navigate Grid Dashboard. The Dashboard is served by an Apache Server and is written in Java and PHP scripts. It provides a display that consists of a radial plot in the center of the main page that presents an overview of an available Grid, surrounded by histograms and other visual cues that present critical statistics. By clicking on any of these individual components, the user can drill down for more details on the information in question. These drilldown presentations include dynamic and interactive representations of current and historical information. For example, a user or administrator can easily determine the number of jobs running or queued on every system of any available Grid, the amount of data being added or removed from nodes on a grid, as well as a wealth of current and historical information pertaining to the individual nodes, Grids, or virtual organizations on an available Grid. Our work contributes to the widespread monitoring initiative in the distributed computing community that includes NetLogger, GridRM, Ganglia, and Network Weather Service, to name a few.
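The store-then-drill-down pattern described above can be sketched as follows. This is a minimal illustration, not the Grid Monitoring System itself: Python's built-in `sqlite3` module stands in for the MySQL database, and the table, column, grid, and site names are all hypothetical.

```python
import sqlite3


def open_store():
    """Create an in-memory database standing in for the MySQL store."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job_samples ("
        "grid TEXT, site TEXT, sampled_at INTEGER, running INTEGER, queued INTEGER)"
    )
    return db


def record(db, grid, site, sampled_at, running, queued):
    """Called by a monitoring script each time it probes a site."""
    db.execute(
        "INSERT INTO job_samples VALUES (?, ?, ?, ?, ?)",
        (grid, site, sampled_at, running, queued),
    )


def latest_by_site(db, grid):
    """Drill-down query: most recent running/queued counts for each site."""
    rows = db.execute(
        "SELECT s.site, s.running, s.queued FROM job_samples AS s "
        "JOIN (SELECT site, MAX(sampled_at) AS t FROM job_samples "
        "      WHERE grid = ? GROUP BY site) AS m "
        "ON s.site = m.site AND s.sampled_at = m.t "
        "WHERE s.grid = ? ORDER BY s.site",
        (grid, grid),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = open_store()
    record(db, "NYS", "site-a", 1, 5, 2)
    record(db, "NYS", "site-a", 2, 7, 0)
    record(db, "NYS", "site-b", 1, 1, 9)
    for row in latest_by_site(db, "NYS"):
        print(row)
```

Older samples stay in the table, so the same store can serve both the current view and the historical one.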

Our Grid Operations Dashboard [67] was designed to provide discovery, diagnosis, and the opportunity for rapid publication and repair of critical issues to grid administrators. The operational status of a given resource is determined by its ability to support a wide variety of Grid services, which Prescott typically refers to as site functional tests. Tests are performed regularly and sequentially in order to verify an ever more complex set of services on a node. These results are reported in our Operations Dashboard in an easy-to-read chart.
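A sequence of site functional tests can be sketched in Python as an ordered list of checks. The test names and the rule of skipping later tests after the first failure are assumptions made for illustration; the text states only that tests run regularly and sequentially against increasingly complex services.

```python
def run_site_tests(site, tests):
    """Run tests in order of increasing complexity; stop at the first failure.

    tests is a list of (name, check) pairs, where check(site) returns a bool.
    Returns a list of (name, status), status being "pass", "fail" or "skipped".
    """
    report = []
    failed = False
    for name, check in tests:
        if failed:
            report.append((name, "skipped"))   # depends on an earlier service
            continue
        try:
            ok = bool(check(site))
        except Exception:
            ok = False                          # a crashing probe counts as a failure
        report.append((name, "pass" if ok else "fail"))
        failed = not ok
    return report


def is_operational(report):
    """A site is operational only if every test passed."""
    return all(status == "pass" for _, status in report)


# Hypothetical checks, ordered from the simplest service to the most complex.
TESTS = [
    ("reachable", lambda s: s["up"]),
    ("authenticate", lambda s: s["auth"]),
    ("submit-job", lambda s: s["scheduler"]),
]

if __name__ == "__main__":
    site = {"up": True, "auth": False, "scheduler": True}
    for name, status in run_site_tests(site, TESTS):
        print(f"{name:14s} {status}")
```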

The development of data storage solutions for the Grid and the integration of such solutions into Grid Portals is critical to the success of heterogeneous production-level Grids that incorporate high-end computing, storage, visualization, sensors, and instruments. Data grids typically house and serve data to grid users by providing virtualization services to effectively manage data in the storage network. The Storage Resource Broker is an example of such a system. Our Intelligent Migrator, currently being integrated into our Grid Portal, represents an effort to provide a scalable and robust data service to the users of this statewide Grid. The Intelligent Migrator examines and models user utilization patterns in an effort to make efficient use of limited storage so that the performance of our physical data grid and the services provided to our computational grid are significantly enhanced. Our integrated Data Grid provides users with seamless access to their files, which may be distributed across multiple storage devices. Our system implements data virtualization and a simple storage element installation procedure that provides a scalable and robust system to the users. In addition, our system provides a set of on-line tools for the users so that they may maintain and utilize their data while not having to be burdened with details of physical storage location.
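The data virtualization idea described above, in which users address files by name and the system decides where they physically live, can be sketched with a toy catalog. This is not the statewide Grid's implementation and it does not model the Intelligent Migrator's usage analysis; the class, the placement rule (most free space first), and the storage element names are assumptions made for illustration.

```python
class DataGridCatalog:
    """Toy catalog mapping logical file names to physical storage elements."""

    def __init__(self, capacities):
        self.free = dict(capacities)   # storage element -> free space (GB)
        self.where = {}                # logical name -> (storage element, size)

    def put(self, logical_name, size_gb):
        """Store a file on the storage element with the most free space."""
        if logical_name in self.where:
            raise KeyError(f"{logical_name} already exists")
        target = max(self.free, key=self.free.get)
        if self.free[target] < size_gb:
            raise RuntimeError("no storage element has enough free space")
        self.free[target] -= size_gb
        self.where[logical_name] = (target, size_gb)

    def locate(self, logical_name):
        """Resolve a logical name; users normally never need to call this."""
        return self.where[logical_name][0]

    def migrate(self, logical_name, target):
        """Move a file between storage elements without changing its name."""
        source, size_gb = self.where[logical_name]
        if source == target:
            return
        if self.free[target] < size_gb:
            raise RuntimeError(f"{target} is full")
        self.free[source] += size_gb
        self.free[target] -= size_gb
        self.where[logical_name] = (target, size_gb)


if __name__ == "__main__":
    catalog = DataGridCatalog({"se-a": 100, "se-b": 50})
    catalog.put("/home/user/run1.dat", 60)
    print(catalog.locate("/home/user/run1.dat"))
    catalog.migrate("/home/user/run1.dat", "se-b")
    print(catalog.locate("/home/user/run1.dat"), catalog.free)
```

The logical name stays the same before and after the migration, which is the property that lets a migration service rebalance storage without users noticing.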

Applications

The Cyberinfrastructure Laboratory has enabled the successful porting and implementation of numerous applications to this statewide Grid.

  • Shake-and-Bake(SnB) — Molecular Structure Determination Application
  • Buffalo-and-Pittsburgh (BnP) — SnB and PHASES Complete Protein Phasing
  • Ostrich — Optimization and Parameter Estimation Tool for Groundwater Modeling
  • Aseismic Design & Retrofit (EADR) — Passive Energy Dissipation System for Designing Earthquake Resilient Structures
  • Princeton Ocean Model Great Lakes (POMGL) — Great Lakes Hydrodynamic Circulation Model
  • Titan — Computational Modeling of Hazardous Geophysical Mass Flows
  • Chem — Commercial Quantum Chemistry Software Package
  • NWChem — Computational Chemistry Software Package developed and maintained by DOE
  • Split — Modeling Groundwater Flow with the Analytic Element Method

Future Plans

The goal of this Grid is to bring a mixture of organizations, both public and private, onto a shared grid within New York State. The nodes on the grid will include compute systems, storage devices, visualization systems, sensors, imaging systems, and a wide variety of Internet-ready devices. To date, the Cyberinfrastructure Laboratory has reached more than a dozen organizations throughout the state.

An ongoing project with very positive, though preliminary, results is our intelligent scheduling system. This system uses optimization algorithms together with profiles of users, their data, and their applications, as well as network bandwidth and latency, to improve a grid meta-scheduling system.
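To make the role of bandwidth, latency, and application profiles concrete, here is a minimal sketch of one way a meta-scheduler could rank sites by estimated turnaround time. It is an assumption-laden illustration, not the scheduling system described above. The `Site` fields, the cost formula, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    bandwidth_mbps: float   # bandwidth from the data source to this site (megabits/s)
    latency_s: float        # network latency to this site
    queue_wait_s: float     # predicted wait in the site's batch queue
    speed_factor: float     # relative speed for this application (1.0 = baseline)


def estimated_turnaround(site, input_mb, baseline_runtime_s):
    """Estimate seconds from submission to completion at one site."""
    # Input size is in megabytes, so multiply by 8 to get megabits.
    transfer_s = site.latency_s + (input_mb * 8) / site.bandwidth_mbps
    run_s = baseline_runtime_s / site.speed_factor
    return transfer_s + site.queue_wait_s + run_s


def choose_site(sites, input_mb, baseline_runtime_s):
    """Pick the site with the smallest estimated turnaround."""
    return min(
        sites,
        key=lambda s: estimated_turnaround(s, input_mb, baseline_runtime_s),
    )
```

Under this model a long-running job tends to go to the fastest processors, while a short job with a large input tends to go to the site with the best network path to the data.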

Acknowledgements

Funding for these grid initiatives was provided by the National Science Foundation under a series of grants including an ITR, MRI, and CRI. Additional support was provided by the Center for Computational Research at SUNY-Buffalo. Critical personnel responsible for establishing and maintaining the grid include students, staff, and post-docs from SUNY-Buffalo, namely, Jon Bednasz, Steve Gallo, Mark Green, Cathy Ruby, and Naimesh Shah. In addition, grid administrators at the participating institutions have been extremely responsive and worked in an extraordinarily collaborative fashion. We would also like to thank the members of Open Science Grid for all of their technical support, input, and advice.


Bibliography

[1] I. Foster, C. Kesselman and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of Supercomputer Applications, 15(3), 2001.
[2] I. Foster and C. Kesselman, “Globus: A Metacomputing Infrastructure Toolkit,” International Journal of Supercomputer Applications, 11(2):115-128, 1997.
[3] J. Novotny, S. Tuecke and V. Welch, “An Online Credential Repository for the Grid: MyProxy,” Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10), August 2001.
[4] Open Grid Computing Environment. (http://www.collab-ogce.org/nmi/index.jsp)
[5] W. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke, “Data Management and Transfer in High Performance Computational Grid Environments,” Parallel Computing, 28(5), pp. 749-771, May 2002.
[6] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, “A Resource Management Architecture for Metacomputing Systems,” Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
[7] I. Foster, C. Kesselman, G. Tsudik and S. Tuecke, “A Security Architecture for Computational Grids,” Fifth ACM Conference on Computer and Communications Security, pp. 83-92, 1998.
[8] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, “A Resource Management Architecture for Metacomputing Systems,” Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
[9] R.A. Luettich, J. J. Westerink, and N. W. Scheffner, ADCIRC: An advanced three-dimensional circulation model for shelves, coasts and estuaries; Report 1: theory and methodology of ADCIRC-2DDI and ADCIRC-3DL, Technical Report DRP-92-6, Coastal Engineering Research Center, U.S. Army Engineer Waterways Experiment Station, Vicksburg, MS, 1992.
[10] Unidata Local Data Manager, 2006. (http://www.unidata.ucar.edu/software/ldm/)
[11] P. Bogden, G. Allen, G. Stone, J. Bintz, H. Graber, S. Graves, R. Luettich, D. Reed, P. Sheng, H. Wang, W. Zhao, "The Southeastern University Research Association Coastal Ocean Observing and Prediction Program: Integrating Marine Science and Information Technology," Proceedings of the OCEANS 2005 MTS/IEEE Conference, Sept 18-23, 2005.
[12] D. Huang, G. Allen, C. Dekate, H. Kaiser, Z. Lei and J. MacLaren, "getdata: A Grid Enabled Data Client for Coastal Modeling," HPC2006.
[13] P. Bogden, "The SURA Coastal Ocean Observing and Prediction Program (SCOOP) Service-Oriented Architecture," Proceedings of MTS/IEEE 06 Conference in Boston, Session 3.4 on Ocean Observing Systems, September 18-21, 2006.
[14] J. Bintz et al. "SCOOP: Enabling a Network of Ocean Observations for Mitigating Coastal Hazards," Proceedings of the Coastal Society 20th International Conference, 2006.
[15] SCOOP Website, 2006. (http://scoop.sura.org/)
[15a] SCOOP Partners (http://scoop.sura.org/partners.html)
[16] North Carolina Forecasting System. (http://www.renci.org/projects/indexdr.php)
[17] S. Graves, K. Keiser, H. Conver, M. Smith. “Enabling Coastal Research and Management with Advanced Information Technology,” 17th Federation Assembly Virtual Poster Session, July 2006.
[18] G. von Laszewski, I. Foster, J. Gawor, and P. Lane, "A Java Commodity Grid Kit," Concurrency and Computation: Practice and Experience, vol. 13, no. 8-9, pp. 643-662, 2001. (http://www.cogkit.org/)
[19] K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, “Grid Information Services for Distributed Resource Sharing.” Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
[20] R. Wolski, N. Spring, C. Peterson, “Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service,” in Proceedings of SC97, November, 1997.
[21] OSG Council (http://www.opensciencegrid.org/About/Who_is_the_Open_Science_Grid%3F/OSG_Council_Members)
[22] OSG Virtual Organizations (http://www.opensciencegrid.org/About/OSG_Organization/Virtual_Organizations)
[23] OSG Technical Activity Groups (http://www.opensciencegrid.org/About/OSG_Organization/Technical_Activities)
[24] MonALISA Graph of OSG Activity (http://monalisa.grid.iu.edu:8080/show?page=index.html)
[25] US CMS Institutions and Members (http://uscms.fnal.gov/uscms/organization/uscms_institutes_t_members.html)
[26] U.S. CMS website (http://www.uscms.org/Public/overview.html)
[27] USCMS Software and Computing (http://www.uscms.org/SoftwareComputing/index.html)
[28] CERN Architectural Blueprint RTAG (http://lcgapp.cern.ch/project/blueprint/BlueprintReport-final.doc)
[29] Feature: Meeting the Data Transfer Challenge, ISGTW, Jan 17, 2007 (http://www.isgtw.org/?pid=1000226)
[30] 2007 Open Science Grid Consortium Meeting, UCSD, San Diego, CA, March 5-8, 2007, Frank Wurthwein, OSG Application Coordinator, OSG Extension Lead, Experimental Elementary Particle Physics, UCSD
[31] US CMS Organization, Institution, and Member Contacts (http://www.uscms.org/Public/contact.html)
[32] SDSS Institutions (http://www.sdss.org/members/index.html)
[33] SDSS Advisory Council (http://www.sdss.org/directorate/adco.html)
[34] SDSS Website (http://www.sdss.org/)
[35] SDSS — About Us (http://www.sdss.org/background/)
[36] SDSS — Contact Us (http://www.sdss.org/contacts.html)
[37] How ATLAS Collaborates (http://atlasexperiment.org/hac.html)
[38] Simulating Supersymmetry with ATLAS (http://tinyurl.com/2q79p9)
[39] ATLAS Experiment Home Page (http://atlasexperiment.org/)
[40] Proth (http://primes.utm.edu/programs/gallot/)
[41] Partial Differential Equation (http://www.math.ttu.edu/~smanserv/)
[42] K. Kulish, J. Perez, P. Smith, Multivariate Minimization Using Grid Computing. (http://www.cs.vu.nl/ggf/apps-rg/meetings/ggf8/kulish.pdf)
[43] PhD Thesis by Dr. Eric Albers (http://www.iemss.org/iemss2002/proceedings/pdf/volume%20uno/298_albers.pdf)
[44] SRB (Storage Resource Broker) data grid (http://www.sdsc.edu/srb/index.php/Main_Page)
[45] 3-D Studio Max graphics rendering grid (http://www.arch.ttu.edu/resources/FAQ/3D/net_render_max_animation.asp)
[46] BLAST (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html)
[47] Query tutorial (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/query_tutorial.html)
[48] BLAST tutorial (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html)
[49] BLAST Guide (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/guide.html)
[50] PSI-BLAST tutorial (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html)
[51] More Information on BLAST (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/auxiliary.html)
[52] SAS-based compute grid (http://www.sas.com/technologies/architecture/grid/index.html)
[53] "Neighbors" space simulation (http://dspace.lib.ttu.edu/bitstream/2346/1219/1/thesis.pdf)
[54] Bioinformatics Project (http://www.animalgenome.org/pigs/)
[55] "R" programming language/framework (http://www.r-project.org/)
[56] Texas Tech TechGrid (http://www.hpcc.ttu.edu/techgrid.html)
[57] Sun Microsystems (http://www.sun.com/)
[58] Streamline Computing (http://www.streamline-computing.com/)
[59] White Rose Grid Compute Node (http://www.wrgrid.org.uk/ComputeNodes.html)
[60] White Rose Grid Activities (http://www.wrgrid.org.uk/Activities.html)
[61] White Rose Grid Contact Details (http://www.wrgrid.org.uk/Contactus.html)
[62] M.L. Green and R. Miller, Grid computing in Buffalo, New York, Annals of the European Academy of Sciences, 2003, pp. 191-218.
[63] M.L. Green and R. Miller, Molecular structure determination on a computational & data grid, Parallel Computing Journal 30 (2004), pp. 1001-1017.
[64] M.L. Green and R. Miller, Evolutionary molecular structure determination using grid-enabled data mining, Parallel Computing Journal 30 (2004), pp. 1057-1071.
[65] M.L. Green and R. Miller, A client-server prototype for grid-enabling application template design, Parallel Processing Letters, Vol. 14, No. 2 (2004), pp. 241-253.
[66] C.L. Ruby, M.L. Green, and R. Miller, The Operations Dashboard: A Collaborative Environment for Monitoring Virtual Organization-Specific Compute Element Operational Status, Parallel Processing Letters, Vol. 16, No. 4 (2006), pp. 485-500.
[67] C.L. Ruby and R. Miller, Effectively Managing Data on a Grid, Handbook of Parallel Computing: Models, Algorithms, and Applications, S. Rajasekaran and J. Reif, eds., CRC Press, 2007, in press.
[68] What is Condor? (http://www.cs.wisc.edu/condor/description.html)

© 2006-8, Southeastern Universities Research Association
Sponsored by SURA, TATRC (No. W81XWH-06-1-0419), OSG, and iVDGL
Updated September, 2007