Grid Case Studies
Grid Applications
SCOOP Storm Surge Model
Collaborators
Lavanya Ramakrishnan, Renaissance Computing Institute
Brian O. Blanton, Renaissance Computing Institute
Howard M. Lander, Renaissance Computing Institute
Richard A. Luettich, Jr, UNC Chapel Hill Institute of Marine Sciences
Daniel A. Reed, Renaissance Computing Institute
Steven R. Thorpe, MCNC
Summary
Recently, large-scale ocean and meteorological modeling has resulted in the
use of Grid resources and high performance environments for running these models.
There is a need for an integrated system that can handle real-time data feeds,
schedule and execute a set of model runs, manage the model input and output
data, and make results and status available to the larger audience. Here, we describe
the distributed software infrastructure that we have built to run a storm surge
model
in a Grid environment. Our solution builds on existing standard grid and
portal technologies including the
Globus toolkit
[2],
Open Grid Computing Environment
[4] (OGCE)
and lessons learned from grid computing efforts in other science domains.
Specifically, we implement techniques for resource management and
increased fault tolerance, as required by the sensitivity of the application.
This framework was developed as a component of Southeastern Universities Research Association's (SURA)
Southeastern Coastal Ocean Observing and Prediction
[15]
(SCOOP) program. The SCOOP program is a distributed project that includes the Gulf of Maine Ocean Observing System, Bedford
Institute of Oceanography, Louisiana State University, Texas A&M, University of Miami, University of Alabama in Huntsville,
University of North Carolina, University of Florida and Virginia Institute of Marine Science. SCOOP is creating an open-access
grid environment for the southeastern coastal zone to help integrate regional coastal observing and modeling systems.
For full model details and more complete grid component descriptions, see SCOOP Storm Surge Model.
Technology Components
The front-end to the system is through a portal that provides the interface for users to interact with the ocean observing
and modeling system. The real-time data for the ensemble forecast arrives through
Unidata's Local Data Manager
[10]
(LDM), an event-driven data distribution system that selects, captures, manages and distributes meteorological data products.
Once all the data for a given ensemble member has been received, available grid resources are discovered using a simple
resource selection algorithm. After the files are staged, the model run is executed and the output data is staged back
to the originating site. The final result of the surge computations is inserted back into the SCOOP LDM stream for
subsequent analysis and visualization by other
SCOOP partners
[15a].
Specifically, our architecture has the following Grid components:
- An Application Coordinator that acts as a central component that
orchestrates the data and job management actions and interacts with the Globus
services.
- A resource monitoring and notification framework that is used to collect
monitoring data and track data flow status in the system.
- A resource selection API that queries grid resources to determine the best resources available
to run each of the jobs.
- An application preparation component
that prepares the application bundle that needs to be used on a remote resource.
- A front-end portal that allows users to conduct retrospective analysis,
access historical data from previous model runs and observe the status of
daily forecast runs.
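As a rough illustration of the resource selection component listed above, the sketch below ranks candidate sites by an estimated turnaround time. The `SiteStatus` fields, the site names and the cost formula (queue wait plus transfer time plus run time) are assumptions made for this example, not the SCOOP implementation.

```python
from dataclasses import dataclass


@dataclass
class SiteStatus:
    """Snapshot of one grid site; all fields are illustrative assumptions."""
    name: str
    queue_wait_s: float     # estimated wait before the job starts
    bandwidth_mb_s: float   # throughput from the originating site
    free_cpus: int


def estimated_turnaround(site: SiteStatus, bundle_mb: float, run_s: float) -> float:
    """Queue wait + time to ship the application bundle + model run time."""
    transfer_s = bundle_mb / site.bandwidth_mb_s
    return site.queue_wait_s + transfer_s + run_s


def rank_sites(sites, bundle_mb, run_s, cpus_needed):
    """Return sites with enough free CPUs, shortest estimated turnaround first."""
    eligible = [s for s in sites if s.free_cpus >= cpus_needed]
    return sorted(eligible, key=lambda s: estimated_turnaround(s, bundle_mb, run_s))


# Hypothetical sites and job parameters.
sites = [
    SiteStatus("site-a", queue_wait_s=600, bandwidth_mb_s=10, free_cpus=64),
    SiteStatus("site-b", queue_wait_s=60, bandwidth_mb_s=2, free_cpus=32),
    SiteStatus("site-c", queue_wait_s=0, bandwidth_mb_s=50, free_cpus=8),
]
ranked = rank_sites(sites, bundle_mb=500, run_s=1800, cpus_needed=16)
print([s.name for s in ranked])
```

In this toy data set the site with the slower network still ranks first because its queue is much shorter, which is the kind of trade-off a ranking function has to weigh.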
Data and Control Flow of the NC SCOOP System
Before we describe in detail each of the components used in the framework,
we briefly describe the control flow of our framework. The ADCIRC storm surge
model can be run in two modes. The “forecast” mode is triggered
by the real-time arrival of wind data from different sites through the
Local Data Manager
[10]. In the “hindcast” mode,
the modeler can either use a portal or a shell interface to launch the jobs
to investigate prior data
sets (post-hurricane). The figure shows the architectural components and
the
control flow for the NC SCOOP system:
- In the forecast mode the wind data arrives at the LDM node (Step 1.F. in
figure). In our current setup, the system receives wind files from University
of Florida and Texas A&M. Alternatively, a scientist might log into
the portal and choose the corresponding data to re-run a model (Step 1.H.
in figure).
- In the hindcast run, the application coordinator locates relevant files
using the SCOOP catalog at UAH[17] and retrieves them from the SCOOP archives
located at TAMU and LSU[12]. In the forecast runs, once the wind data arrives,
the application coordinator checks to see if the hotstart files are available
locally or are available at the remote archive. If they are not available and
not being generated currently (through a model run), a run is launched to generate
the corresponding hotstart files to initialize the model for the current forecast
cycle.
- Once the model is ready to run (i.e. all the data is available), the application
coordinator will use the resource selection component to select the best resource
for this model run.
- The resource selection component queries the status at each site and ranks
the resources, accounting for queue delays and network connectivity between
the resources.
- The application coordinator then calls an application specific component
that prepares an application package that can be shipped to remote resources.
The application package is customized with specific properties for the application
on a particular resource and includes the binary, the input files and other
initialization files required for the model run.
- The self-extracting application package is transferred to the remote resource
and the job is launched using standard grid mechanisms.
- Once the application coordinator receives the “job finished” status
message, it retrieves the output files from the remote sites.
- The results are then available through the portal (Step 8.H in figure).
Additionally, in forecast mode, we push the data back through LDM
(Step 8.F in figure), where it is archived and visualized by other SCOOP partners
downstream.
- The application coordinator publishes status messages at each of the above
steps to a centralized messaging broker. Interested components such as the
portal can subscribe to relevant messages to receive real-time status notification
of the job run.
- In addition, resource status information is collected across all
the sites; it can be observed through the portal and used for more
sophisticated resource selection algorithms.

Figure CS-1. Architectural components and the control flow for the NC SCOOP system.
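The status notification step in the control flow above follows a publish/subscribe pattern. The sketch below is a toy, in-process stand-in for the centralized messaging broker; the `StatusBroker` class, the topic name and the step names are hypothetical and only loosely mirror the steps described in this section.

```python
from collections import defaultdict


class StatusBroker:
    """Toy in-process stand-in for a centralized messaging broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every component subscribed to this topic.
        for callback in self._subscribers[topic]:
            callback(message)


# Hypothetical step names, loosely following the control flow above.
STEPS = ["data_arrived", "resource_selected", "package_staged",
         "job_submitted", "job_finished", "output_retrieved"]

broker = StatusBroker()
portal_log = []                                  # what a subscribed portal would show
broker.subscribe("run-42", portal_log.append)

# The application coordinator publishes one status message per step.
for step in STEPS:
    broker.publish("run-42", {"run": "run-42", "status": step})

print(portal_log[-1])
```

A production broker would run as a separate network service; the point here is only that the coordinator and the portal are decoupled by topic.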
Contact
scoop-support@renci.org, Renaissance Computing Institute.
Acknowledgements
This framework was developed as a component of Southeastern Universities Research Association's (SURA)
Southeastern Coastal Ocean Observing and Prediction (SCOOP) program
[15].
The SCOOP program is a distributed project that includes numerous research partners
[15a].
Funding for SCOOP has been provided by the Office of Naval Research, Award N00014-04-1-0721 and by the National Oceanic and Atmospheric Administration's NOAA Ocean Service, Award NA04NOS4730254. Full acknowledgements are provided in the detailed version of this paper, available in the Related Links section of this Cookbook.
Open Science Grid
Collaborators
The Open Science Grid consortium consists of around 23 member organizations and several partners. An up-to-date list can
be found under the OSG Council
[21]
web page. The participants are called
Virtual Organizations
[22],
or VOs, where a VO is a collection of people (VO members), computing/storage resources (sites) and services (e.g., databases).
Technical Activity
[23]
groups round out the organization through liaison, service and development activities.
Introduction and Overview
Scientists from many different fields use the Open
Science Grid to advance their research. The OSG Consortium includes members from particle and nuclear
physics, astrophysics, bioinformatics, gravitational-wave science and computer
science collaborations. Consortium members contribute to the development
of the OSG and benefit from advances in grid technology. Applications in other
areas of science, such as mathematics, medical imaging and nanotechnology,
benefit from the OSG through its partnership with local and regional grids
or their communities' use of the Virtual Data Toolkit software stack.
The following chart shows running applications as well as the
current load on the OSG over a one week period. The subsequent sections in this case study will look
a little further into several of these applications.
Figure CS-2. Current running applications and load on the Open Science Grid. Plot provided by
MonALISA
[24].
| CMS: The Compact Muon Solenoid |
Figure CS-3. Simulated decay of Higgs boson in the future
CMS experiment at CERN.
(Credit: CERN)
|
Collaborators, Organizations
The USCMS Collaboration consists of various US universities and Fermi National Accelerator Laboratory (FNAL).
The Collaboration works closely with the CMS Collaboration at CERN to accomplish the missions of the experiment.
Major funding of this program is provided by The US Department of Energy (DOE) and the National Science Foundation (NSF).
See
US CMS Institutions and Members
[25]
for details.
Summary/Description
From the U.S. CMS website
[26]:
"The CMS experiment is designed to study the collisions of protons at a center
of mass energy of 14 TeV. The physics program includes the study of electroweak
symmetry breaking, investigating the properties of the top quark, a search
for new heavy gauge bosons, probing quark and lepton substructure, looking
for supersymmetry and exploring other new phenomena."
The USCMS Software and Computing
[27]
project provides the computing and software resources needed to enable US scientists to participate in CMS activities.
According to the CERN
Architectural Blueprint RTAG
[28]
(October, 2002)
the configuration and control of Grid-based operation should be encapsulated in components and services intended for these
purposes. Apart from these components and services, grid-based operation should be largely transparent to other components
and services, application software, and users. Grid middleware constitutes optional libraries at the foundation level of
the software structure. Services at the basic framework level encapsulate and employ middleware to offer distributed capability
to service users while insulating them from the underlying middleware. For the USCMS, the OSG provides the necessary
Grid middleware components (that are also made to be interoperable with the LCG/EGEE components.)
Data and Control Flow
The CMS experiment employs a tiered computing model. Tier0 is at CERN in Switzerland. FNAL is one of seven Tier1's and
universities in the US and Brazil are the Tier2's. Experimental data is produced at the Tier0 and replicated at Tier1's.
Tier2's have the responsibility of hosting data that is interesting for regional users and will be used for data
analysis by users through OSG gatekeepers at those Tier2's. Monte Carlo simulated events (MC events) are produced at Tier2's
and Tier1's. These MC events are transferred to regional Tier1's (FNAL in the case of USCMS) or the Tier0. Thus,
the model for the CMS experiment calls for data to be passed from the CMS detector at CERN in Switzerland to a
series of large computing sites around the world (and MC events in the opposite direction).
The CMS Tier-2 centers in the United States and around the world
have more work yet to do on their network infrastructure before they're ready to accept the large data rates expected
when the experiment starts running — up to 100 megabytes per second. The eventual goal for the computing sites during
2007 is to sustain the use of more than 50% of their network capacity for an entire day. For example, for the Purdue-UCSD
network link that would mean sustaining transfers at approximately four gigabits per second for one day
[29].
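To put these rates in perspective, the short calculation below converts the two figures quoted above (a 100 megabyte per second stream and a four gigabit per second link, each sustained for one day) into daily data volumes. It assumes decimal units (1 TB = 10**12 bytes) and perfectly constant rates.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def volume_tb(rate_bits_per_s: float, seconds: float) -> float:
    """Data volume in terabytes (10**12 bytes) moved at a constant rate."""
    return rate_bits_per_s * seconds / 8 / 1e12


link_day_tb = volume_tb(4e9, SECONDS_PER_DAY)            # 4 Gb/s for one day
stream_day_tb = volume_tb(100e6 * 8, SECONDS_PER_DAY)    # 100 MB/s for one day
print(f"{link_day_tb:.1f} TB per day, {stream_day_tb:.2f} TB per day")
```

Under these assumptions the link carries roughly 43 TB in a day and the 100 MB/s stream roughly 8.6 TB.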
Data storage responsibilities are shared between OSG, the VO, and the site. For example, OSG defines storage types and the
API's and the information schema for finding storage. The VO manages the data transfers and the catalogues. The site chooses
the storage type and amount, and implements publication of storage information according to the OSG rules
(more specifically the Glue schema.) The following image is an example of CMS data transfer across several days in early 2007.

Figure CS-4. CMS data transfer at OSG sites.
[30]
Likewise, job submission responsibilities are shared by OSG, the VO, and site. OSG defines the interface to the batch system
and information schema and provides the middleware that implements them. The VO manages the job submissions and workflows.
(This is done through either the Condor-G job submission tools or the workload management systems developed by grid projects
such as EGEE/LCG.) The site chooses which batch system to use but configures that system interface in accordance with OSG rules.
The workflow can be described as:
- The VO administrators, called the software deployment team, install the application software. Users have read-only
access from batch slots.
- Data is produced at CERN. MC events are produced by the MC production teams at OSG or EGEE/LCG sites.
- Data movement is carried out by a system called PhEDEx. CERN controls the rate of data movement and sites or
authorized personnel subscribe to necessary data through the PhEDEx system. The VO administrator moves MC events produced
at the site to the upper Tiers via gftp. Users have read-only access from batch slots.
- Users submit their jobs via condor-g. The jobs run in batch slots, writing output to local disks. The jobs copy their
output from the local disks to the data area via gftp.
- Users collect their output from the site(s) via gftp for follow-up analysis.
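For readers unfamiliar with Condor-G, the job submission step above amounts to handing `condor_submit` a short description file that names a remote gatekeeper. The sketch below only builds such a description as text. The gatekeeper host name, the job manager type, the script name and its arguments are hypothetical; `universe = grid` and `grid_resource = gt2 ...` are the submit commands Condor-G uses for pre-web-services Globus gatekeepers.

```python
def condor_g_submit_description(gatekeeper: str, jobmanager: str,
                                executable: str, arguments: str = "") -> str:
    """Build a Condor-G submit description for a Globus (GT2) gatekeeper."""
    lines = [
        "universe = grid",
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        "output = job.out",     # stdout of the remote job
        "error = job.err",      # stderr of the remote job
        "log = job.log",        # Condor's own event log for the job
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical Tier2 gatekeeper and analysis script.
text = condor_g_submit_description("gatekeeper.tier2.example.edu", "condor",
                                   "run_analysis.sh", "--events 1000")
print(text)
```

The resulting text would be saved to a file and passed to `condor_submit`; moving the job's output with GridFTP is a separate step not shown here.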
Contact
US CMS Organization, Institution, and Member Contacts
[31]
| SDSS: Sloan Digital Sky Survey |

Figure CS-5. SDSS Image of the Week. |
Collaborators, Organizations
The SDSS collaboration includes 150 scientists at 25
institutions
[32].
An advisory
council
[33]
represents the institutions and advises the ARC Board of Governors
on matters relating to the projects.
Summary/Description
The Sloan Digital Sky Survey (SDSS) is focused on producing a detailed optical image and 3-dimensional map covering a
significant portion of the sky. With the amount of data that must be stored and managed, and the compute power required
to produce the rich, integrated visual results, the project is a clear example of a scientific milestone that is
dependent on advancements in distributed, collaborative high performance computing.
From the SDSS website:
[34]
"The SDSS uses a dedicated, 2.5-meter telescope on Apache Point, NM, equipped with two powerful special-purpose instruments. The 120-megapixel camera can image 1.5 square degrees of sky at a time, about eight times the area of the full moon. A pair of spectrographs fed by optical fibers can measure spectra of (and hence distances to) more than 600 galaxies and quasars in a single observation. A custom-designed set of software pipelines keeps pace with the enormous data flow from the telescope.
The SDSS completed its first phase of operations "SDSS-I" in June, 2005.
Over the course of five years, SDSS-I imaged more than 8,000 square degrees
of the sky in five bandpasses, detecting nearly 200 million celestial objects,
and it measured spectra of more than 675,000 galaxies, 90,000 quasars, and
185,000 stars. These data have supported studies ranging from asteroids and
nearby stars to the large scale structure of the Universe.
The SDSS has entered a new phase, SDSS-II, continuing through June, 2008.
With a consortium that now includes 25 institutions around the globe, SDSS-II
will
carry out three distinct surveys — the Sloan Legacy Survey, SEGUE, and the
Sloan Supernova Survey — to address fundamental questions about the nature
of the Universe, the origin of galaxies and quasars, and the formation and
evolution of our own Galaxy, the Milky Way."
For more background information on mapping the universe and new discoveries,
see About US
[35]
at the SDSS web site.
Contact
The SDSS business manager and institutional representatives are listed on
the SDSS Contact US
[36]
web page.
Acknowledgements
Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan
Foundation, the Participating Institutions, the National Science Foundation,
the U.S. Department of Energy, the National Aeronautics and Space Administration,
the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education
Funding Council for England.
| ATLAS |

Figure CS-6. The ATLAS Detector.
|
Collaborators, Organizations
The ATLAS collaboration consists of various boards, institutions, committees, and working groups. Over 1,850 individuals
at roughly 175 institutions across 37 countries work together. See The ATLAS Organization
[36]
for more details. A very interesting discussion on how the collaboration works can
be found at How ATLAS Collaborates
[37].
Summary/Description
One of the discoveries eagerly anticipated by particle physicists working on the world's next particle
collider is that of supersymmetry, a predicted lost symmetry of nature. Physicists from the University of
Wisconsin-Madison are using Open Science Grid resources to show that there is a good possibility of discovering
supersymmetry with data collected during the first few months of the collider's operation, if the new symmetry exists in nature.
Supersymmetry, often called SUSY, predicts the existence of superpartner particles, or
sparticles, for every known fundamental particle.
Recent experiments have suggested that most of the matter in our universe is not made of familiar atoms, but of
some new sort of dark matter. Discovering a hidden world of sparticles may shed light on the nature of this dark
matter, connecting observations performed at earth-based accelerators with those performed by astrophysicists and
cosmologists.
Data and Control Flow
Accurately simulating the search for supersymmetry required physicists to create a gateway to three different grid
environments from their desks at CERN. They used the Virtual Data Toolkit, an ensemble of middleware tools distributed
and maintained with the collaboration of OSG members, to create an access point to resources from the Open Science Grid,
the LHC Computing Grid and the University of Wisconsin-Madison's Condor pool.
"The most difficult part was to make a grid which is interoperable, such that the requirements of all existing grid flavors
could be included," they explained. "This was done by modifying the current VDT, and consuming more than 215 CPU years in
less than two months using resources from the OSG and Madison's Condor Pool."
With so many computing resources at their disposal, they simulated for the first time an accurate background for SUSY
searches. Comparing the simulated signals for several types of SUSY against the simulated background shows that physicists
might be able to discover the long-sought sparticles with the first ATLAS experimental data.
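The quoted figure of more than 215 CPU years in less than two months gives a sense of scale. The calculation below derives the average number of CPUs that must have been busy, treating both quoted numbers as bounds, so the result is a lower bound.

```python
cpu_years = 215       # "more than 215 CPU years"
wall_months = 2       # "less than two months"

wall_years = wall_months / 12
avg_busy_cpus = cpu_years / wall_years   # CPU-years divided by elapsed years
print(round(avg_busy_cpus))
```

In other words, on the order of 1,300 processors or more were running around the clock for the duration of the campaign.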
See Simulating Supersymmetry with ATLAS
[38]
for the complete article.
Contact
See the ATLAS Experiment home page
[39].
Acknowledgements

Figure CS-7. ATLAS Collaboration Map.
SURAgrid Applications
Simulation-Optimization for Threat Management in Urban Water Systems
Collaborators
Sarat Sreepathi and Mahinthakumar, NCSU
Von Laszewski and Haetgen, University of Chicago
Uber and Feng, University of Cincinnati
Harrison, University of South Carolina
Summary/Description
Contamination threat management is a very real and practical concern for any population utilizing a shared
drinking water distribution system. Several components are involved including real-time characterization of the
source and extent of the contamination, identification of control strategies, and design of incremental data
sampling schedules. This requires dynamic integration of time-varying measurements of flow, pressure and
contaminant concentration with analytical modules including models to simulate the state of the system,
statistical methods for adaptive sampling, and optimization methods to search for efficient control strategies. The
goal of this multi-disciplinary research project (NSF-funded from Jan 2006 to Dec 2008) is to develop a
cyberinfrastructure system that will both adapt to and control changing needs in data, models, computer
resources and management choices facilitated by a dynamic workflow design.
The application specifically incorporates dynamic water-usage data, in real-time, into
a simulation-optimization process to inform decision making in threat management situations.
The nature of this work is highly compute-intensive
and requires multi-level parallel processing via computer clusters and high-performance
computing architectures such as SURAgrid. The optimization component uses
evolutionary computation based algorithms and the simulation component uses
EPANET, a water distribution simulation code originally released by USEPA.
Simulation-Optimization with EPANET is part of a multidisciplinary, three-year
NSF-funded DDDAS (Dynamic Data-Driven Application Systems) research project
to develop a cyberinfrastructure system that will both adapt to and control
changing needs in data, models, computer resources and management choices
facilitated by a dynamic workflow design. Project Partners: North Carolina
State University; University of Chicago; University of Cincinnati; University
of South Carolina.

Figure CS-8. Graphical Monitoring Interface
The analytical modules (composed of thousands to millions of simulation
instances that are driven by optimization search algorithms) used to simulate
realistic water distribution systems are highly compute-intensive and require
multi-level parallel processing via computer clusters. While data often drive
the analytical modules, data needs for improving the accuracy and certainty
of the solutions generated by these modules dynamically change when a contamination
event unfolds. Since such time-sensitive threat events require real-time
responses, the computational needs must also be adaptively matched with available
resources. Grid environments composed of independent or loosely coupled computer
clusters (e.g., the TeraGrid, SURAgrid) are ideal for this application as
the simulation instances can be easily clustered (or bundled) into semi-independent
sets, often requiring synchronization at various stages, that can be effectively
executed in these environments through an intelligent allocation and monitoring
mechanism which is currently being implemented as a middleware feature.
SURAgrid Deployment
The integrated simulation-optimization
system developed through this project is intended to be used by the project
team members during the two-year
development phase of this project. Team members include application engineers
at North Carolina State University (NCSU) and the University of Cincinnati,
optimization methodology developers (NCSU and the University of South Carolina),
and computer scientists (NCSU and the University of Chicago). The application
engineers will test and analyze various water distribution contamination
problem scenarios using realistic networks. The methodology developers
will investigate various optimization search algorithms for source characterization,
demand uncertainty and sensor sampling design.
The computer scientists
will undertake the grid implementation, integration of various components,
and performance testing in different grid environments
and computer clusters, including SURAgrid. The team is using SURAgrid as
an “on-ramp” to the TeraGrid. Citing specific SURAgrid benefits
such as compute resource heterogeneity and low overhead to participate, the
team plans to ready the application for porting to the TeraGrid by uncovering
and addressing potential programming and workflow issues on SURAgrid.
Grid Workflow
To be able to run jobs on SURAgrid, the NCSU user applies for an
affiliate user certificate issued by SURAgrid site Georgia State University
(GSU),
which has a Certificate Authority (CA) that has been cross-certified with the
SURAgrid Bridge CA (BCA). Cross-certification enables SURAgrid resource sites
to trust the user certificate being presented by the NCSU user and, when
the SURAgrid User Administrator at GSU also creates a SURAgrid account for
the NCSU user, the user essentially has single-sign-on access to SURAgrid
resources at cross-certified SURAgrid sites. After they’ve authenticated
to the SURAgrid resource, the user invokes the optimization method on the
client workstation that initiates the middleware that directly communicates
with the specific SURAgrid resource (authenticated through ssh keys) for
job submission and intermediate file movement. Currently the application
needs to be pre-staged by the user, but this functionality will be integrated
into the middleware. The middleware, which uses public key cryptography,
will provide a seamless, Python-based application interface for staging initial
data and executables, data movement, job submission, and real-time visualizations
of application progress. The interface uses passwordless ssh commands to
create the directory structure necessary to run the jobs and handles all
data movement required by the application. It launches the jobs at each site
in a seamless manner, through their respective batch commands. The middleware
is able to minimize resource queue time by querying the resource at a given
site to determine the size of resource to request. Most of the middleware
functionality has been implemented at least at a rudimentary level and efforts
are now focused on better integration and sophistication.
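The directory setup and file staging described above can be pictured as a handful of `ssh` and `scp` invocations. The sketch below only constructs the command lines and does not run them. The user, host, directory and file names are hypothetical, and the actual middleware interface is not shown in this case study.

```python
import shlex


def remote_setup_commands(user: str, host: str, run_dir: str, files: list) -> list:
    """Return argv lists that create a run directory and copy input files.

    BatchMode=yes makes ssh/scp fail instead of prompting for a password,
    which suits key-based (passwordless) automation.
    """
    target = f"{user}@{host}"
    commands = [["ssh", "-o", "BatchMode=yes", target,
                 f"mkdir -p {shlex.quote(run_dir)}"]]
    for path in files:
        commands.append(["scp", "-o", "BatchMode=yes", path,
                         f"{target}:{run_dir}/"])
    return commands


# Hypothetical account, cluster and input files.
cmds = remote_setup_commands("alice", "cluster.example.edu",
                             "runs/gen0", ["network.inp", "epanet_wrapper"])
for argv in cmds:
    print(" ".join(argv))   # pass argv to subprocess.run(argv, check=True) to execute
```

Building argument lists rather than shell strings keeps local quoting simple; only the remote `mkdir` path needs `shlex.quote`.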
In addition to
the middleware interface described above, the application consists of two
major components: one for optimization, one for simulation.
The optimization component presently used on the SURAgrid is called JEC
(Java Evolutionary Computation toolkit). This is the client side that drives
the
simulation component by calling the middleware interface. Evolutionary
algorithms call multiple instances of simulations (typically hundreds) at
each generation
(or iteration) and require synchronization at each generation as the simulation
results have to be processed before beginning the next generation. Everything
on the server side (middleware, simulation component, and the grid resources)
is transparent to the client.
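The generation-by-generation synchronization described above can be sketched with a minimal evolutionary loop. This is not JEC and not the project's algorithm: `simulate` is a stand-in for an EPANET run, and the selection and mutation scheme (keep the better half, add Gaussian noise) is an assumption chosen for brevity.

```python
import random


def simulate(candidate):
    """Stand-in for one simulation run: a lower value means a better fit."""
    return sum((x - 0.5) ** 2 for x in candidate)


def evaluate_generation(population):
    """Synchronization point: every simulation finishes before we continue."""
    return [simulate(c) for c in population]


def evolve(pop_size=20, genes=3, generations=15, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genes)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        scores = evaluate_generation(population)             # barrier
        ranked = [c for _, c in sorted(zip(scores, population), key=lambda p: p[0])]
        best_per_generation.append(min(scores))
        parents = ranked[: pop_size // 2]                     # keep the better half
        children = [[g + rng.gauss(0, 0.05) for g in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return best_per_generation


history = evolve()
print(history[0], history[-1])
```

Because the parents survive into the next generation, the best score never gets worse from one generation to the next; in the grid setting, `evaluate_generation` is where hundreds of simulations would be dispatched and awaited.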
The simulation component is an MPI C wrapper
written around EPANET that does a number of things. It bundles multiple
simulations (typically hundreds)
and performs simultaneous execution of these on a single cluster via
a coarse-grained
MPI-based parallelism feature. The wrapper saves a considerable amount
of processing time by not duplicating I/O and parts of simulations that
are
common to all simulation instances. It also has a persistent capability
such that, once an EPANET job is launched, it does not need to exit until
all
simulation instances have been completed across all generations of an
evolutionary algorithm (i.e., once the simulation outputs are written for
a given generation,
it can maintain a wait state until the next set of evaluations arrives
from the middleware). The output files are moved back to the client workstation
as the simulation progresses on the resource side. A Python/Tk real-time
visualization tool developed by NCSU then enables visualization of the
progress
of the algorithm on the water distribution network. The visualization
tool also creates PNG files of various stages of the output.
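The bundling idea behind the wrapper can be illustrated without MPI. The sketch below block-partitions a set of simulation indices across ranks and lets each bundle reuse input that was loaded once. The function names and the placeholder "results" are assumptions for illustration; the real wrapper is written in C with MPI.

```python
def split_among_ranks(n_simulations: int, n_ranks: int) -> list:
    """Block-partition simulation indices so bundle sizes differ by at most one."""
    base, extra = divmod(n_simulations, n_ranks)
    bundles, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        bundles.append(range(start, start + size))
        start += size
    return bundles


def run_bundle(shared_network, sim_ids):
    """Each rank reuses the already-loaded network for all of its simulations."""
    return {i: f"{shared_network}:sim{i}" for i in sim_ids}   # placeholder results


network = "network-loaded-once"       # stands in for parsing the input file once
bundles = split_among_ranks(n_simulations=10, n_ranks=4)
results = {}
for sim_ids in bundles:               # under MPI, each iteration would be one rank
    results.update(run_bundle(network, sim_ids))
print([len(b) for b in bundles])
```

The savings the text attributes to the wrapper come from the shared step: the common input is read once per job rather than once per simulation instance.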
Acknowledgements
Simulation-Optimization with EPANET is part of a multidisciplinary, three-year
NSF-funded DDDAS (Dynamic Data-Driven Application Systems) research project
to develop a cyberinfrastructure system that will both adapt to and control
changing needs in data, models, computer resources and management choices
facilitated by a dynamic workflow design.
Multiple Genome Alignment on the Grid
Collaborators
Georgia State University
SURA
Summary/Description
This application takes a number of genome sequences as input and gives an
aligned sequence based on their structure by using a pairwise alignment algorithm.
When run on grids like SURAgrid, carefully designed and grid-enabled algorithms
like this, which implement a memory efficient method for computation and
are also parallelized efficiently so that the workload is well distributed
on grids, afford bioinformatics users a performance comparable to cluster
environments while giving them added flexibility and scalability.
Biological sequence alignment is used to determine the nature of the biological
relationship among organisms, for example, in finding evolutionary information,
determining the causes and cures of diseases, and for gathering information
about a new protein. Multiple genome sequence alignment (where several genome
sequences are aligned rather than only two) is very important for analysis
of genome and protein structures — particularly for showing relationships
among structures being aligned. A significant challenge to researchers is
the computational requirements to align multiple (more than three) sequences
of very large size. With Georgia State University’s (GSU) core research
initiatives in life sciences, and particularly protein structure analysis,
Dr. Yi Pan, currently GSU Chair of Computer Science, and Nova Ahmed, his
graduate student, made a significant contribution in this area by deploying
a parallelized multiple sequence alignment algorithm application in a grid
environment, thus improving computer processing of the large sequence lengths
typical of genomic and proteomic science.
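To make the idea of memory-efficient pairwise alignment concrete, the sketch below computes a global alignment score with the textbook Needleman-Wunsch recurrence while keeping only two rows of the dynamic-programming table. It is a generic illustration, not Pan and Ahmed's algorithm, and the scoring values are arbitrary assumptions.

```python
def global_alignment_score(a: str, b: str, match=1, mismatch=-1, gap=-2) -> int:
    """Needleman-Wunsch score using two rows instead of the full matrix."""
    prev = [j * gap for j in range(len(b) + 1)]     # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                                  # discard the older row
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "ACG"))
```

Memory use grows with the length of one sequence rather than the product of both lengths, which matters for genome-scale inputs; recovering the alignment itself (not just the score) requires additional techniques.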
SURAgrid Deployment
Although the parallel algorithm requires inter-processor communication to
compute multiple aligned sequences, it actually reduces overall computation
by independently solving and then merging a set of tasks. The new algorithm,
which was initially designed for a shared memory architecture where it is
helpful to reduce the memory requirement, did indeed improve performance
during its initial runs. However, the resulting algorithm and its parallelization
is also suited to grid environments such as SURAgrid that benefit this type
of distributed, computationally intensive work. Ahmed’s tests of grid-enabled
clusters showed comparable performance to that of non-grid-enabled clusters
(there was negligible overhead from the grid layer services) and a significant
improvement over older shared memory-type systems. Pan and Ahmed’s
algorithm can provide very scalable, cost-effective computational performance
for grid environments, where job submission and scheduling can be easier
since users don’t need an account on every node and can submit multiple
jobs at one time.

Figure CS-9. Parallel load distribution among processors
for multiple sequence alignment
There were several iterations of testing for both the code and Georgia State
and SURAgrid’s access management infrastructure components. The end
result of the collaboration is that Georgia State users run the multiple
genome alignment application through the integration of their personal identity
verification into Georgia State’s campus identity management environment,
which is then leveraged to provide external access to all SURAgrid resources.
To create a local grid certificate, the user sends a request from their
official campus email and is issued a grid certificate based on their unique
CampusID. The Certificate Authority (CA) that Georgia State’s Advanced Campus Services (ACS) created and cross-certified
with the SURAgrid Bridge CA (BCA) provides the local user’s passport
to SURAgrid resources. The cross-certification process enables a SURAgrid
resource to trust the Georgia State local certificate being presented by
the user. The user experience is further simplified by Georgia State’s
use of the SURAgrid user account system that essentially provides single-sign-on
access to SURAgrid resources at cross-certified SURAgrid sites. The account
management system overlays the cross-certification process and empowers the
SURAgrid User Administrator from Georgia State to easily issue SURAgrid user
accounts. The user’s Georgia State-issued certificate is used with the Globus
Toolkit, which allows Globus, on behalf of the algorithm application, to manage
the grid services necessary to submit the application’s jobs to various
SURAgrid resources.
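The trust relationship described above can be pictured with a small model. This is only a sketch under stated assumptions: the CA names, the CROSS_CERTIFIED_WITH_BRIDGE set, and the resource_trusts function are hypothetical and do not correspond to any SURAgrid or Globus interface. A real deployment validates X.509 certificate chains rather than comparing names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certificate:
    subject: str   # e.g. a user's CampusID (hypothetical values below)
    issuer: str    # name of the CA that signed the certificate


# Hypothetical set of campus CAs that have been cross-certified with the bridge CA.
CROSS_CERTIFIED_WITH_BRIDGE = {"GeorgiaState-Local-CA"}


def resource_trusts(cert: Certificate, site_trusts_bridge: bool = True) -> bool:
    """Return True if a resource that trusts the bridge CA accepts this certificate."""
    return site_trusts_bridge and cert.issuer in CROSS_CERTIFIED_WITH_BRIDGE


local_user = Certificate(subject="campusid-0001", issuer="GeorgiaState-Local-CA")
outsider = Certificate(subject="someone", issuer="Unknown-CA")
print(resource_trusts(local_user))  # accepted through the bridge
print(resource_trusts(outsider))    # rejected: issuer is not cross-certified
```

The point of the bridge is that each resource only needs to trust one CA, rather than every campus CA individually.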
Conclusion
As Georgia State continues to deploy the grid technology, policies, and processes
of its campus grid, it expects that the multiple genome alignment
code will continue to be used to test and perfect the grid. Considering that
it also provides a memory efficient, pair-wise alignment for large biological
sequences in an optimal way, the application is an invaluable asset to Georgia
State and to others interested in improved sequence alignment using SURAgrid
resources.
Acknowledgements
Nova Ahmed, Ph.D. student CS, Georgia Tech
Victor Bolet, Analyst Programmer Intermediate, Advanced Campus Services Georgia
State
Dharam Damani, MS student CS, Georgia State University
Nicole Geiger, Analyst Programmer Associate, Advanced Campus Services Georgia
State
Yi Pan, Professor, Chair Computer Science, Georgia State
Grid Deployments
Texas Tech TechGrid
Collaborators
Texas Tech University
Summary/Description
The mission of the Texas Tech grid project, TechGrid, is to integrate the numerous
and diverse computational, visualization, storage, data, and spare lab desktop
resources of Texas Tech University into a comprehensive campus cyber infrastructure
for research and education. The integration of these vast resources into
TechGrid will enable resource access and sharing on an unprecedented scale,
while new Web-based and command-line interfaces will facilitate new models
for utilization and coordination. The goals of rapid deployment, adoption,
and evolution of TechGrid will enable it to serve as a research and teaching
computing infrastructure, while also providing a platform for grid computing
R&D. TechGrid will thus present a unique campus environment for knowledge
discovery and education.
About TechGrid
The Texas Tech University grid, TechGrid, developed
and deployed in 2002, is a comprehensive cyber infrastructure project to
bring a distributed-knowledge environment to Texas Tech research and education.
TechGrid consists of 600 Windows and Linux PCs donated from various parts
of campus to share spare computational cycles while the donated resources
are not otherwise being used. The grid software used to integrate these compute
resources is Condor, a grid middleware package developed
by the University of Wisconsin. During the past five years, TechGrid has
helped facilitate the massive computing needs of research projects involving
computational chemistry, bioinformatics, biology, physics, mathematics, engineering,
and business statistical analysis. Additionally, TechGrid has been instrumental
in teaching distributed and grid computing in the Texas Tech Advanced Technology
Learning Center, Texas Tech Teaching Learning and Technology Center, Texas
Tech Jerry Rawls School of Business, Texas Tech Computer Science department
as well as the Texas Tech Mathematics and Statistics department.
The goal of the TechGrid project is to enable significant
advances in scientific discovery and to foster innovative educational programs. TechGrid
will integrate and simplify the usage of the diverse computational, storage,
visualization, and some data resources of Texas Tech to facilitate new, powerful
paradigms for research and education. The project will serve as a model for
other campuses wishing to develop an integrated cyber infrastructure for
research and education.
Middleware
The grid uses grid middleware, named Condor, to distribute a compute job
among the compute nodes within the grid.
What is Condor?
From the University of Wisconsin Condor site
[68]:
Condor is a specialized workload management system for compute-intensive
jobs. Like other full-featured batch systems, Condor provides a job queuing
mechanism, scheduling policy, priority scheme, resource monitoring, and resource
management. Users submit their serial or parallel jobs to Condor, Condor
places them into a queue, chooses when and where to run the jobs based upon
a policy, carefully monitors their progress, and ultimately informs the user
upon completion.
While providing functionality similar to that of a more traditional batch
queuing system, Condor's novel architecture allows it to succeed in areas
where traditional scheduling systems fail. Condor can be used to manage a
cluster of dedicated compute nodes (such as a "Beowulf" cluster).
In addition, unique mechanisms enable Condor to effectively harness wasted
CPU power from otherwise idle desktop workstations. For instance, Condor
can be configured to only use desktop machines where the keyboard and mouse
are idle. Should Condor detect that a machine is no longer available (such
as a key press detected), in many circumstances Condor is able to transparently
produce a checkpoint and migrate a job to a different machine which would
otherwise be idle. Condor does not require a shared file system across machines
— if no shared file system is available, Condor can transfer the job's data
files on behalf of the user, or Condor may be able to transparently redirect
all the job's I/O requests back to the submit machine. As a result, Condor
can be used to seamlessly combine all of an organization's computational
power into one resource.
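To make the job-queuing description above more concrete, the following sketch builds the text of a Condor submit description for a batch of independent jobs. The helper name make_submit_description and the program name "simulate" are hypothetical and are not taken from TechGrid. The submit commands themselves (universe, executable, arguments, output, error, log, should_transfer_files, when_to_transfer_output, queue) are standard Condor submit commands, but the particular values are assumptions.

```python
def make_submit_description(executable: str, n_jobs: int) -> str:
    """Build the text of a Condor submit description for n_jobs independent runs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "arguments = $(Process)",          # each job receives its own index
        "output = job.$(Process).out",
        "error = job.$(Process).err",
        "log = jobs.log",
        # No shared file system is assumed, so ask Condor to move the files.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# "simulate" is a hypothetical program name used only for illustration.
description = make_submit_description("simulate", 10)
print(description)
```

The resulting text would normally be saved to a file and passed to condor_submit, after which Condor queues the ten jobs and matches them to available machines.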
Definitions, Components, and Software tools
Definitions
1. Grid Zone: a department, or a lab associated with a campus department,
that has volunteered resources to be used by the grid.
2. Grid Zone Administrator: a person who is responsible for the grid zone
in their individual department.
3. Campus Grid Administrator: a person who is responsible for the maintenance,
upkeep, and operation of the grid, HPCC grid research, grid training, and
interfacing with the general computing user base to supply grid-based and
High Performance Computing support and services to the Texas Tech campus
community.
4. Grid Node: an individual computer within a Grid Zone that contributes
compute cycles to the grid.
5. Grid Attribute: an individual setting, such as permissions, performance,
or scheduling mechanism, that can be controlled by the Grid Administrator.
6. Bootstrap Server: the central grid server responsible for controlling
grid functions and job management.
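The relationships among the terms defined above can be sketched as a small data model. The class and field names are hypothetical and chosen only to mirror the definitions; they are not part of Condor or of the TechGrid software, and the host names are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GridNode:
    hostname: str
    operating_system: str


@dataclass
class GridZone:
    name: str                      # department or lab volunteering resources
    administrator: str             # the Grid Zone Administrator
    nodes: List[GridNode] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)  # Grid Attributes


@dataclass
class CampusGrid:
    bootstrap_server: str          # central server for grid functions and job management
    administrator: str             # the Campus Grid Administrator
    zones: List[GridZone] = field(default_factory=list)

    def node_count(self) -> int:
        """Total number of Grid Nodes contributed by all zones."""
        return sum(len(zone.nodes) for zone in self.zones)


grid = CampusGrid(bootstrap_server="bootstrap.example.edu", administrator="campus-admin")
lab = GridZone(name="Example Lab", administrator="zone-admin",
               attributes={"scheduling": "idle-only"})
lab.nodes.append(GridNode("pc01.example.edu", "Linux"))
lab.nodes.append(GridNode("pc02.example.edu", "Windows"))
grid.zones.append(lab)
print(grid.node_count())
```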
Components
Figure CS-9. Job distribution on TechGrid.
Applications
Applications on the TechGrid include:
- The Proth [40] code was provided by Dr. Chris Monico and grid-enabled
to run on TechGrid. The code used several thousand CPU
hours to look for prime numbers from sieved candidates.
- The Partial Differential Equation [41] grid project of Dr. Sandro Manservisi
was grid-enabled and used 1200 CPU hours.
- The grid-enabled Multivariate Minimization project was completed and
published at Global Grid Forum 8 under the title "Multivariate Minimization
Using Grid Computing" by K. Kulish, J. Perez, and P. Smith [42].
- Installation of and experimentation with the SRB (Storage Resource Broker)
data grid [44] was completed. The San Diego Supercomputer Center's
supercomputing library of space movies was accessed.
- In collaboration with the Biology department, a grid-based BLAST [46]
was explored. Basic grid BLAST jobs were possible; however, a means to move
data was still required to handle large BLAST datasets. Dr. Natalya Klueva
and Dr. Randy Allen were the contacts for this project.
- In collaboration with the Rawls College of Business, a SAS-based compute
grid [52] was created. The grid was designed and deployed in a 3-week
period. Dr. Peter Westfall is the major contact for this project.
- A physics space simulation, "Neighbors", for a physics graduate thesis [53]
was grid-enabled. The purpose was to simulate the effects of tumbling debris
on a spacecraft upon reentry into the Earth's atmosphere. Several thousand
simulations were processed.
- Texas Tech HPCC and the University of Virginia joined Data Grid to test
the Internet2 connectivity between universities. Results were published in
the ACM Journal of Computing.
- In the USDA Grid Bioinformatics Project [54], TechGrid helped Dr. Scot
Dowd with the administration of BLAST jobs to analyze the
pig genome using TechGrid and Rocks clustering. This was a collaborative
effort between Texas Tech and the USDA.
- ENDYNE is a grid implementation of the
electron nuclear dynamics theory: a coherent-states chemistry.
ENDYNE is a TTU grid project that involves TTU computational chemists and
TTU HPCC staff developing a grid-based method of calculating a coherent-states
simulation that uses classical theoretical models and quantum mechanics to
simulate the relationships between chemical atomic interactions.
Snapshots of a head-on collision of a proton and a hydrogen molecule at three different times.
Snapshots of a collision of a proton splitting the bond of hydrogen molecule at three different points of the trajectory.
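The Proth application listed above searched sieved candidates for primes. The source does not describe the test that code used, so the following is only a sketch of the textbook approach, Proth's theorem: a number n = k * 2**e + 1 with k odd and k < 2**e is prime if some base a satisfies a**((n-1)/2) = -1 (mod n). The function names and the list of trial bases are assumptions made for illustration.

```python
from typing import Optional, Tuple


def proth_form(n: int) -> Optional[Tuple[int, int]]:
    """Return (k, e) with n = k * 2**e + 1, k odd and k < 2**e, or None."""
    if n < 3 or n % 2 == 0:
        return None
    k, e = n - 1, 0
    while k % 2 == 0:
        k //= 2
        e += 1
    return (k, e) if k < 2 ** e else None


def proth_test(n: int, bases=(2, 3, 5, 7, 11, 13, 17, 19, 23)) -> Optional[bool]:
    """True if proven prime, False if proven composite, None if undecided."""
    if proth_form(n) is None:
        raise ValueError("n is not a Proth number")
    for a in bases:
        if a % n == 0:
            continue                 # base is a multiple of n, so it says nothing
        r = pow(a, (n - 1) // 2, n)
        if r == n - 1:               # a**((n-1)/2) == -1 (mod n): prime by Proth's theorem
            return True
        if r != 1:                   # a prime n would give +1 or -1 (Euler's criterion)
            return False
    return None                      # every base gave +1; more bases would be needed


for candidate in (13, 25, 41, 97):
    print(candidate, proth_test(candidate))
```

Because each candidate is tested independently, a search of this kind splits naturally into many small jobs, which is the property that makes it a good fit for a cycle-scavenging grid.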
TechGrid Status
TechGrid's compute nodes are located in the Advanced Technology Learning
Center (ATLC), the High Performance Computing Center (HPCC) at Reese Center,
the Computer Science department, the Business Building, the North Computing
Center, and the Math Building. Currently, TechGrid is made up of 600+
compute nodes spanning several domains and three operating systems.
Figure CS-10. The campus-wide
grid is distributed across the TTU campus.
Contact
Jerry Perez, Texas Tech University.
URL: http://www.hpcc.ttu.edu/techgrid.html [56]
White Rose Grid, WRG
Collaborators, Organizations
The White Rose Consortium in Yorkshire, England: The universities of Leeds,
Sheffield, and York
Figure CS-11. The White Rose Grid.
Summary/Description
The White Rose Grid (WRG) e-Science Centre brings together those researchers
from the Yorkshire region who are engaged in e-Science activities and through
these in the development of Grid technology. The initiative focuses on building,
expanding and exploiting the emerging IT infrastructure, the Grid, which
employs many components to create a collaborative environment for research
computing in the region.
The White Rose Grid (WRG) at Leeds also hosts one of the four core nodes
of the National Grid Service (NGS), which offers a production quality grid
service
for use by UK academia. (The other nodes are at CCLRC-RAL, Oxford, and Manchester.)
Components and Software/Toolkits
The White Rose Grid comprises five large compute nodes of
which three are located at the University of Leeds, one at the University of
Sheffield and one at the University of York. It offers a heterogeneous computing
environment based on
Sun Microsystems
[57]
multiprocessor computers, and Intel Xeon and AMD Opteron
based systems built by
Streamline Computing
[58].
These nodes are interconnected by the network managed by YHMAN.
- The Leeds Grid Node 1 is a constellation
of shared-memory systems based on Sun Fire 6800 and V880 systems configured
with UltraSPARC III Cu 900MHz processors and large physical memory (32GB).
- The Leeds Grid Node 2 comprises two Linux clusters based on 2.2 & 2.4
GHz Intel Xeon processors interconnected with Myrinet 2000 networks, and
in total delivering 292 CPUs.
- The Leeds Grid Node 3 comprises
Sun Microsystems' Sun Fire V40z and V20z servers with dual-core AMD
Opteron processors supplied by Esteem Systems and integrated by Streamline
Computing.
Seven of these (V40z) comprise four 2.2 GHz dual-core processors configured
with 192 GB memory. Eighty-seven V20z servers are interconnected with
a Myrinet network; each of these comprises two 2.0 GHz dual-core processors,
sharing in total 0.7 TB of distributed memory across 348 processor cores. The
system runs the Linux (64-bit SuSE) operating system.
- The Leeds Nodes are connected to 12 TB SAN storage
and two EMC Centera disk-based archiving systems set up to provide
12TB of archive space to users. Sun HPC ClusterTools, Sun Forte Developer
software
and Sun Grid Engine Enterprise Edition are installed on all systems.
- The 160-processor WRG Sheffield node
has been supplied by Sun Microsystems and integrated by Streamline
Computing. Eighty of these 2.4 GHz AMD Opteron processors are in 4-way nodes
with 16 GB main memory coupled by a Myrinet network; the remaining eighty
are in 2-way nodes with 4 GB main memory.
- At Sheffield there is also a Tier-2 GridPP node supporting
the particle physics grid. This system is configured with 160 processors
in 2-way nodes, and it runs 64-bit Scientific Linux, which is Red Hat
based.
- The York Node includes two Beowulf-type clusters. The first is a
24-machine cluster, each machine providing two 2.4 GHz dual-core
processors and 8 GB memory, in total offering 96 processor cores,
192 GB memory and 4.8 TB local scratch space. The second
comprises 3 large memory nodes, each consisting of four 2.4 GHz dual-core
processors (8 cores per machine) and 8GB memory, in total delivering 24 processor
cores configured with 96GB memory and 0.9 local scratch space. All
these nodes are connected into a 10GB/s InfiniPath network for fast file access. In
addition the cluster nodes are able to use this network for very low latency <2m MPI
applications. Over 9 TB of backed-up storage is provided for users on
SATA drive arrays and a 1 TB networked scratch space on f/c arrays.
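Several of the totals quoted in the node descriptions above follow directly from the per-machine figures, and the short calculation below cross-checks them. The helper total_cores is hypothetical. Only totals that can be derived from the figures as printed are checked; the 96GB total quoted for the York large memory nodes is left out because it does not follow from the per-node memory figure as printed.

```python
def total_cores(machines: int, processors_per_machine: int, cores_per_processor: int) -> int:
    """Number of processor cores in a group of identical machines."""
    return machines * processors_per_machine * cores_per_processor


# Leeds Grid Node 3: 87 V20z servers, two dual-core processors each.
leeds_v20z_cores = total_cores(87, 2, 2)
# York cluster 1: 24 machines, two dual-core processors and 8 GB each.
york_cluster_cores = total_cores(24, 2, 2)
york_cluster_memory_gb = 24 * 8
# York large memory nodes: 3 machines, four dual-core processors each.
york_large_memory_cores = total_cores(3, 4, 2)

print(leeds_v20z_cores, york_cluster_cores, york_cluster_memory_gb, york_large_memory_cores)
# prints: 348 96 192 24
```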
WRG systems support applications written in FORTRAN, C,
and C++, implementing parallelism through MPI or OpenMP. A couple
of the Sun Fire V880s serve the open source Grid Portal, which interoperates
with Globus middleware and Sun Grid Engine Enterprise Edition.
Furthermore, at the University of Leeds there is also the
Virtual Environments Laboratory which comprises a T.A.N. 3D Holobench, SGI Onyx2
with interactive devices and projectors. Also a recently acquired visualisation
node is available at Leeds for WRG researchers.
See the
White Rose Grid Compute Node
[59]
description for more information.
Applications
The following applications include current and past projects. See the
White Rose Activities
[60]
page for more projects and more information on each project.
CARMEN is
a 4-year EPSRC funded e-Science Pilot Project involving 11 Universities
and 19 Investigators. It aims to use grid technologies to enable
experimenters in neurophysiology to archive their datasets in a structure,
making them widely accessible for computational modelers and algorithm
developers to exploit. The project will provide integrated and
coordinated services for the neuroscience data, enabling neuronal signal
detection, sorting and analysis, as well as visualisation and modeling.
Furthermore it will enable direct near real-time analysis of streamed
experimental data, providing information to distributed teams of specialists
that will allow difficult experiments to be optimised.
COLAB is
a joint research project of the Universities of Leeds (UK) and Beihang
in Beijing (China) co-led by Profs J Xu (Leeds) and J Huai (Beihang),
and managed by the EPSRC White Rose Grid e‑Science Centre established
between Universities of Leeds, York and Sheffield. The project
relates to the CROWN (China Research environment Over Wide-area Network)
grid middleware system originally developed at Beihang University.
Two sub-groups research the areas of Fault and Attack Tolerance, and
Fault Injection-based Evaluation. Amongst other topics they investigate the
provision of topologically aware fault and intrusion tolerance in grid
systems as well as the provision of revised fault models for grid applications.
Grid-FIT (Grid-Fault
Injection Technology) is a fault injector that utilizes network level
fault injection to assess grid systems. Grid-FIT has been implemented
specifically to test SOAP based web services systems and Globus systems.
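As a rough illustration of what network-level fault injection means, the sketch below wraps a message-sending function so that some messages are dropped or corrupted before they reach the service. This is not how Grid-FIT is implemented; the function names, the fault types, and the toy SOAP-like message are all assumptions made for the example.

```python
import random
from typing import Callable, Optional


def with_fault_injection(send: Callable[[str], str], drop_rate: float,
                         corrupt_rate: float,
                         rng: random.Random) -> Callable[[str], Optional[str]]:
    """Wrap a message-sending function so that some messages are dropped or corrupted."""
    def faulty_send(message: str) -> Optional[str]:
        roll = rng.random()
        if roll < drop_rate:
            return None                                   # simulate a lost message
        if roll < drop_rate + corrupt_rate:
            # Tamper with the payload before it reaches the service.
            message = message.replace("<value>", "<value>CORRUPTED")
        return send(message)
    return faulty_send


def echo_service(message: str) -> str:
    """Stand-in for a web service: returns the request it received."""
    return message


request = "<Envelope><Body><value>42</value></Body></Envelope>"
always_drop = with_fault_injection(echo_service, 1.0, 0.0, random.Random(0))
always_corrupt = with_fault_injection(echo_service, 0.0, 1.0, random.Random(0))
no_faults = with_fault_injection(echo_service, 0.0, 0.0, random.Random(0))
print(always_drop(request))
print(always_corrupt(request))
print(no_faults(request) == request)
```

Because the faults are introduced between client and service, neither side has to be modified, which is what makes this style of testing attractive for assessing deployed middleware.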
Integrative Biology addresses
two key problems in medicine today: the causes of cardiac failure and cancer
tumours. Scientists are
developing multi-scale models (from cells to whole organs) to help
understand these problems. The size and complexity of the models
demand significant compute power, and so this project brings together
scientists and Grid computing experts. The project is being led
by the University of Oxford and involves partners across the world,
including the USA and New Zealand. Our contribution is in the
area of computational steering and visualization, and is led by Professor
Ken Brodlie and Dr James Handley.
The MoSeS
(Modeling and Simulation for e-social Science) project is
undertaken by the National Centre for e-Social Science node at the
University of Leeds. The objective of this project is to develop
a representation of the entire UK population as individuals and households,
together with a package of modeling tools which allows specific research
and policy questions to be addressed.
The Scientific
e-Communities Architecture (SeCA) project
focuses on the design and evaluation of a novel Collaborative e-Science
Architecture and its application, in the first instance to combustion
chemistry. The project exploits Peer-to-Peer (P2P) technologies
for supporting this scientific community model and a grid-based workgroup
architecture for providing access to large computation and data resources.
There are a number of challenges in realising the vision, for example,
effective P2P resource discovery.
DAME
(Distributed Aircraft Maintenance Environment), led by Prof
Austin of York, was a major (£3.5m) e-Science
project, which has developed a generic test-bed for distributed diagnostics.
The application demonstrator built within the project offers a distributed
maintenance environment motivated by the needs of Rolls Royce and
its information system partner, Data Systems and Solutions.
The
e-Demand project was supported by the Leeds and Durham Grid
consortium, which includes experts from both academia and industry.
The project has developed a demand-led and service-centric architecture
for building complex but dependable and secure Grid applications
based on the notion of ultra-late binding, dynamically bound service
components, combined with atomic actions as a powerful control abstraction.
GEMSS
(Grid-enabled Medical Simulation Services) is
funded by the EU FP5 programme and is concerned with creating an
environment in which computationally demanding tools native to the
Health-Care sector can be made available to a wide spectrum of users.
The goal is to provide a transparently accessible health computing
resource suited to solving problems of large magnitude, with the
end user having no awareness of the Grid computing platform(s). The
project will evaluate the viability of this approach through several sample applications,
including maxillo-facial surgery planning, neuro-surgery support,
medical image reconstruction, radiosurgery planning and lung/cardiovascular
simulations; the latter two have their base in Sheffield (Medical
Physics).
GOSPEL, led
by Professor M Berzins of Leeds University,
and carried out in collaboration with Shell Research, has brought
together advanced visualization, problem-solving
environments, and computational techniques to create a Grid based
workbench for the
computational modeling of lubricants.
This
ESRC demonstrator and the follow-on HYDRA2 project, both
led by Dr M Birkin and Prof P M Dew from the University of Leeds,
have
demonstrated the use of grid technologies in support of the decision-making
process in health care planning. A disparate set of data sources
as well as a decision support module and visualization have
been integrated to present the results.
myGrid will
design, develop and demonstrate higher level functionalities over an existing
Grid infrastructure that support scientists in making use of complex distributed
resources. The project will develop a virtual laboratory workbench that
will serve the life sciences community.
Future Plans
Future plans include finding ways to continue funding grid
computing across the universities, which is a challenge because each university
uses a different funding model. The consortium is also looking for further
partnership opportunities.
Contact
See Contact Details
[61]
for more information.
Acknowledgements
The White Rose Grid project operates under the auspices of the White Rose
University Consortium, which is an affiliation of the three Yorkshire Universities
of Leeds, York and Sheffield. This is a collaborative venture between the
White Rose Universities and our IT partners: Esteem Systems, Sun Microsystems,
and Streamline Computing.
The Yorkshire and Humber Development Agency, Yorkshire
Forward, is enabling us to expand our activities into the region and engage
research universities
and companies in e-Science.
The project has also received funding from the
UK e-Science Core Programme, Esteem Systems, and the White Rose Universities.
Grid in New York State
Collaborators, Organizations
This grid is led by Dr. Miller's Cyberinfrastructure Laboratory. Current collaborating institutions include Columbia
University, the Hauptman-Woodward Medical Research Institute, Marist College, Niagara University, SUNY-Buffalo,
SUNY-Geneseo, University of Rochester, and Syracuse University.
Summary/Description
The Cyberinfrastructure Laboratory
[xx]
designed and deployed a Buffalo-based grid (ACDC-Grid) and a Western New York Grid (WNY Grid) before branching out to
create a Grid involving institutions throughout New York State. This
statewide Grid
[xx]
includes resources from a variety of institutions and is available in a simple and seamless fashion to users worldwide.
This statewide Grid contains a heterogeneous set of resources and utilizes general-purpose IP networks [62, 63, 64, 65].
A major feature of this grid is that it integrates a computational grid (compute clusters that have the ability to
cooperate in serving the user) with a data grid (storage devices that are similarly available to the user) so that the
user may deploy computationally intensive applications that read or write large volumes of data files in a very simple fashion.
In particular, this statewide Grid was designed so that the user does not need to know where data files are physically
stored or where the application is physically deployed, while providing the user with easy access to their files
in terms of uploading, downloading, editing, viewing, and so on.
The core infrastructure for this Grid encompassing institutions throughout New York State includes the installation of
standard grid middleware and the use of an active Web
portal for deploying applications. Several key packages were used in the implementation of NYS Grid and other packages
have been identified in order to allow for the anticipated expansion of the system. The Globus Toolkit provides APIs
and tools using the Java SDK to simplify the development of OGSI-compliant services and clients. It supplies database
services and Monitoring & Discovery System index services implemented in Java, GRAM service implemented in C with a
Java wrapper, GridFTP services implemented in C, and a full set of Globus Toolkit components. The recently proposed
Web Service-Resource Framework provides the concepts and interfaces developed by the OGSI specification exploiting
the Web services architecture.
This statewide Grid is the next step in an evolution. It began with an experimental Buffalo-based grid that involved a
variety of independently run organizations at SUNY-Buffalo, as well as other local institutions, including Buffalo
State College, the Hauptman-Woodward Medical Research Institute, and Canisius College. That grid grew into a persistent
and hardened heterogeneous Western New York Grid that includes Niagara University, Geneseo State College, the
Hauptman-Woodward Medical Research Institute, and SUNY-Buffalo. The Grid that now includes institutions throughout New
York State provides a variety of applications in order to support the users at the affiliated institutions, other users
in New York State, as well as users from Open Science Grid.
Middleware Efforts
The New York State Portal [46, 47, 48, 49],
which was derived from the ACDC-Grid Portal, provides access to a dozen
or so compute-intensive software packages, large data storage devices, and the ability to submit applications to a
variety of grids containing tens of thousands of processors. Our Grid Portal integrates several software packages and
toolkits in order to produce a robust system that can be used to host a wide variety of scientific and engineering
applications. Specifically, our portal is constructed using the Apache HTTP server, HTML, Java and PHP scripting,
PHPMyAdmin, MDS/GRIS/GIIS from the Globus Toolkit, OpenLDAP, WSDL, and related open source software that interfaces
with a MySQL database.
Our Grid Portal provides a single-point of access to our statewide Grid for those users who want to concentrate on their
disciplinary research and scholarship and do not want to be burdened with low-level details of utilizing a Grid.
Applications are typically ported to the Grid Portal through our Grid-Enabling Application Templates, which provide
developers with a template for porting a fairly traditional science or engineering application to our Grid-based Web
Portal. This approach provides the developer with access to various databases, APIs, PHP scripts, HTML files, shell
scripts, and so on, in order to provide a common platform to port applications and for users to efficiently utilize
such applications. The generic template for developing an application provides a well-defined standard scientific
application workflow for a Grid application. This workflow includes a variety of functions that include data grid
interactions, intermediate processing, job specification, job submission, collection of results, run-time status, and
so forth. The template provides a flexible methodology that promotes efficient porting and utilization of scientific
routines. It also provides a systematic approach for allowing users to take advantage of sophisticated applications by
storing critical application and user information in a MySQL database. Most applications have been ported to our
Grid Portal within 1-2 weeks.
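The workflow that the template standardizes can be sketched as a base class whose steps are overridden per application. The actual Grid-Enabling Application Templates are built from PHP scripts, HTML files, shell scripts, and database entries, so the Python class below, its method names, and the placeholder job identifier are purely illustrative assumptions.

```python
from typing import Dict, List


class GridApplicationTemplate:
    """Hypothetical skeleton of the workflow steps listed above."""

    def __init__(self) -> None:
        self.status: List[str] = []          # run-time status messages

    def stage_input(self, files: List[str]) -> List[str]:
        self.status.append("data staged")    # data grid interaction
        return files

    def preprocess(self, files: List[str]) -> List[str]:
        self.status.append("inputs processed")  # intermediate processing
        return files

    def specify_job(self, files: List[str]) -> Dict[str, object]:
        raise NotImplementedError            # each ported application fills this in

    def submit(self, job: Dict[str, object]) -> str:
        self.status.append("job submitted")
        return "job-1"                       # placeholder identifier, nothing is submitted

    def collect_results(self, job_id: str) -> str:
        self.status.append("results collected")
        return f"{job_id}.out"

    def run(self, files: List[str]) -> str:
        staged = self.preprocess(self.stage_input(files))
        job_id = self.submit(self.specify_job(staged))
        return self.collect_results(job_id)


class ExampleApp(GridApplicationTemplate):
    def specify_job(self, files: List[str]) -> Dict[str, object]:
        self.status.append("job specified")
        return {"executable": "example_app", "inputs": files}


app = ExampleApp()
result = app.run(["input.dat"])
print(result, app.status)
```

Keeping the common steps in one place and leaving only the job specification to the application developer is one plausible reason a port can be completed quickly.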
Our lightweight Grid Monitoring software
[66]
is used to monitor resources from a variety of Grids, including the
statewide Grid, Western New York Grid, Open Science Grid, Open Science Grid Testbed, and TeraGrid, to name a few.
With production Grids still in their infancy, the ability to efficiently and effectively monitor a grid is important
for users and administrators. Our Grid Monitoring System runs a variety of scripts continually, stores information
in a MySQL database, and displays the information in an easy to digest and navigate Grid Dashboard. The Dashboard is
served by an Apache Server and is written in Java and PHP scripts. It provides a display that consists of a radial
plot in the center of the main page that presents an overview of an available Grid, surrounded by histograms and
other visual cues that present critical statistics. By clicking on any of these individual components, the user can
drill down for more details on the information in question. These drilldown presentations include dynamic and
interactive representations of current and historical information. For example, a user or administrator can easily
determine the number of jobs running or queued on every system of any available Grid, the amount of data being added
or removed from nodes on a grid, as well as a wealth of current and historical information pertaining to the individual
nodes, Grids, or virtual organizations on an available Grid. Our work contributes to the widespread monitoring
initiative in the distributed computing community that includes NetLogger, GridRM, Ganglia, and Network Weather
Service, to name a few.
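The collect, store, and display cycle described above can be sketched in a few lines. The example uses Python's built-in sqlite3 module purely as a stand-in for the MySQL database, and the table layout, function names, and sample figures are assumptions made for illustration rather than the monitoring system's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for the MySQL database
conn.execute("CREATE TABLE job_samples (grid TEXT, system TEXT, state TEXT, jobs INTEGER)")


def record_sample(grid: str, system: str, running: int, queued: int) -> None:
    """Store one observation, as a monitoring script might after probing a system."""
    conn.executemany(
        "INSERT INTO job_samples VALUES (?, ?, ?, ?)",
        [(grid, system, "running", running), (grid, system, "queued", queued)],
    )


def jobs_by_state(grid: str) -> dict:
    """Aggregate job counts for one grid, the kind of figure a dashboard would plot."""
    rows = conn.execute(
        "SELECT state, SUM(jobs) FROM job_samples WHERE grid = ? GROUP BY state", (grid,)
    )
    return dict(rows.fetchall())


record_sample("example-grid", "cluster-a", running=12, queued=3)
record_sample("example-grid", "cluster-b", running=5, queued=0)
print(jobs_by_state("example-grid"))
```

Storing raw samples and aggregating at query time is what allows the same data to back both the overview plot and the drill-down views.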
Our Grid Operations Dashboard
[67]
was designed to provide discovery, diagnosis, and the opportunity for rapid
publication and repair of critical issues to grid administrators. The operational status of a given resource is
determined by its ability to support a wide variety of Grid services, which Prescott typically refers to as site
functional tests. Tests are performed regularly and sequentially in order to verify an ever more complex set of
services on a node. These results are reported in our Operations Dashboard in an easy to read chart.
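The idea of running tests sequentially, from simple services to more complex ones, can be sketched as follows. The test names and the rule of skipping later tests after the first failure are assumptions for illustration; the source does not list the actual site functional tests or say how failures affect later tests.

```python
from typing import Callable, List, Tuple


def run_site_tests(tests: List[Tuple[str, Callable[[], bool]]]) -> List[Tuple[str, str]]:
    """Run tests in order; once one fails, later (more complex) tests are skipped."""
    report = []
    failed = False
    for name, check in tests:
        if failed:
            report.append((name, "skipped"))
            continue
        ok = check()
        report.append((name, "pass" if ok else "fail"))
        failed = not ok
    return report


# Hypothetical checks, ordered from simplest to most complex.
tests = [
    ("host reachable", lambda: True),
    ("authentication", lambda: True),
    ("file transfer", lambda: False),
    ("job submission", lambda: True),
]
for name, outcome in run_site_tests(tests):
    print(f"{name:16s} {outcome}")
```

A report of this shape maps directly onto a chart with one row per resource and one column per test.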
The development of data storage solutions for the Grid and the integration of such solutions into Grid Portals is
critical to the success of heterogeneous production-level Grids that incorporate high-end computing, storage,
visualization, sensors, and instruments. Data grids typically house and serve data to grid users by providing
virtualization services to effectively manage data in the storage network. The Storage Resource Broker is an
example of such a system. Our Intelligent Migrator, currently being integrated into our Grid Portal, represents
an effort to provide a scalable and robust data service to the users of this statewide Grid. The Intelligent Migrator
examines and models user utilization patterns in an effort to make efficient use of limited storage so that the
performance of our physical data grid and the services provided to our computational grid are significantly enhanced.
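The source does not describe how the Intelligent Migrator models utilization, so the following is only a minimal sketch of the general idea: use recorded access patterns to decide which files to move off limited storage first. The FileUsage fields and the "coldest first" scoring rule are assumptions, not the Migrator's actual model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileUsage:
    name: str
    size_gb: float
    days_since_access: int
    accesses_last_month: int


def migration_candidates(files: List[FileUsage], space_needed_gb: float) -> List[str]:
    """Pick the coldest files until enough space would be freed."""
    # Colder = accessed less often and longer ago (an assumed scoring rule).
    coldest_first = sorted(files, key=lambda f: (f.accesses_last_month, -f.days_since_access))
    chosen: List[str] = []
    freed = 0.0
    for f in coldest_first:
        if freed >= space_needed_gb:
            break
        chosen.append(f.name)
        freed += f.size_gb
    return chosen


files = [
    FileUsage("hot.dat", 50.0, 1, 40),
    FileUsage("old.dat", 30.0, 200, 0),
    FileUsage("warm.dat", 20.0, 10, 5),
    FileUsage("stale.dat", 25.0, 90, 0),
]
print(migration_candidates(files, 50.0))
```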
Our integrated Data Grid provides users with seamless access to their files, which may be
distributed across multiple storage devices. Our system implements data virtualization and a simple storage element
installation procedure that provides a scalable and robust system to the users. In addition, our system provides
a set of on-line tools for the users so that they may maintain and utilize their data while not having to be
burdened with details of physical storage location.
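Data virtualization of this kind usually rests on a mapping from the file name the user sees to the physical location where the data is stored. The sketch below shows that mapping in its simplest form; the DataCatalog class, its methods, and the example host names are hypothetical and are not part of the statewide Grid's software.

```python
from typing import Dict, List


class DataCatalog:
    """Maps logical file names to the storage devices holding them (hypothetical)."""

    def __init__(self) -> None:
        self._locations: Dict[str, List[str]] = {}

    def register(self, logical_name: str, physical_url: str) -> None:
        """Record that a copy of the logical file lives at the given location."""
        self._locations.setdefault(logical_name, []).append(physical_url)

    def resolve(self, logical_name: str) -> str:
        """Return one physical location; the caller never chooses a device."""
        replicas = self._locations.get(logical_name)
        if not replicas:
            raise FileNotFoundError(logical_name)
        return replicas[0]


catalog = DataCatalog()
catalog.register("/home/alice/results.dat", "storage-a.example.org:/vol1/0001")
catalog.register("/home/alice/results.dat", "storage-b.example.org:/vol7/9f3c")
print(catalog.resolve("/home/alice/results.dat"))
```

Because users only ever see the logical name, files can be moved between storage devices by updating the catalog without changing anything on the user's side.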
Applications
The Cyberinfrastructure Laboratory has enabled the successful porting and implementation of numerous applications to this statewide
Grid.
- Shake-and-Bake (SnB) — Molecular Structure Determination Application
- Buffalo-and-Pittsburgh (BnP) — SnB and PHASES Complete Protein Phasing
- Ostrich — Optimization and Parameter Estimation Tool for Groundwater Modeling
- Aseismic Design & Retrofit (EADR) — Passive Energy Dissipation System for Designing Earthquake Resilient Structures
- Princeton Ocean Model Great Lakes (POMGL) — Great Lakes Hydrodynamic Circulation Model
- Titan — Computational Modeling of Hazardous Geophysical Mass Flows
- Chem — Commercial Quantum Chemistry Software Package
- NWChem — Computational Chemistry Software Package developed and maintained by DOE
- Split — Modeling Groundwater Flow with the Analytic Element Method
Future Plans
The goal of this Grid is to bring a mixture of organizations, both public and private, onto a shared grid within
New York State. The nodes on the grid will include compute systems, storage devices, visualization systems, sensors,
imaging systems, and a wide variety of Internet-ready devices. To date, the Cyberinfrastructure Laboratory has
reached more than a dozen organizations throughout the state.
An on-going project with very positive, yet preliminary, results is our intelligent scheduling system. This system
uses optimization algorithms and profiles of users, their data, their applications, as well as network bandwidth and
latency, to improve a grid meta-scheduling system.
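One simple way that bandwidth, latency, and application profiles could feed a meta-scheduling decision is to estimate the completion time at each site and pick the smallest. The source does not describe the optimization algorithms actually used, so the cost model, the Site fields, and the example figures below are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    bandwidth_mb_per_s: float     # measured throughput to the site
    latency_s: float              # network latency to the site
    expected_queue_wait_s: float  # from historical queue data
    relative_speed: float         # 1.0 = reference machine


def estimated_completion_s(site: Site, input_mb: float, reference_runtime_s: float) -> float:
    """Transfer time plus queue wait plus run time scaled by machine speed."""
    transfer = site.latency_s + input_mb / site.bandwidth_mb_per_s
    return transfer + site.expected_queue_wait_s + reference_runtime_s / site.relative_speed


def choose_site(sites: List[Site], input_mb: float, reference_runtime_s: float) -> Site:
    """Pick the site with the smallest estimated completion time."""
    return min(sites, key=lambda s: estimated_completion_s(s, input_mb, reference_runtime_s))


sites = [
    Site("fast-but-far", bandwidth_mb_per_s=10.0, latency_s=0.2,
         expected_queue_wait_s=600.0, relative_speed=2.0),
    Site("slow-but-near", bandwidth_mb_per_s=100.0, latency_s=0.01,
         expected_queue_wait_s=60.0, relative_speed=1.0),
]
data_heavy = choose_site(sites, input_mb=5000.0, reference_runtime_s=1800.0)
compute_heavy = choose_site(sites, input_mb=10.0, reference_runtime_s=7200.0)
print(data_heavy.name, compute_heavy.name)
```

In this toy model the data-heavy job goes to the nearby site and the compute-heavy job to the faster one, which is the kind of trade-off that profiles of data and applications make visible to a scheduler.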
Acknowledgements
Funding for these grid initiatives was provided by the National Science Foundation under a series of grants including an
ITR, MRI, and CRI. Additional support was provided by the Center for Computational Research at SUNY-Buffalo.
Critical personnel responsible for establishing and maintaining the grid include students, staff, and post-docs from
SUNY-Buffalo, namely, Jon Bednasz, Steve Gallo, Mark Green, Cathy Ruby, and Naimesh Shah. In addition, grid administrators
at the participating institutions have been extremely responsive and worked in an extraordinarily collaborative fashion.
We would also like to thank the members of Open Science Grid for all of their technical support, input, and advice.
Bibliography
[1] I. Foster, C. Kesselman and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of Supercomputer Applications, 15(3), 2001.
[2] I. Foster and C. Kesselman, “Globus: A Metacomputing Infrastructure Toolkit,” International Journal of Supercomputer Applications, 11(2):115-128, 1997.
[3] J. Novotny, S. Tuecke and V. Welch, “An Online Credential Repository for the Grid: MyProxy,” Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10), August 2001.
[4] Open Grid Computing Environment. (http://www.collab-ogce.org/nmi/index.jsp)
[5] W. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke, “Data Management and Transfer in High Performance Computational Grid Environments,” Parallel Computing, 28(5), pp. 749-771, May 2002.
[6] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, “A Resource Management Architecture for Metacomputing Systems,” Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
[7] I. Foster, C. Kesselman, G. Tsudik and S. Tuecke, “A Security Architecture for Computational Grids,” Fifth ACM Conference on Computer and Communications Security, pp. 83-92, 1998.
[8] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith and S. Tuecke, “A Resource Management Architecture for Metacomputing Systems,” Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
[9] R.A. Luettich, J. J. Westerink, and N. W. Scheffner, ADCIRC: An advanced three-dimensional circulation model for shelves, coasts and estuaries; Report 1: theory and methodology of ADCIRC-2DDI and ADCIRC-3DL, Technical Report DRP-92-6, Coastal Engineering Research Center, U.S. Army Engineer Waterways Experiment Station, Vicksburg, MS, 1992.
[10] Unidata Local Data Manager, 2006. (http://www.unidata.ucar.edu/software/ldm/)
[11] P. Bogden, G. Allen, G. Stone, J. Bintz, H. Graber, S. Graves, R. Luettich, D. Reed, P. Sheng, H. Wang, W. Zhao, "The Southeastern University Research Association Coastal Ocean Observing and Prediction Program: Integrating Marine Science and Information Technology," Proceedings of the OCEANS 2005 MTS/IEEE Conference, Sept 18-23, 2005.
[12] D. Huang, G. Allen, C. Dekate, H. Kaiser, Z. Lei and J. MacLaren "getdata: A Grid Enabled Data Client for Coastal Modeling," HPC2006.
[13] P. Bogden, "The SURA Coastal Ocean Observing and Prediction Program (SCOOP) Service-Oriented Architecture," Proceedings of MTS/IEEE 06 Conference in Boston, Session 3.4 on Ocean Observing Systems, September 18-21, 2006.
[14] J. Bintz et al. "SCOOP: Enabling a Network of Ocean Observations for Mitigating Coastal Hazards," Proceedings of the Coastal Society 20th International Conference, 2006.
[15] SCOOP Website, 2006. (http://scoop.sura.org/)
[15a] SCOOP Partners. (http://scoop.sura.org/partners.html)
[16] North Carolina Forecasting System. (http://www.renci.org/projects/indexdr.php)
[17] S. Graves, K. Keiser, H. Conover, M. Smith, “Enabling Coastal Research and Management with Advanced Information Technology,” 17th Federation Assembly Virtual Poster Session, July 2006.
[18] G. von Laszewski, I. Foster, J. Gawor, and P. Lane, "A Java Commodity Grid Kit," Concurrency and Computation: Practice and Experience, vol. 13, no. 8-9, pp. 643-662, 2001. (http://www.cogkit.org/)
[19] K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, “Grid Information Services for Distributed Resource Sharing,” Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
[20] R. Wolski, N. Spring, C. Peterson, “Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service,” in Proceedings of SC97, November, 1997.
[21] OSG Council. (http://www.opensciencegrid.org/About/Who_is_the_Open_Science_Grid%3F/OSG_Council_Members)
[22] OSG Virtual Organizations. (http://www.opensciencegrid.org/About/OSG_Organization/Virtual_Organizations)
[23] OSG Technical Activity Groups. (http://www.opensciencegrid.org/About/OSG_Organization/Technical_Activities)
[24] MonALISA Graph of OSG Activity. (http://monalisa.grid.iu.edu:8080/show?page=index.html)
[25] US CMS Institutions and Members. (http://uscms.fnal.gov/uscms/organization/uscms_institutes_t_members.html)
[26] U.S. CMS website. (http://www.uscms.org/Public/overview.html)
[27] USCMS Software and Computing. (http://www.uscms.org/SoftwareComputing/index.html)
[28] CERN Architectural Blueprint RTAG. (http://lcgapp.cern.ch/project/blueprint/BlueprintReport-final.doc)
[29] Feature: Meeting the Data Transfer Challenge, ISGTW, Jan 17, 2007. (http://www.isgtw.org/?pid=1000226)
[30] 2007 Open Science Grid Consortium Meeting, UCSD, San Diego, CA, March 5-8, 2007, Frank Wurthwein, OSG Application Coordinator, OSG Extension Lead, Experimental Elementary Particle Physics, UCSD.
[31] US CMS Organization, Institution, and Member Contacts. (http://www.uscms.org/Public/contact.html)
[32] SDSS Institutions. (http://www.sdss.org/members/index.html)
[33] SDSS Advisory Council. (http://www.sdss.org/directorate/adco.html)
[34] SDSS Website. (http://www.sdss.org/)
[35] SDSS: About Us. (http://www.sdss.org/background/)
[36] SDSS: Contact Us. (http://www.sdss.org/contacts.html)
[37] How ATLAS Collaborates. (http://atlasexperiment.org/hac.html)
[38] Simulating Supersymmetry with ATLAS. (http://tinyurl.com/2q79p9)
[39] ATLAS Experiment Home Page. (http://atlasexperiment.org/)
[40] Proth. (http://primes.utm.edu/programs/gallot/)
[41] Partial Differential Equation. (http://www.math.ttu.edu/~smanserv/)
[42] K. Kulish, J. Perez, P. Smith, Multivariate Minimization Using Grid Computing. (http://www.cs.vu.nl/ggf/apps-rg/meetings/ggf8/kulish.pdf)
[43] PhD Thesis by Dr. Eric Albers. (http://www.iemss.org/iemss2002/proceedings/pdf/volume%20uno/298_albers.pdf)
[44] SRB (Storage Resource Broker) data grid. (http://www.sdsc.edu/srb/index.php/Main_Page)
[45] 3-D Studio Max graphics rendering grid. (http://www.arch.ttu.edu/resources/FAQ/3D/net_render_max_animation.asp)
[46] BLAST. (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html)
[47] Query tutorial. (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/query_tutorial.html)
[48] BLAST tutorial. (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/tut1.html)
[49] BLAST Guide. (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/guide.html)
[50] PSI-BLAST tutorial. (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/psi1.html)
[51] More Information on BLAST. (http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/auxiliary.html)
[52] SAS-based compute grid. (http://www.sas.com/technologies/architecture/grid/index.html)
[53] "Neighbors" space simulation. (http://dspace.lib.ttu.edu/bitstream/2346/1219/1/thesis.pdf)
[54] Bioinformatics Project. (http://www.animalgenome.org/pigs/)
[55] "R" programming language/framework. (http://www.r-project.org/)
[56] Texas Tech TechGrid. (http://www.hpcc.ttu.edu/techgrid.html)
[57] Sun Microsystems. (http://www.sun.com/)
[58] Streamline Computing. (http://www.streamline-computing.com/)
[59] White Rose Grid Compute Node. (http://www.wrgrid.org.uk/ComputeNodes.html)
[60] White Rose Grid Activities. (http://www.wrgrid.org.uk/Activities.html)
[61] White Rose Grid Contact Details. (http://www.wrgrid.org.uk/Contactus.html)
[62] M.L. Green and R. Miller, Grid computing in Buffalo, New York, Annals of the European Academy of Sciences, 2003, pp. 191-218.
[63] M.L. Green and R. Miller, Molecular structure determination on a computational & data grid, Parallel Computing Journal 30 (2004), pp. 1001-1017.
[64] M.L. Green and R. Miller, Evolutionary molecular structure determination using grid-enabled data mining, Parallel Computing Journal 30 (2004), pp. 1057-1071.
[65] M.L. Green and R. Miller, A client-server prototype for grid-enabling application template design, Parallel Processing Letters, Vol. 14, No. 2 (2004), pp. 241-253.
[66] C.L. Ruby, M.L. Green, and R. Miller, The Operations Dashboard: A Collaborative Environment for Monitoring Virtual Organization-Specific Compute Element Operational Status, Parallel Processing Letters, Vol. 16, No. 4 (2006), pp. 485-500.
[67] C.L. Ruby and R. Miller, Effectively Managing Data on a Grid, Handbook of Parallel Computing: Models, Algorithms, and Applications, S. Rajasekaran and J. Reif, eds., CRC Press, 2007, in press.
[68] What is Condor? (http://www.cs.wisc.edu/condor/description.html)