Joining a Grid: Procedures & Examples
Introduction
One of the most effective ways to become familiar with the ins and outs of grid technology and usage is to join a grid
initiative with goals and objectives that encompass or overlap with those of your institution, and with opportunities to
develop peer collaboration and support. Through participation in such initiatives, you can leverage shared resources to
meet institutional goals and begin contributing your perspective and increasing expertise back to the community for the
collective improvement and advancement of effective use of grid technology. Several grid initiatives invite this type of
participation today; two notable examples are described below.
SURAgrid: A regional-scale multi-institutional grid
SURAgrid is a consortium of organizations collaborating and combining resources to help bring grid technology to the level of seamless, shared infrastructure. The project arose from the desire for ongoing collaboration among institutions that had been participating with SURA (Southeastern Universities Research Association) in the NSF Middleware Initiative (NMI) Integration Testbed, a program that provided comprehensive evaluation of NMI middleware in the earliest years of that initiative. Facilitated by SURA, the vision for SURAgrid is to orchestrate access to a rich set of distributed capabilities in order to meet diverse users' needs. Capabilities to be cultivated include locally contributed resources, project-specific tools and environments, highly specialized or HPC access, and gateways to national and international cyberinfrastructure.
Figure JG-1. An overview of SURAgrid.
To meet the needs of its broad participant and user community, SURAgrid focused on three primary goals:
- Develop a scalable infrastructure that leverages local institutional identity and authorization while managing access to shared resources across institutional boundaries.
- Promote the use of this infrastructure for the broad research and education community, creating a whole that is greater than the sum of its parts.
- Provide a forum for participating institutions to gain additional experience with grid technology and to promote collaborative project development.
With the long-term view of grids as generalized infrastructure, an emphasis on diversity and inclusion, and a persistent objective to discover and understand grid use outside the scope of expected or typical use today, SURAgrid is positioned to become an essential tool to build the scientific and educational capacity of the Southeastern U.S. and beyond.
Applications on SURAgrid
Identifying research applications that benefit significantly from grid technologies is key both to fostering grid development and deployment and to growing and sustaining SURAgrid. The deployment of an intentionally diverse set of applications is contributing to the advancement of research and education within a variety of disciplines. Applications under development on SURAgrid are detailed on the SURAgrid Web site, with a few notable examples listed below:
-
SCOOP (SURA Coastal Ocean Observing & Prediction) Coastal Ocean Modeling
The SCOOP (http://scoop.sura.org) Cyberinfrastructure (CI) is being developed to support coastal research and operations by providing a modular, distributed system for real-time prediction and visualization of the impacts of extreme atmospheric events on coastal areas, and by enabling advances in multi-scale, multi-model, and DDDAS (Dynamic Data-Driven Application Systems) science. The SCOOP CI enables complex workflows, which integrate coastal models such as ADCIRC, ELCIRC, WW3, and CH3D with various wind models and sensor information. SCOOP presently uses SURAgrid resources for added computational capacity.
-
EPANET Simulation-Optimization for Threat Management in Urban Water Systems
This application incorporates dynamic demand data, in real-time, into a simulation-optimization process for contamination threat management in drinking water distribution systems. The nature of this work is highly compute-intensive and requires multi-level parallel processing via computer clusters and high-performance computing architectures such as SURAgrid. Simulation-Optimization with EPANET is part of a multi-disciplinary, three-year NSF-funded DDDAS (Dynamic Data-Driven Application Systems) research project to develop a cyberinfrastructure system that will both adapt to and control changing needs in data, models, computer resources and management choices facilitated by a dynamic workflow design. Project Partners: North Carolina State University; University of Chicago; University of Cincinnati; University of South Carolina.
-
Grid-Enabled Distributed BLAST
BLAST is a database search application for matching protein and nucleotide sequences. Maximizing the throughput of searches is key to improving research results. This distributed implementation of BLAST developed by the University of Alabama at Birmingham uses the DynamicBLAST Meta-scheduler to select appropriate grid resources for select query strings. Globus is used for job staging, submission and retrieval. ncbiBLAST performs the computations. Jobs are submitted using a web-based interface that leverages campus identity credentials via Pubcookie and manages grid authentication on behalf of the user via MyProxy, providing a simplified user authentication experience.
-
SURAgrid Teaching Environment
Effective teaching about grids, within Computer Science as well as other disciplines, is greatly enhanced by students and instructors having hands-on access to a stable grid environment. Through coordinated commitment, operation and support across a subset of SURAgrid resources, SURAgrid is developing a predictable, secure and reliable grid-based teaching facility for use by SURAgrid sites in their grid course development and/or delivery. Old Dominion University has made initial use of this capability by providing basic grid access for students to supplement theory in a Distributed Computing course during Spring 2007. Targeted improvements include more scalable group account management, accommodation for varying levels of access, and space for faculty to participate in joint course development.
How SURAgrid works
As an inter-institutional grid infrastructure, SURAgrid provides a variety of application users with a common point of access to a shared set of distributed, heterogeneous resources. As of August 2007, thirty academic organizations and institutions are participating in SURAgrid. Most, but not all, are members of SURA; SURA membership is not a requirement.
Figure JG-2. The SURAgrid map.
Resources to be shared are contributed by the participating organizations and remain under autonomous control of the
resource owner, with shared access enabled through grid-wide coordination of authentication and authorization mechanisms,
and operational procedures.
Most of the resources being contributed are computational in nature, providing just over 10.5 Teraflops of combined capability
as of April 2007, for sharing among the SURAgrid community (although capacity does fluctuate as resources are
added, swapped, upgraded, etc.).
More diverse resources such as databases, instruments, storage, and application services are anticipated in the future.
SURAgrid resources can be viewed and accessed through the SURAgrid portal at https://gridportal.sura.org, which is
maintained by the Texas Advanced Computing Center (TACC).
Institutions that participate in SURAgrid are also expected to share in its organization, planning and development, as a cooperative effort to foster collaboration and build a shared asset to help meet local, regional and national goals for the advancement of science through grid technologies. Informal and "grass roots" structure and procedures for governance and decision-making are gradually being replaced by more formal components and processes, while retaining the spirit of community and collaboration that is the foundation of the initiative. New organizations can join SURAgrid by following the contact process detailed on the
SURAgrid Web site
[1]
(see tab/menu item "Join SURAgrid").
The SURAgrid infrastructure
The SURAgrid software stack, grid services and application environment have evolved to include a minimal set of requirements and
recommendations intended to be as loose as possible while providing a foundation of interoperability. Originally, the
primary need was for management and coordination of resources and applications within SURAgrid itself. More recently,
grid-to-grid integration has become of greater importance to SURAgrid participants who need to share resources with or bridge
access to other grid projects such as TeraGrid, Open Science Grid, TIGRE (Texas Internet Grid for Research and Education), and
project-specific grids such as GridChem.
SURAgrid presently uses Globus middleware to facilitate access to a variety of computational resources, such as Linux-based
clusters, IBM P575 HPC systems, Condor pools, and virtualized resources.
Adding resources to SURAgrid is facilitated through user documentation, peer support and some direct assistance from SURA staff.
Continued development of the SURAgrid environment is required to accommodate new user communities, future integration with
other grid initiatives, and an anticipated increase in corporate partnerships (such as the
SURAgrid-IBM partnership
[2]).
The SURAgrid stack specification is in alignment with middleware that is in use by these and many other major academic
grid initiatives. Different applications may have requirements beyond the currently specified software stack.
Such requirements are treated as application-specific needs until they are shown to be more commonly required and so
should be incorporated into overall resource requirements.
SURAgrid resource requirements & recommendations (server side) as of June 2007:
- Required: Globus 4.x, WS-GRAM, GridFTP, WS-MDS and RFT. Pre-WS GRAM and MDS are strongly recommended to support existing legacy applications.
- GSI-OpenSSH is strongly recommended for application staging. If enabled, it is required that you advertise the port
through WS or system detail in the SURAgrid portal. We recommend using either port 22 or 2222.
- Any version of operating system that supports the required services above, with Linux 2.4 or higher recommended in
order to provide a common platform for application development.
- Addition of resource and relevant system detail to the resource monitor (GPIR) of the SURAgrid portal.
- A scheduler installed as part of your underlying resource configuration.
- Cross-certification with SURAgrid Bridge CA — strongly recommended at this time and likely to be required in the future. (See
https://www.pki.virginia.edu/nmi-bridge
[3])
- Configuration of the required environment variables as defined in SURAgrid Environment Variables. Configuration of the optional environment variables also recommended.
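A site that enables GSI-OpenSSH needs the advertised port to be reachable from other SURAgrid sites. The following is a minimal sketch, not part of the SURAgrid tooling, that tests whether the two recommended ports accept TCP connections. The function names are illustrative and "localhost" stands in for a real resource. The check confirms TCP reachability only; it does not verify that a GSI-enabled SSH daemon is listening.

```python
import socket

# Ports recommended in the requirements list for GSI-OpenSSH.
RECOMMENDED_GSISSH_PORTS = (22, 2222)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def open_gsissh_ports(host, ports=RECOMMENDED_GSISSH_PORTS):
    """Return the subset of ports on host that accept TCP connections."""
    return [port for port in ports if port_is_open(host, port)]


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute the host name of your resource.
    print(open_gsissh_ports("localhost"))
```

Whichever port responds is the one to advertise in the SURAgrid portal's system detail.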
Users must be both authenticated and authorized to access SURAgrid resources. The Globus GSI (Grid Security Infrastructure)
relies on PKI (public key infrastructure) and its related exchange of certificates for authentication and provides for
authorization through a "grid-mapfile" that associates identities with individual system accounts. SURAgrid augments this
authentication process by leveraging authoritative campus identity management where possible for user authentication
between participating sites. Scalable exchange of this trusted information is enabled through the use of the SURAgrid
Bridge Certificate Authority (Bridge CA), maintained by the University of Virginia, SURAgrid's lead in this area.
Each site establishes a trusted relationship with the SURAgrid Bridge CA, which
essentially then "vouches" for each site to the others. In the absence of a Bridge CA, each site within a PKI infrastructure
must establish a trusted relationship with all the other participating sites, which can become exceedingly difficult, if not
impossible, to manage effectively as the number of participants increases. Within SURAgrid, participating sites typically
run their own Certificate Authority (CA) to provide both user and system certificates for participation in the SURAgrid PKI.
A SURAgrid CA is also under development, to provide certificates for sites that are not running their own CA or do not
have access to one, SURAgrid guest access, etc.
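The scaling argument for a Bridge CA can be made concrete with a little arithmetic. The sketch below assumes each mutual trust relationship is counted once; the function names are illustrative. Without a bridge, every pair of sites needs its own relationship, whereas with a bridge each site needs only one.

```python
def pairwise_trust_links(n_sites):
    """Relationships needed when every site cross-certifies with every other site."""
    return n_sites * (n_sites - 1) // 2


def bridge_trust_links(n_sites):
    """Relationships needed when each site cross-certifies only with the Bridge CA."""
    return n_sites


if __name__ == "__main__":
    for n in (5, 30, 100):
        print(n, pairwise_trust_links(n), bridge_trust_links(n))
```

For the thirty participants cited earlier, full pairwise trust would require 435 relationships, against 30 with the Bridge CA.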
Once a SURAgrid user is successfully authenticated, he or she accesses SURAgrid resources through use of a pre-established
individual SURAgrid user account. This account is recognized on all properly configured SURAgrid resources and the
permissions inherent in the user account determine the levels of authorization (what the user is able to do). The
setup and management of SURAgrid user accounts is facilitated through several tools developed for SURAgrid by the
University of Virginia. These tools include Web-based account management, a shared LDAP directory that maintains
SURAgrid user information, and scripts that provide various levels of automation to be used for mapping user information
to the Globus GSI, to the degree desired by each site. Account access mechanisms in use on SURAgrid range from user access
through the SURAgrid portal, remote login by the user to individual resources, and software-automated access through
applications and scripts.
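To illustrate the grid-mapfile mentioned above, the following sketch parses the usual Globus layout, in which each line holds a quoted certificate subject followed by one or more comma-separated local account names. The sample subjects and account names are hypothetical, and the parser is an illustration rather than one of the SURAgrid account-management tools.

```python
import shlex


def parse_grid_mapfile(text):
    """Map each certificate subject (DN) to its list of local account names."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex honours the double quotes around the DN, which may contain spaces.
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError("unexpected grid-mapfile line: %r" % raw)
        subject_dn, accounts = parts
        mapping[subject_dn] = accounts.split(",")
    return mapping


# Hypothetical entries, for illustration only.
SAMPLE = '''
# subject DN                                local account(s)
"/C=US/O=Example University/CN=Jane Doe" jdoe
"/C=US/O=Example College/CN=Sam Roe" sroe,gridguest
'''

if __name__ == "__main__":
    for dn, accounts in parse_grid_mapfile(SAMPLE).items():
        print(dn, "->", accounts)
```

When a subject maps to several accounts, the first one listed is conventionally treated as the default.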
Implementation closeup: Installing the SURAgrid server stack
SURAgrid Server Software Stack
To accommodate heterogeneity, the SURAgrid software stack, grid services and application
environment evolve based on setting a minimal set of requirements and recommendations that
increase in specificity as needs dictate. However, SURA has defined a common set of software that should be
available on all SURA server systems at this point in time, to ensure interoperability among systems and support
for the current and near-term application set. To facilitate installation of the appropriate software, the SURAgrid
team is collaborating with the
TIGRE
[4]
project in the development of a "one-button" installation of the software stack for SURAgrid. This installation package includes
both services and clients for those services, and leverages the
Virtual Data Toolkit
[5]
(VDT) to provide a convenient way to install and configure this software. The excerpt below illustrates parts of this
automated process for adding a resource to SURAgrid.
Please check the official SURAgrid
Server Stack website
[6]
for the most current material.
Contents
The SURAgrid software stack consists of the following components:
- Globus Toolkit 4.0
[7]
(servers and clients)
- Grid Proxy programs. For obtaining X.509 credentials.
- Pre-WS and WS-GRAM. The GRAM2 (pre-web services) and GRAM4 (web services) Globus
client and server components. These components provide remote job submission.
Also included are supporting services such as the Reliable File Transfer
Service and the Delegation Service.
- GridFTP. GridFTP server and clients that provide secure, high-bandwidth file
transfers.
- GSI OpenSSH
[8].
Provides ssh access to SURAgrid systems using grid credentials.
- UberFTP
[9].
An interactive command line client for GridFTP.
- MyProxy
[10] client.
One way to cache proxies obtained from grid credentials.
- Condor-G
[11].
Job submission and management.
Requirements
VDT supports a variety of operating systems and OS versions. Please make sure
your platform is one of the supported operating systems
[12].
The SURAgrid software stacks require the following underlying software to be
available:
- Perl 5.8.0 or greater
- tar (any version)
- diff+patch (any recent version should suffice)
- Python 2.2 or greater (pacman itself will install it if necessary)
The disk space requirements vary per platform but generally 1-2 GB of free disk space will suffice.
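The prerequisites above can be checked before starting an installation. The sketch below is an illustration only and is not part of the SURAgrid or VDT tooling: it looks for the required tools on the PATH, compares the installed Perl version with 5.8.0, and checks free disk space against the upper end of the 1-2 GB estimate. The function names and the 2 GB threshold are assumptions. Python is not checked, since pacman installs it when needed.

```python
import re
import shutil
import subprocess

MIN_PERL = (5, 8, 0)
REQUIRED_TOOLS = ("perl", "tar", "diff", "patch")
MIN_FREE_GB = 2.0  # assumption: upper end of the 1-2 GB estimate


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum):
    """Compare version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    found = tuple(found) + (0,) * (width - len(found))
    minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return found >= minimum


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_gb(path="."):
    """Return free disk space, in GB, on the filesystem that holds path."""
    return shutil.disk_usage(path).free / float(1024 ** 3)


if __name__ == "__main__":
    absent = missing_tools()
    print("missing tools:", absent or "none")
    if "perl" not in absent:
        banner = subprocess.run(["perl", "-v"], capture_output=True, text=True).stdout
        print("perl new enough:", meets_minimum(parse_version(banner), MIN_PERL))
    print("enough disk space:", free_gb() >= MIN_FREE_GB)
```

Run it from the directory where the stack will be installed so that the disk-space check applies to the right filesystem.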
General Preparations and Steps
The basic steps in an installation scenario include:
- Install pacman
The SURAgrid software stack is installed and managed with
pacman
[13].
pacman is a utility which manages software packages in
Linux. It uses simple compressed files as a package format, and maintains a text-based package database (more of a hierarchy),
just in case some hand tweaking is necessary.
- Install the SURAgrid server software stack
The root directory, where the SURAgrid server software stack is installed, is created.
Then pacman is used to begin installing the server software stack.
pacman asks you questions and downloads a relatively large number of packages.
- Configure the SURAgrid Software Stack
After the installation is complete, there are a number of post-install configuration
steps to perform before the SURAgrid server software is fully
functional. They include:
- Install Credentials
Credentials are required for your host.
The SURAgrid authentication and authorization infrastructure is based on
a two-tier PKI approach coupled with an optional LDAP-based PAM callout.
See the SURAgrid PKI Bridge Certification Authority and User Management System
[14]
pages for more information.
- Map SURAgrid Users
You have the option of either using the SURAgrid LDAP callout
to control local account mapping and authorization or simply setting up a
grid-mapfile to do so. Such mapping is required to associate the subject
distinguished name for a particular user in their X.509 certificate to a
local Unix account.
For further information, see the grid-mapfile section
[15]
of the SURAgrid PKI Bridge Certification Authority and User Management System
pages.
(Note that this topic is undergoing some evolution within SURAgrid,
and we hope in particular to provide a mechanism for a fully-accredited approach
in cooperation with the International Grid Trust Federation
[16] soon.)
- Configure GSISSH
The GSI version of SSH
[17]
allows users to ssh into SURAgrid systems using their grid credentials. They do
not have to provide a password and are automatically mapped to their local
assigned grid userid upon gsissh login. To set this up,
enable your gsissh server
[18].
- Configure GridFTP
The Globus GridFTP provides high-performance file movement between SURAgrid systems.
- Configure Globus account
For added security, the Globus web services container should be run as an ordinary user.
The typical userid used is 'globus'.
- Configure WS-GRAM
The Web Services GRAM (WS-GRAM) component allows remote users to execute applications
on each SURAgrid system.
- Set special local conditions
The globus tcp port range can be set to match local preferred values, as well as
any other special local conditions for your installation.
Other commands needed to handle variations that apply to a local cluster environment may also be added.
- Start services
Globus services can now be started, including WS-GRAM.
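As an illustration of the "Set special local conditions" step, Globus reads its permitted TCP port range from the GLOBUS_TCP_PORT_RANGE environment variable, written as "min,max". The sketch below validates a range and produces the corresponding shell line. The example range and the choice to reject ports below 1024 are assumptions made for this sketch, not SURAgrid requirements; use the values agreed with your firewall administrators.

```python
def format_port_range(low, high):
    """Return a 'min,max' string after checking that the range is sensible."""
    # Assumption for this sketch: stay within unprivileged ports (1024-65535).
    if not (1024 <= low <= high <= 65535):
        raise ValueError("invalid port range: %r-%r" % (low, high))
    return "%d,%d" % (low, high)


def export_line(low, high):
    """Return a shell line that sets GLOBUS_TCP_PORT_RANGE."""
    return "export GLOBUS_TCP_PORT_RANGE=" + format_port_range(low, high)


if __name__ == "__main__":
    # 40000-41000 is a hypothetical range chosen only for illustration.
    print(export_line(40000, 41000))
```

The same range must also be opened in the site firewall, or connections from remote clients will fail.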
To get help
If you have any problems installing the SURAgrid software, please contact
the SURAgrid Support e-mail list
[19].
There is a VDT Discuss and Announce list which might be helpful for specific
advanced usage scenarios. Please see the VDT Support page
[20]
for additional information.
The Open Science Grid
(The following Open Science Grid example borrows liberally from the information available through the
OSG website
[21].)
The Open Science Grid, formed in 2004, is a distributed computing infrastructure
for large-scale scientific research, built and operated by a consortium of
universities, national laboratories, scientific collaborations and software
developers. The goal of the OSG Consortium is to enable diverse communities of
scientists to access a common grid infrastructure and shared resources.
The OSG is supported by the National Science Foundation and the U.S. Department
of Energy's Office of Science.
Members of the OSG Consortium
[22]
contribute effort and resources to the OSG infrastructure and reap the benefits
of a shared infrastructure that integrates computing and storage resources
from more than 50 sites in the United States, Asia and South America. OSG also has partners
[23],
including campus, regional, national and international grids.
Researchers from many fields
[24],
including astrophysics, bioinformatics, computer science, medical imaging,
nanotechnology and physics use the OSG infrastructure to advance their research.
OSG provides help for new communities to adapt their applications to use the
distributed facility and make their resources accessible. The OSG also works to enable scientists to seamlessly harness grid-computing
resources worldwide and interoperates with multiple other Grid infrastructures.
The OSG is a continuation of Grid3
[25],
a community grid built in 2003 as a joint project of the U.S. LHC software
and computing programs, the National Science Foundation's Grid Physics Network
(GriPhyN) and International Virtual Data Grid Laboratory (iVDGL) projects,
and the U.S. Department of Energy's Particle Physics Data Grid (PPDG) project.
The OSG includes two grids: an Integration Grid and a Production Grid. The Integration
Grid is used to test new grid applications, sites and technologies, while the
Production Grid provides a stable, supported environment on which researchers
run their scientific applications.
Figure JG-3. Location of the Open Science Grid Production Resources.
Software
The Open Science Grid provides and supports a reference set of software called
the "OSG Software Stack" for download and use by administrators and users of
the OSG.
The software stack relies on the Virtual Data Toolkit (VDT)
[26]
middleware, which is itself a
packaging and distribution based on the NSF Middleware Initiative (NMI)
[27]
releases of Condor, Globus and other standard Grid middleware.
OSG@Work
[28]
pages provide detailed instructions on how to
prepare a facility and/or resource
and how to download and configure the OSG Software Stack in order to provide
or access resources on the OSG.
A production release of the OSG Software Stack comes only after validation of
the proposed software on the Integration Grid, and is based on released versions
of the VDT.
Applications on OSG
Scientists from many different fields use the Open Science Grid to advance their
research. The OSG Consortium includes members from particle and nuclear physics,
astrophysics, bioinformatics, gravitational-wave science and computer science
collaborations. Consortium members contribute to the development of the OSG
and benefit from advances in grid technology. Applications in other areas of
science, such as mathematics, medical imaging and nanotechnology, benefit from
the OSG through its partnership with local and regional grids or their communities'
use of the Virtual Data Toolkit software stack.
The Consortium members contribute the resources available to the OSG. The owners
of the resources control their use, with an expectation that 10-20% are on
average available for opportunistic use, and with policies such that OSG members
can use any unused cycles.
Thus the existence of the OSG does not obviate the need for the purchase of hardware
and building of computational facilities by and for each science community.
The scope of OSG is to:
- Enable scientists to use a greater percentage of the available compute and storage cycles.
- Help scientists to use distributed systems and software with less effort.
- Enable more sharing and reuse of software and reduce duplication through providing
effort in integration and extensions.
- Establish an "open-source" community working together to communicate knowledge
and experience and reduce overheads for new participants.
The benefits come from reducing risk in, and sharing support for, large, complex
systems that must be run for many years with a short-lifetime workforce.
Benefits also come from leveraging the expertise and support for such systems to enable
new communities to participate more easily in distributed science, including:
- Savings in effort for integration, system and software support.
- Opportunity and flexibility to distribute load and address peak needs.
- Maintenance of an experienced workforce in a common system.
- Lowering the cost of entry to new contributors.
- Enabling of new computational opportunities to communities that would not otherwise
have access to such resources.
The deliverables and milestones of OSG are driven directly by the needs of the
current set of scientific stakeholders and evolve through balancing of their
needs and those of the new communities against the available effort.
The OSG Grid Operations Center at Indiana University provides front-line support
for all areas of the OSG, and the OSG web site document repository and @Work
Twiki site provide extensive information about all aspects of the facility. We
will only touch on a few representative areas of activity here.
Use of the OSG
There are more than sixty active computational sites on the OSG. The infrastructure
supports job throughput of more than a hundred thousand CPU hours a day and
supports several hundred users. About twenty of the sites are part of the US
LHC distributed data handling and analysis systems (Brookhaven and Fermilab
Tier-1, University Tier-2s). These sites are supporting ongoing data distribution
at an aggregate rate of tens of terabytes a day. Four sites are owned by LIGO and are
being used for transitioning analysis codes from the existing LIGO data grid
to full production on the common infrastructure. Four sites are owned (or partially
owned) by STAR and are being used to bring STAR data distribution, simulation
and production codes. The Tevatron experiments are also making good opportunistic
use of the OSG.
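The throughput figure quoted above is easier to picture as a number of continuously busy processors. The sketch below is a back-of-the-envelope conversion only; the function names are illustrative. Because the quoted figures are lower bounds ("more than"), the results are rough lower bounds as well.

```python
HOURS_PER_DAY = 24.0


def equivalent_busy_cpus(cpu_hours_per_day):
    """CPUs that would have to run around the clock to deliver this throughput."""
    return cpu_hours_per_day / HOURS_PER_DAY


def average_busy_cpus_per_site(cpu_hours_per_day, n_sites):
    """Average number of continuously busy CPUs per site."""
    return equivalent_busy_cpus(cpu_hours_per_day) / n_sites


if __name__ == "__main__":
    print(round(equivalent_busy_cpus(100000)))
    print(round(average_busy_cpus_per_site(100000, 60)))
```

A hundred thousand CPU hours a day corresponds to roughly 4,200 processors running continuously, or about 70 per site when averaged over sixty sites.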
Bringing new users onto the OSG
The Open Science Grid (OSG) engagement activity works closely with new user communities
over periods of several months to bring them to production running. These activities
include: providing an understanding of how to use the distributed infrastructure; adapting
applications to run effectively on OSG sites; engaging in the deployment of
community owned distributed infrastructures; working with the OSG Facility
to ensure the needs of the new community are met; providing common tools and
services in support of the new communities; and working directly with and in
support of the new end users with the goal to have them transition to be full
contributing members of the OSG. To date, engagement activities include the following:
- Adaptation and opportunistic production running of the Rosetta application from
the Kuhlman Laboratory in North Carolina, using more than a hundred thousand CPU hours across more than thirteen
OSG sites, which has resulted in structure predictions for more than ten proteins.
The robustness of the system has so far been tested with the submission of up
to about three thousand jobs simultaneously.
- Production runs of the Weather Research and Forecasting (WRF) application using
more than one hundred and fifty thousand CPU hours on the NERSC OSG site at
Lawrence Berkeley National Laboratory.
- Improvement of the performance of the nanoWire application from the nanoHUB project
on sites on the OSG and TeraGrid, such that stable running of batches of
five hundred jobs across more than five sites is routine. Work was also done
in support of nanoHUB scientists to use OSG resources to run BioMoca simulation
jobs last year and the first couple of months of this year.
- Production running, using more than twenty thousand CPU hours, of the CHARMM molecular
dynamics simulation applied to the problem of water penetration in staphylococcal
nuclease, using the ATLAS workload management system, PANDA, and opportunistically
available resources across more than ten OSG sites.
Sites and VOs
A Site is a set of processing and/or storage resources and/or services co-located
and centrally administered. A Virtual Organization (VO) is an organization
that includes people using the resources (users, developers, administrators,
and managers), the services needed by the organization and the resources owned
by the organization. The OSG architecture defines interfaces between sites
and VOs to the common infrastructure. The OSG provides an integrated and tested
reference set of software, the Virtual Data Toolkit (VDT), for the sites and
the VOs to use to interface to the OSG distributed facility. Both sites and
VOs have responsibility and authority over the resources, software and services
that they own and they control and manage their use.
The OSG implementation architecture is cognizant that any resource may be supporting
use through multiple interfaces — from local submission and access, from the
OSG, and from other similar infrastructures such as Campus Grids (e.g. FermiGrid)
or other national infrastructures such as TeraGrid. Similarly, the implementation
architecture is cognizant that any VO may be using multiple infrastructures
simultaneously and may have a deep set of (sometimes complex) shared software
and services that are specific to the VO and operate across these infrastructures.
These are additional drivers to the model that sites retain local control and
management for all use, services and processes, and that VOs have control and
management over their internal processes, priorities and use. In addition to
levels of service and resource use being agreed between resource owners and
users, the site and VO processes are implemented to support sharing and opportunistic
use of the resources accessible to the OSG.
OSG services
The OSG provides common services across the distributed facility in support of
VOs and Sites: monitoring, validation and information about the full infrastructure;
tracking of any and all problems and ensuring they are resolved; the Virtual
Data Toolkit software packaging and support; integration and testing facilities;
security; troubleshooting of the end-to-end system; and support for existing
and new user communities. The OSG also provides effort to bring new services
and software into the facility and to collaborate with the external projects,
as well as documentation and training of site and VO administrators and users.
Benefits from a common, integrated software stack
OSG software releases consist of the collection of software integrated and distributed
as the Virtual Data Toolkit (VDT) with a thin layer of additional OSG specific
configuration scripts. Modules in the VDT are included at the request of the
stakeholders. The Condor and Globus software provide the base technologies.
VDT includes about thirty additional modules, including components from other
computer science groups, the Enabling Grids for E-sciencE (EGEE) project, DOE Laboratory
facilities (Brookhaven, Fermilab and LBNL), and the application communities
themselves, as well as standard open source software components such as Apache
and Tomcat. OSG also supports the VDT for external projects. For example, VDT
is used by the EGEE and Australian distributed computing infrastructures. Additionally,
in support of interoperability across their infrastructures, the OSG and TeraGrid
software stacks include aligned versions of the Condor and Globus software.
The VDT provides a reference software stack for use by OSG sites. Once the software
is installed, a site supports remote job submission, shared storage at the site and
data transfer between sites, has services to manage priorities and access between
VOs, and can participate in the OSG monitoring, validation and accounting services.
OSG supports use of the reference software. Sites must provide the common interfaces
to OSG services but the actual implementation is not dictated. The VDT also
provides client libraries and tools for the applications to use to access OSG
resources and services.
Operational security and the security infrastructure
OSG is well aware of the essential and integrated nature of security operations
and management. We have comprehensive risk analysis and security plans. We
respond to software security notifications by a prompt analysis of the problem
and assigning high priority to patches and fixes. Over the past year we have
had about ten such notifications which have resulted in new software, downloads
within between a day and a week or two. The VDT, Condor, Globus, and EGEE software
teams communicate security risks as soon as they are identified and work together
on patches and solutions. The collaborative nature of the OSG means that communication
is natural and happening all the time and security is part of the day-to-day
normal processes. OSG has mechanisms to deny users access to sites and resources.
The grouping of users into VOs gives us a small number of well-identified responsible
managers who control user entry to the infrastructure. This leads us to a model
of trust between sites, VOs and the OSG, with delegated trust between the VO and the
end users.
The OSG security infrastructure is based on: X.509 user, host and service certificates
gained through one of the International Grid Trust Federation accredited Certificate
Authorities; user identity proxy certificates obtained through the VO Management
Service (EGEE VOMS) which provides checking of the user as part of a VO; management
of extended certificate attributes to assign "roles" to a particular access
by a user; flexible definition of mapping of user certificates to accounts
and ACLs (access control lists) on a site; and policy (including blacklist)
enforcement points at the site (processing and storage) services themselves.
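The mapping and blacklist steps described above can be sketched in a few lines. This is a minimal illustration, not the OSG implementation: it assumes a grid-mapfile style text format (a quoted certificate DN followed by a local account name), and the DNs, account names, and the helper names `parse_grid_mapfile` and `authorize` are hypothetical.

```python
import shlex


def parse_grid_mapfile(text):
    """Parse grid-mapfile style lines: a quoted certificate DN, then a local account."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the double quotes around the DN
        if len(parts) != 2:
            continue  # ignore malformed lines in this sketch
        dn, account = parts
        mapping[dn] = account
    return mapping


def authorize(dn, mapping, blacklist):
    """Return the local account for a DN, or None if it is banned or unmapped."""
    if dn in blacklist:
        return None  # the site-level blacklist is checked before any mapping
    return mapping.get(dn)


# Hypothetical entries, for illustration only.
EXAMPLE_MAPFILE = '''
# DN -> local account
"/DC=org/DC=example/OU=People/CN=Jane Doe" cmsuser
"/DC=org/DC=example/OU=People/CN=John Roe" atlasuser
'''

mapping = parse_grid_mapfile(EXAMPLE_MAPFILE)
blacklist = {"/DC=org/DC=example/OU=People/CN=John Roe"}

print(authorize("/DC=org/DC=example/OU=People/CN=Jane Doe", mapping, blacklist))
print(authorize("/DC=org/DC=example/OU=People/CN=John Roe", mapping, blacklist))
```

Checking the blacklist before the mapping means a banned certificate is refused even if a stale mapping entry still exists for it.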
Jobs, data, and storage
Job Management and Execution
OSG sites present interfaces allowing remotely submitted jobs to be accepted,
queued and executed locally. The priority and policies of execution are controlled
both by the VO and the site itself. VO policies are defined through "roles"
given to the user through the VOMS service. Site policies and priorities are
defined through mapping the user and their roles to specific accounts used
to submit the job to the batch queue. OSG supports the Condor-G job submission
client which interfaces to either the pre-web service or web services GRAM
Globus interface at the executing site. Job managers at the backend of the
GRAM gatekeeper support job execution by local Condor, LSF, PBS, or SGE batch
systems.
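As a rough illustration of the submission path above, the following sketch builds a Condor-G submit description for a pre-web-service GRAM gatekeeper. It is a hedged example under stated assumptions: the gatekeeper host name, the executable, the file names, and the helper `condor_g_submit` are hypothetical, and only the pre-web-service (`gt2`) form of `grid_resource` is covered.

```python
# Batch systems named in the text as back ends of the GRAM gatekeeper.
SUPPORTED_JOBMANAGERS = {"condor", "lsf", "pbs", "sge"}


def condor_g_submit(gatekeeper, jobmanager, executable, arguments=""):
    """Build the text of a Condor-G submit description for a pre-WS GRAM site."""
    if jobmanager not in SUPPORTED_JOBMANAGERS:
        raise ValueError(f"unsupported job manager: {jobmanager}")
    lines = [
        "universe = grid",
        # gt2 selects the pre-web-service GRAM interface at the executing site
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical site and job, for illustration only.
print(condor_g_submit("gatekeeper.example.edu", "pbs", "analyze.sh", "run42"))
```

The resulting text would be saved to a file and handed to `condor_submit`; the site's own policies then decide which local account and batch queue the job runs under.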
Data Transport, Storage and Access
Many of the OSG physics user communities have large file-based data transport
and high application-level data I/O needs. The data transport, access, and
storage implementations on OSG take account of these needs. OSG relies on the GridFTP
protocol for the raw transport of the data, using Globus GridFTP in all cases
except where interfaces to storage management systems (rather than file systems)
dictate individual implementations. The community has been heavily involved
in the early testing of new versions of Globus GridFTP as well as defining
needed changes in the GridFTP protocol.
Storage Resource Management
OSG supports the Storage Resource Management (SRM) interface to storage resources
to enable management of space and data transfers, to prevent unexpected errors
due to running out of space, to prevent overload of the GridFTP services, and
to provide capabilities for pre-staging, pinning and retention of the data
files. OSG currently provides reference implementations of two storage systems: the
LBNL Disk Berkeley Storage Manager (BeStMan) [34] and dCache [35].
In addition, because functionalities to support space reservation and sharing
are not yet available through grid interfaces, OSG defines a set of environment
variables that a site must implement and a VO can rely on to point them to
available space, to space shared between all nodes on a compute cluster, and to
high performance I/O disk caches.
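A job can locate these storage areas by reading the site-provided environment variables at run time. The sketch below is an assumption-laden illustration: the text does not list the variable names, so `OSG_APP`, `OSG_DATA`, and `OSG_WN_TMP` are assumed names used here for illustration, and the helper `resolve_storage` is hypothetical.

```python
import os

# Assumed variable names; the text does not list the actual set a site must define.
ASSUMED_VARS = ("OSG_APP", "OSG_DATA", "OSG_WN_TMP")


def resolve_storage(env=None):
    """Split the assumed variables into those the site has set and those it has not."""
    env = os.environ if env is None else env
    found, missing = {}, []
    for name in ASSUMED_VARS:
        value = env.get(name)
        if value:
            found[name] = value
        else:
            missing.append(name)
    return found, missing


# Example with an explicit mapping instead of the real process environment.
found, missing = resolve_storage({"OSG_DATA": "/shared/data"})
print(found)
print(missing)
```

Passing the environment as a parameter keeps the lookup easy to exercise outside a grid site; a real job would call `resolve_storage()` with no arguments and fail early if a required area is missing.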
Gateways to other facilities and grids
We are seeing a rapid growth in the interest and deployment of shared computational
infrastructures at the local and regional level. We are also seeing a rapid
growth in research communities needing to move data and jobs between heterogeneous
facilities and build integrated community computational systems across high
performance computing (HPC) facilities and more traditional computing clusters.
OSG provides interfaces to these HPC facilities to support these use cases. The
OSG also federates with other large infrastructures — notably the TeraGrid and
EGEE — by providing gateways between them and OSG, and supporting groups to submit
jobs across and move data between them. For example, the OSG collects information
from the resources and publishes it in the format needed by the EGEE. The CMS VO "Resource Broker" job dispatcher then submits jobs transparently
across EGEE and OSG resources.
Participating in the OSG
New organizations contribute to OSG by providing resources, using the infrastructure,
working with the communities to provide software, participating in training
and documentation activities, and/or participating in the security, troubleshooting,
or other OSG activity areas. The overhead to participation is low and the benefit
is matched to the principle that "what you get out depends on what you put
in". An organization registers with the Grid Operations Center and provides
contact and planned usage information. The OSG staff then helps to interface
the resources to the OSG and provides support for the VOs' usage.
Training on the OSG
The OSG Education and Training [29] program provides training for student users, researchers and educators of the
OSG and site administrators. At the core of the student education program are
the Workshops [30],
organized by OSG and its partners. These grid schools give advanced undergraduate
and graduate students a basic foundation in distributed computing and provide
valuable hands-on training in distributed and grid computing techniques. The
schools introduce essential skills that will be needed by students in the fields
of natural and applied science, engineering and computer science to conduct
and support scientific analysis in grid computing environments.
See OSG's Research Highlights [31] for more details.
Bibliography
[1] SURAgrid Web site (http://www.sura.org/suragrid)
[2] SURAgrid-IBM partnership (http://www.sura.org/news/docs/IBMSURAgrid.doc)
[3] SURAgrid User Management and PKI Bridge Certification Authority (https://www.pki.virginia.edu/nmi-bridge)
[4] TIGRE (http://tigreportal.hipcat.net)
[5] Virtual Data Toolkit (http://vdt.cs.wisc.edu/)
[6] SURAgrid Server Stack website (http://omnius.hpcc.ttu.edu/SURAgrid_wiki/ServerStack)
[7] Globus Toolkit 4.0 (http://www.globus.org/toolkit/docs/4.0/)
[8] GSI OpenSSH (http://grid.ncsa.uiuc.edu/ssh/)
[9] UberFTP (http://dims.ncsa.uiuc.edu/set/uberftp/)
[10] MyProxy (http://grid.ncsa.uiuc.edu/myproxy/)
[11] Condor-G (http://www.cs.wisc.edu/condor/condorg/)
[12] VDT supported operating systems (http://vdt.cs.wisc.edu/releases/1.6.1/requirements.html)
[13] pacman (http://www.archlinux.org/pacman/)
[14] SURAgrid PKI Bridge Certification Authority and User Management System (https://www.pki.virginia.edu/sura-bridge/scl/)
[15] grid-mapfile section (https://www.pki.virginia.edu/nmi-bridge/scl/#gridmapfile)
[16] International Grid Trust Federation (http://gridpma.org)
[17] GSI version of SSH (http://grid.ncsa.uiuc.edu/ssh/)
[18] Step 6: Install the GSI-OpenSSH Server (http://grid.ncsa.uiuc.edu/ssh/install.html#install_server)
[19] SURAgrid Support e-mail list (mailto:suragrid-support@sura.org)
[20] VDT Support page (http://vdt.cs.wisc.edu/support.html)
[21] OSG website (http://www.opensciencegrid.org)
[22] Members of the OSG Consortium (http://www.opensciencegrid.org/About/Who_is_the_Open_Science_Grid%3F/Consortium_Members)
[23] OSG partners (http://www.opensciencegrid.org/About/Who_is_the_Open_Science_Grid%3F/Partners)
[24] OSG Researcher fields (http://www.opensciencegrid.org/Science_on_the_OSG/Research_Highlights)
[25] Grid3 (http://www.ivdgl.org/grid2003/)
[26] Virtual Data Toolkit (VDT) (http://vdt.cs.wisc.edu//index.html)
[27] NSF Middleware Institute (NMI) (http://www.nsf-middleware.org/default.aspx)
[28] OSG@Work (http://twiki.grid.iu.edu/twiki/bin/view)
[29] OSG Education and Training (http://twiki.grid.iu.edu/twiki/bin/view/Education/WebHome)
[30] OSG Workshops (http://twiki.grid.iu.edu/twiki/bin/view/Education/GridWorkshops)
[31] OSG Research Highlights (http://www.opensciencegrid.org/Science_on_the_OSG/Research_Highlights)
[32] SRM collaboration working group (http://sdm.lbl.gov/srm-wg)
[33] Storage Group (OSG) (https://twiki.grid.iu.edu/twiki/bin/view/Storage)
[34] BeStMan (http://datagrid.lbl.gov/bestman)
[35] dCache (http://www.dcache.org)