 Joining a Grid: Procedures & Examples

  

Introduction

One of the most effective ways to become familiar with the ins and outs of grid technology and usage is to join a grid initiative with goals and objectives that encompass or overlap with those of your institution, and with opportunities to develop peer collaboration and support. Through participation in such initiatives, you can leverage shared resources to meet institutional goals and begin contributing your perspective and growing expertise back to the community, for the collective improvement and advancement of effective use of grid technology. Several grid initiatives invite this type of participation today; two notable examples are described below.


SURAgrid: A regional-scale multi-institutional grid

SURAgrid is a consortium of organizations collaborating and combining resources to help bring grid technology to the level of seamless, shared infrastructure. The project arose from the desire for ongoing collaboration among institutions that had been participating with SURA (Southeastern Universities Research Association) in the NSF Middleware Initiative (NMI) Integration Testbed, a program that provided comprehensive evaluation of NMI middleware in the earliest years of that initiative. Facilitated by SURA, the vision for SURAgrid is to orchestrate access to a rich set of distributed capabilities in order to meet diverse users' needs. Capabilities to be cultivated include locally contributed resources, project-specific tools and environments, highly specialized or HPC access, and gateways to national and international cyberinfrastructure.


Figure JG-1. An overview of SURAgrid.

To meet the needs of its broad participant and user community, SURAgrid focused on three primary goals:

  • Develop a scalable infrastructure that leverages local institutional identity and authorization while managing access to shared resources across institutional boundaries.
  • Promote the use of this infrastructure for the broad research and education community, creating a whole that is greater than the sum of its parts.
  • Provide a forum for participating institutions to gain additional experience with grid technology and to promote collaborative project development.

With the long-term view of grids as generalized infrastructure, an emphasis on diversity and inclusion, and a persistent objective to discover and understand grid use outside the scope of expected or typical use today, SURAgrid is positioned to become an essential tool to build the scientific and educational capacity of the Southeastern U.S. and beyond.


Applications on SURAgrid

Identifying research applications that benefit significantly from grid technologies is a key factor in fostering grid development and deployment, and in growing and sustaining SURAgrid. The deployment of an intentionally diverse set of applications is contributing to the advancement of research and education within a variety of disciplines. Applications under development on SURAgrid are detailed on the SURAgrid Web site, with a few notable examples listed below:

  • SCOOP (SURA Coastal Ocean Observing & Prediction) Coastal Ocean Modeling
    The SCOOP (http://scoop.sura.org) Cyberinfrastructure (CI) is being developed to support coastal research and operations, by providing a modular, distributed system for real time prediction and visualization of the impacts of extreme atmospheric events on coastal areas, and enabling advances in multi-scale, multi-model, and DDDAS science. The SCOOP CI enables complex workflows, which integrate coastal models such as ADCIRC, ELCIRC, WW3, and CH3D with various wind models and sensor information. SCOOP presently uses SURAgrid resources for added computational capacity.

  • EPANET Simulation-Optimization for Threat Management in Urban Water Systems
    This application incorporates dynamic demand data, in real-time, into a simulation-optimization process for contamination threat management in drinking water distribution systems. The nature of this work is highly compute-intensive and requires multi-level parallel processing via computer clusters and high-performance computing architectures such as SURAgrid. Simulation-Optimization with EPANET is part of a multi-disciplinary, three-year NSF-funded DDDAS (Dynamic Data-Driven Application Systems) research project to develop a cyberinfrastructure system that will both adapt to and control changing needs in data, models, computer resources and management choices facilitated by a dynamic workflow design. Project Partners: North Carolina State University; University of Chicago; University of Cincinnati; University of South Carolina.

  • Grid-Enabled Distributed BLAST
    BLAST is a database search application for matching protein and nucleotide sequences. Maximizing the throughput of searches is key to improving research results. This distributed implementation of BLAST developed by the University of Alabama at Birmingham uses the DynamicBLAST Meta-scheduler to select appropriate grid resources for select query strings. Globus is used for job staging, submission and retrieval. ncbiBLAST performs the computations. Jobs are submitted using a web-based interface that leverages campus identity credentials via Pubcookie and manages grid authentication on behalf of the user via MyProxy, providing a simplified user authentication experience.

  • SURAgrid Teaching Environment
    Effective teaching about grids, within Computer Science as well as other disciplines, is greatly enhanced by students and instructors having hands-on access to a stable grid environment. Through coordinated commitment, operation and support across a subset of SURAgrid resources, SURAgrid is developing a predictable, secure and reliable grid-based teaching facility for use by SURAgrid sites in their grid course development and/or delivery. Old Dominion University has made initial use of this capability by providing basic grid access for students to supplement theory in a Distributed Computing course during Spring 2007. Targeted improvements include more scalable group account management, accommodation for varying levels of access, and space for faculty to participate in joint course development.


How SURAgrid works

As an inter-institutional grid infrastructure, SURAgrid provides a variety of application users with a common point of access to a shared set of distributed, heterogeneous resources. As of August 2007, thirty academic organizations and institutions are participating in SURAgrid. Most, but not all, are members of SURA; SURA membership is not a requirement.


Figure JG-2. The SURAgrid map.

Resources to be shared are contributed by the participating organizations and remain under autonomous control of the resource owner, with shared access enabled through grid-wide coordination of authentication and authorization mechanisms, and operational procedures.

Most of the resources being contributed are computational in nature, providing just over 10.5 Teraflops of combined capability as of April 2007, for sharing among the SURAgrid community (although capacity does fluctuate as resources are added, swapped, upgraded, etc.). More diverse resources such as databases, instruments, storage, and application services are anticipated in the future. SURAgrid resources can be viewed and accessed through the SURAgrid portal at https://gridportal.sura.org, which is maintained by the Texas Advanced Computing Center (TACC).

Institutions that participate in SURAgrid are also expected to share in its organization, planning and development, as a cooperative effort to foster collaboration and build a shared asset to help meet local, regional and national goals for the advancement of science through grid technologies. Informal and "grass roots" structure and procedures for governance and decision-making are gradually being replaced by more formal components and processes, while retaining the spirit of community and collaboration that is the foundation of the initiative. New organizations can join SURAgrid by following the contact process detailed on the SURAgrid Web site [1] (see tab/menu item "Join SURAgrid").


The SURAgrid infrastructure

The SURAgrid software stack, grid services and application environment have evolved to include a minimal set of requirements and recommendations intended to be as loose as possible while providing a foundation of interoperability. Originally, the primary need was for management and coordination of resources and applications within SURAgrid itself. More recently, grid-to-grid integration has become of greater importance to SURAgrid participants who need to share resources with or bridge access to other grid projects such as TeraGrid, Open Science Grid, TIGRE (Texas Internet Grid for Research and Education), and project-specific grids such as GridChem.

SURAgrid presently uses Globus middleware to facilitate access to a variety of computational resources, such as Linux-based clusters, IBM P575 HPC systems, Condor pools, and virtualized resources. Adding resources to SURAgrid is facilitated through user documentation, peer support and some direct assistance from SURA staff.

Continued development of the SURAgrid environment is required to accommodate new user communities, future integration with other grid initiatives, and an anticipated increase in corporate partnerships (such as the SURAgrid-IBM partnership [2]). The SURAgrid stack specification is in alignment with middleware that is in use by these and many other major academic grid initiatives. Different applications may have requirements beyond the currently specified software stack. Such requirements are treated as application-specific needs until they are shown to be more commonly required, at which point they should be incorporated into the overall resource requirements.

SURAgrid resource requirements & recommendations (server side) as of June 2007:

  • Required: Globus 4.x, WS-GRAM, GridFTP, WS-MDS and RFT. Pre-WS GRAM and MDS are strongly recommended to support existing legacy applications.
  • GSI-OpenSSH is strongly recommended for application staging. If enabled, it is required that you advertise the port through WS or system detail in the SURAgrid portal. We recommend using either port 22 or 2222.
  • Any version of operating system that supports the required services above, with Linux 2.4 or higher recommended in order to provide a common platform for application development.
  • Addition of resource and relevant system detail to the resource monitor (GPIR) of the SURAgrid portal.
  • A scheduler installed as part of your underlying resource configuration.
  • Cross-certification with SURAgrid Bridge CA — strongly recommended at this time and likely to be required in the future. (See https://www.pki.virginia.edu/nmi-bridge [3])
  • Configuration of the required environment variables as defined in SURAgrid Environment Variables. Configuration of the optional environment variables also recommended.
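
A site administrator might want to confirm which of the recommended GSI-OpenSSH ports is reachable before advertising it. The following is a minimal sketch in Python, not part of the SURAgrid tooling; the helper names are hypothetical. It only tests that a TCP connection can be opened, not that the service on the other end actually speaks GSI-OpenSSH.

```python
import socket
from typing import Callable, Iterable, Optional


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def first_open_port(
    host: str,
    ports: Iterable[int] = (22, 2222),
    probe: Callable[[str, int], bool] = tcp_port_open,
) -> Optional[int]:
    """Return the first port in `ports` that accepts connections, else None."""
    for port in ports:
        if probe(host, port):
            return port
    return None
```

Calling first_open_port with the name of your own resource returns 22, 2222, or None if neither port answers. The probe argument exists only so that the lookup can be replaced in tests.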

Users must be both authenticated and authorized to access SURAgrid resources. The Globus GSI (Grid Security Infrastructure) relies on PKI (public key infrastructure) and its related exchange of certificates for authentication and provides for authorization through a "grid-mapfile" that associates identities with individual system accounts. SURAgrid augments this authentication process by leveraging authoritative campus identity management where possible for user authentication between participating sites. Scalable exchange of this trusted information is enabled through the use of the SURAgrid Bridge Certificate Authority (Bridge CA), maintained by the University of Virginia, SURAgrid's lead in this area. Each site establishes a trusted relationship with the SURAgrid Bridge CA, which essentially then "vouches" for each site to the others. In the absence of a Bridge CA, each site within a PKI must establish a trusted relationship with all the other participating sites, which can become exceedingly difficult, if not impossible, to manage effectively as the number of participants increases. Within SURAgrid, participating sites typically run their own Certificate Authority (CA) to provide both user and system certificates for participation in the SURAgrid PKI. A SURAgrid CA is also under development, to provide certificates for sites that are not running their own CA or do not have access to one, SURAgrid guest access, etc.
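
To see why a bridge scales better, it helps to count trust relationships. The Python sketch below is illustrative arithmetic only. It assumes that, without a bridge, every pair of sites needs its own mutual trust relationship, and that with a bridge each site needs exactly one relationship, with the Bridge CA. The function names are hypothetical.

```python
def pairwise_trust_links(sites: int) -> int:
    """Mutual trust relationships needed when every site pairs with every other."""
    return sites * (sites - 1) // 2


def bridged_trust_links(sites: int) -> int:
    """Trust relationships needed when each site cross-certifies with one bridge."""
    return sites


if __name__ == "__main__":
    for n in (5, 10, 30):
        print(n, pairwise_trust_links(n), bridged_trust_links(n))
```

Under these assumptions, thirty sites would need 435 pairwise relationships but only 30 through a bridge, and the gap widens quadratically as sites are added.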

Once a SURAgrid user is successfully authenticated, he or she accesses SURAgrid resources through use of a pre-established individual SURAgrid user account. This account is recognized on all properly configured SURAgrid resources and the permissions inherent in the user account determine the levels of authorization (what the user is able to do). The setup and management of SURAgrid user accounts is facilitated through several tools developed for SURAgrid by the University of Virginia. These tools include Web-based account management, a shared LDAP directory that maintains SURAgrid user information, and scripts that provide various levels of automation to be used for mapping user information to the Globus GSI, to the degree desired by each site. Account access mechanisms in use on SURAgrid include user access through the SURAgrid portal, remote login by the user to individual resources, and software-automated access through applications and scripts.
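
As an illustration of the identity-to-account mapping that a grid-mapfile holds, the sketch below parses grid-mapfile entries in Python. It assumes the common Globus layout of one entry per line: a double-quoted certificate subject (distinguished name) followed by one or more comma-separated local account names. The sample names and accounts are invented, and this is not one of the University of Virginia tools.

```python
import shlex

# Hypothetical sample entries; real subjects come from users' X.509 certificates.
SAMPLE = '''
# subject distinguished name                      local account(s)
"/C=US/O=Example University/OU=Physics/CN=Jane Doe" jdoe
"/C=US/O=Example College/CN=Sam Roe" sroe,gridguest
'''


def parse_grid_mapfile(text: str) -> dict[str, list[str]]:
    """Return {subject DN: [local accounts]} from grid-mapfile text."""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the DN
        if len(parts) != 2:
            raise ValueError(f"malformed grid-mapfile line: {raw!r}")
        dn, accounts = parts
        mapping[dn] = [name for name in accounts.split(",") if name]
    return mapping


if __name__ == "__main__":
    for dn, accounts in parse_grid_mapfile(SAMPLE).items():
        print(accounts[0], "<-", dn)
```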


Implementation closeup: Installing the SURAgrid server stack

SURAgrid Server Software Stack

To accommodate heterogeneity, the SURAgrid software stack, grid services and application environment evolve based on setting a minimal set of requirements and recommendations that increase in specificity as needs dictate. However, SURA has defined a common set of software that should be available on all SURA server systems at this point in time, to ensure interoperability among systems and support for the current and near-term application set. To facilitate installation of the appropriate software, the SURAgrid team is collaborating with the TIGRE [4] project in the development of a "one-button" installation of the stack for SURAgrid. This installation package includes both services and clients for those services, and leverages the Virtual Data Toolkit [5] (VDT) to provide a convenient way to install and configure this software. The excerpt below illustrates parts of this automated process for adding a resource to SURAgrid.

Please check the official SURAgrid Server Stack website [6] for the most current material.

Contents

The SURAgrid software stack consists of the following components:

  • Globus Toolkit 4.0 [7] (servers and clients)
  • Grid Proxy programs. For obtaining X.509 credentials.
  • Pre-WS and WS-GRAM. The GRAM2 (pre-web services) and GRAM4 (web services) Globus client and server components. These components provide remote job submission. Also included are supporting services such as the Reliable File Transfer Service and the Delegation Service.
  • GridFTP. GridFTP server and clients that provide secure, high-bandwidth file transfers.
  • GSI OpenSSH [8]. Provides ssh access to SURAgrid systems using grid credentials.
  • UberFTP [9]. An interactive command line client for GridFTP.
  • MyProxy [10] client. One way to cache proxies obtained from grid credentials.
  • Condor-G [11]. Job submission and management.

Requirements

VDT supports a variety of operating systems and OS versions. Please make sure your platform is one of the supported operating systems [12]. The SURAgrid software stack requires the following underlying software to be available:

  • Perl 5.8.0 or greater
  • tar (any version)
  • diff+patch (any recent version should suffice)
  • Python 2.2 or greater (pacman will install it if necessary)

The disk space requirements vary per platform but generally 1-2 GB of free disk space will suffice.
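
Before starting an installation, these prerequisites can be checked with a short script. The following is a minimal sketch in present-day Python, offered as an illustration rather than as part of the SURAgrid or VDT tooling. It only checks that the tools are on the PATH and that roughly 2 GB is free; it does not verify the Perl version, and the helper names and the 2 GB threshold are assumptions taken from the upper end of the guideline above.

```python
import shutil
from typing import Callable, Iterable, Optional

REQUIRED_TOOLS = ("perl", "tar", "diff", "patch")
MIN_FREE_BYTES = 2 * 1024**3  # upper end of the 1-2 GB guideline


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def enough_disk(path: str = ".", needed: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has `needed` bytes free."""
    return shutil.disk_usage(path).free >= needed


if __name__ == "__main__":
    print("missing tools:", missing_tools() or "none")
    print("enough disk space:", enough_disk())
```

Run it from the directory where the stack will be installed, so that the disk check applies to the right filesystem.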

General Preparations and Steps

The basic steps in an installation scenario include:

  • Install pacman

    The SURAgrid software stack is installed and managed with pacman [13]. pacman is a utility which manages software packages in Linux. It uses simple compressed files as a package format, and maintains a text-based package database (more of a hierarchy), just in case some hand tweaking is necessary.

  • Install the SURAgrid server software stack

    The root directory, where the SURAgrid server software stack is installed, is created. Then pacman is used to begin installing the server software stack. pacman asks you questions and downloads a relatively large number of packages.

  • Configure the SURAgrid Software Stack

    After the installation is complete, there are a number of post-install configuration steps to perform before the SURAgrid server software is fully functional. They include:

    • Install Credentials

      Credentials are required for your host. The SURAgrid authentication and authorization infrastructure is based on a two-tier PKI approach coupled with an optional LDAP-based PAM callout. See the SURAgrid PKI Bridge Certification Authority and User Management System [14] pages for more information.

    • Map SURAgrid Users

      You have the option of either using the SURAgrid LDAP callout to control local account mapping and authorization or simply setting up a grid-mapfile to do so. Such mapping is required to associate the subject distinguished name for a particular user in their X.509 certificate with a local Unix account. For further information, see the grid-mapfile section [15] of the SURAgrid PKI Bridge Certification Authority and User Management System pages. (Note that this topic is undergoing some evolution within SURAgrid, and we hope in particular to provide a mechanism for a fully-accredited approach in cooperation with the International Grid Trust Federation [16] soon.)

    • Configure GSISSH

      The GSI version of SSH [17] allows users to ssh into SURAgrid systems using their grid credentials. They do not have to provide a password and are automatically mapped to their locally assigned grid userid upon gsissh login. To set this up, enable your gsissh server [18].

    • Configure GridFTP

      The Globus GridFTP provides high-performance file movement between SURAgrid systems.

    • Configure Globus account

      For added security, the Globus web services container should be run as an ordinary user. The typical userid used is 'globus'.

    • Configure WS-GRAM

      The Web Services GRAM (WS-GRAM) component allows remote users to execute applications on each SURAgrid system.

    • Set special local conditions

      The globus tcp port range can be set to match local preferred values, as well as any other special local conditions for your installation.

      Other commands needed to handle variations that apply to a local cluster environment may also be added.

    • Start services

      Globus services can now be started, including WS-GRAM.
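
For the "Set special local conditions" step above, Globus reads its permitted TCP port range from the GLOBUS_TCP_PORT_RANGE environment variable, conventionally written as min,max. The Python sketch below is a hypothetical pre-flight check that validates such a value before services are started. The function name is invented, and the example range 40000,41000 is an assumption for illustration, not a SURAgrid recommendation.

```python
import os


def parse_port_range(value: str) -> tuple[int, int]:
    """Parse a 'min,max' port range string and validate it."""
    try:
        low_text, high_text = value.split(",")
        low, high = int(low_text), int(high_text)
    except ValueError:
        raise ValueError(f"expected 'min,max', got {value!r}") from None
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"port range out of order or out of bounds: {value!r}")
    return low, high


if __name__ == "__main__":
    # 40000,41000 is an example value only; use your site's preferred range.
    raw = os.environ.get("GLOBUS_TCP_PORT_RANGE", "40000,41000")
    low, high = parse_port_range(raw)
    print(f"Globus may listen on {high - low + 1} ports: {low}-{high}")
```

The same range normally has to be opened in the site firewall, which is why it is worth validating in one place.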

To get help

If you have any problems installing the SURAgrid software, please contact the SURAgrid Support e-mail list [19]. There is also a VDT Discuss and Announce list, which might be helpful for specific advanced usage scenarios. Please see the VDT Support page [20] for additional information.


The Open Science Grid

(The following Open Science Grid example borrows liberally from the information available through the OSG website [21].)

The Open Science Grid, formed in 2004, is a distributed computing infrastructure for large-scale scientific research, built and operated by a consortium of universities, national laboratories, scientific collaborations and software developers. The goal of the OSG Consortium is to enable diverse communities of scientists to access a common grid infrastructure and shared resources.

The OSG is supported by the National Science Foundation and the U.S. Department of Energy's Office of Science.

Members of the OSG Consortium [22] contribute effort and resources to the OSG infrastructure and reap the benefits of a shared infrastructure that integrates computing and storage resources from more than 50 sites in the United States, Asia and South America.  OSG also has partners [23], including campus, regional, national and international grids.

Researchers from many fields [24], including astrophysics, bioinformatics, computer science, medical imaging, nanotechnology and physics use the OSG infrastructure to advance their research. OSG provides help for new communities to adapt their applications to use the distributed facility and make their resources accessible. The OSG also works to enable scientists to seamlessly harness grid-computing resources worldwide and interoperates with multiple other Grid infrastructures.

The OSG is a continuation of Grid3 [25], a community grid built in 2003 as a joint project of the U.S. LHC software and computing programs, the National Science Foundation's Grid Physics Network (GriPhyN) and International Virtual Data Grid Laboratory (iVDGL) projects, and the U.S. Department of Energy's Particle Physics Data Grid (PPDG) project.

The OSG includes two grids: an Integration Grid and a Production Grid. The Integration Grid is used to test new grid applications, sites and technologies, while the Production Grid provides a stable, supported environment on which researchers run their scientific applications.


Figure JG-3. Location of the Open Science Grid Production Resources.


Software

The Open Science Grid provides and supports a reference set of software called the "OSG Software Stack" for download and use by administrators and users of the OSG.

The software stack relies on the Virtual Data Toolkit (VDT) [26] middleware, which is itself a packaging and distribution based on the NSF Middleware Initiative (NMI) [27] releases of Condor, Globus and other standard Grid middleware.

OSG@Work [28] pages provide detailed instructions on how to prepare a facility and/or resource and how to download and configure the OSG Software Stack in order to provide or access resources on the OSG.

A production release of the OSG Software Stack comes only after validation of the proposed software on the Integration Grid, and is based on released versions of the VDT.


Applications on OSG

Scientists from many different fields use the Open Science Grid to advance their research. The OSG Consortium includes members from particle and nuclear physics, astrophysics, bioinformatics, gravitational-wave science and computer science collaborations. Consortium members contribute to the development of the OSG and benefit from advances in grid technology. Applications in other areas of science, such as mathematics, medical imaging and nanotechnology, benefit from the OSG through its partnership with local and regional grids or their communities' use of the Virtual Data Toolkit software stack.

The Consortium members contribute the resources available to the OSG. The owners of the resources control their use, with an expectation that 10-20% are on average available for opportunistic use, and with policies such that OSG members can use any unused cycles.

Thus the existence of the OSG does not obviate the need for the purchase of hardware and building of computational facilities by and for each science community. The scope of OSG is to:

  • Enable scientists to use a greater percentage of the available compute and storage cycles.
  • Help scientists to use distributed systems and software with less effort.
  • Enable more sharing and reuse of software and reduce duplication through providing effort in integration and extensions.
  • Establish an "open-source" community working together to communicate knowledge and experience and reduce overheads for new participants.

The benefits come from reducing risk in, and sharing support for, large, complex systems that must be run for many years with a short-lifetime workforce, and from leveraging the expertise and support for such systems to enable new communities to participate more easily in distributed science. These benefits include:

  • Savings in effort for integration, system and software support.
  • Opportunity and flexibility to distribute load and address peak needs.
  • Maintenance of an experienced workforce in a common system.
  • Lowering the cost of entry to new contributors.
  • Enabling of new computational opportunities to communities that would not otherwise have access to such resources.

The deliverables and milestones of OSG are driven directly by the needs of the current set of scientific stakeholders, and they evolve by balancing those needs and the needs of new communities against the available effort.

The OSG Grid Operations Center at Indiana University provides front-line support for all areas of the OSG, and the OSG web site document repository and the @Work TWiki site provide extensive information about all aspects of the facility. We will only touch on a few representative areas of activity here.


Use of the OSG

There are more than sixty active computational sites on the OSG. The infrastructure supports job throughput of more than a hundred thousand CPU hours a day and supports several hundred users. About twenty of the sites are part of the US LHC distributed data handling and analysis systems (Brookhaven and Fermilab Tier-1, University Tier-2s). These sites are supporting ongoing data distribution at an aggregate rate of tens of terabytes a day. Four sites are owned by LIGO and are being used for transitioning analysis codes from the existing LIGO data grid to full production on the common infrastructure. Four sites are owned (or partially owned) by STAR and are being used to bring STAR data distribution, simulation and production codes onto the OSG. The Tevatron experiments are also making good opportunistic use of the OSG.
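
As a rough sense of scale, a daily throughput in CPU hours can be converted into the number of processors that would have to be busy around the clock to deliver it. The arithmetic below is illustrative only and uses the round figure of one hundred thousand CPU hours a day quoted above; the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def continuously_busy_cpus(cpu_hours_per_day: float) -> float:
    """Average number of CPUs kept busy all day to deliver the given CPU hours."""
    return cpu_hours_per_day / HOURS_PER_DAY


if __name__ == "__main__":
    print(round(continuously_busy_cpus(100_000)))  # about 4167
```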


Bringing new users onto the OSG

The Open Science Grid (OSG) engagement activity works closely with new user communities over periods of several months to bring them to production running. These activities include: providing an understanding of how to use the distributed infrastructure; adapting applications to run effectively on OSG sites; engaging in the deployment of community-owned distributed infrastructures; working with the OSG Facility to ensure the needs of the new community are met; providing common tools and services in support of the new communities; and working directly with and in support of the new end users, with the goal of having them become full contributing members of the OSG. Engagement work to date includes the following:

  • Adaptation and opportunistic production running of the Rosetta application from the Kuhlman Laboratory in North Carolina, using more than a hundred thousand CPU hours across more than thirteen OSG sites, which has resulted in structure predictions for more than ten proteins. We have so far tested the robustness of the system to the submission of up to about three thousand jobs simultaneously.
  • Production runs of the Weather Research and Forecast (WRF) application using more than one hundred and fifty thousand CPU hours on the NERSC OSG site at Lawrence Berkeley National Laboratory.
  • Improvement of the performance of the nanoWire application from the nanoHUB project on sites on the OSG and TeraGrid, such that stable running of batches of five hundred jobs across more than five sites is routine. Work was also done in support of nanoHUB scientists to use OSG resources to run BioMoca simulation jobs last year and the first couple months of this year.
  • Production running of the CHARMM molecular dynamics simulation, applied to the problem of water penetration in staphylococcal nuclease, using more than twenty thousand CPU hours. The runs used the ATLAS workload management system, PANDA, and opportunistically available resources across more than ten OSG sites.

Sites and VOs

A Site is a set of processing and/or storage resources and/or services that are co-located and centrally administered. A Virtual Organization (VO) is an organization that includes people using the resources (users, developers, administrators, and managers), the services needed by the organization and the resources owned by the organization. The OSG architecture defines interfaces from sites and VOs to the common infrastructure. The OSG provides an integrated and tested reference set of software, the Virtual Data Toolkit (VDT), for the sites and the VOs to use to interface to the OSG distributed facility. Both sites and VOs have responsibility and authority over the resources, software and services that they own, and they control and manage their use.

The OSG implementation architecture is cognizant that any resource may be supporting use through multiple interfaces — from local submission and access, from the OSG, and from other similar infrastructures such as Campus Grids (e.g. FermiGrid) or other national infrastructures such as TeraGrid. Similarly, the implementation architecture is cognizant that any VO may be using multiple infrastructures simultaneously and may have a deep set of (sometimes complex) shared software and services that are specific to the VO and operate across these infrastructures. These are additional drivers to the model that sites retain local control and management for all use, services and processes, and that VOs have control and management over their internal processes, priorities and use. In addition to levels of service and resource use being agreed between resource owners and users, the site and VO processes are implemented to support sharing and opportunistic use of the resources accessible to the OSG.


OSG services

The OSG provides common services across the distributed facility in support of VOs and Sites: monitoring, validation and information about the full infrastructure; tracking of any and all problems and ensuring they are resolved; the Virtual Data Toolkit software packaging and support; integration and testing facilities; security; troubleshooting of the end-to-end system; and support for existing and new user communities. The OSG also provides effort to bring new services and software into the facility and to collaborate with the external projects, as well as documentation and training of site and VO administrators and users.


Benefits from a common, integrated software stack

OSG software releases consist of the collection of software integrated and distributed as the Virtual Data Toolkit (VDT) with a thin layer of additional OSG-specific configuration scripts. Modules in the VDT are included at the request of the stakeholders. The Condor and Globus software provide the base technologies. VDT includes about thirty additional modules, including components from other computer science groups, Enabling Grids for E-sciencE (EGEE), DOE Laboratory facilities (Brookhaven, Fermilab and LBNL), and the application communities themselves, as well as standard open source software components such as Apache and Tomcat. OSG also supports the VDT for external projects. For example, VDT is used by the EGEE and Australian distributed computing infrastructures. Additionally, in support of interoperability across their infrastructures, the OSG and TeraGrid software stacks include aligned versions of the Condor and Globus software.

The VDT provides a reference software stack for use by OSG sites. Once the software is installed, a site supports remote job submission, shared storage at the site, and data transfer between sites; it has services to manage priorities and access between VOs; and it can participate in the OSG monitoring, validation and accounting services. OSG supports use of the reference software. Sites must provide the common interfaces to OSG services, but the actual implementation is not dictated. The VDT also provides client libraries and tools that applications use to access OSG resources and services.


Operational security and the security infrastructure

OSG is well aware of the essential and integrated nature of security operations and management. We have comprehensive risk analysis and security plans. We respond to software security notifications by promptly analyzing the problem and assigning high priority to patches and fixes. Over the past year we have had about ten such notifications, each of which resulted in new software downloads within a day to a week or two. The VDT, Condor, Globus, and EGEE software teams communicate security risks as soon as they are identified and work together on patches and solutions. The collaborative nature of the OSG means that communication is natural and continuous, and security is part of the normal day-to-day processes. OSG has mechanisms to deny users access to sites and resources. The grouping of users into VOs gives us a small number of well-identified responsible managers who control user entry to the infrastructure. This leads us to a model of trust between sites, VOs and the OSG, with delegated trust between the VO and the end users.

The OSG security infrastructure is based on: X.509 user, host and service certificates obtained through one of the International Grid Trust Federation accredited Certificate Authorities; user identity proxy certificates obtained through the VO Management Service (EGEE VOMS), which checks that the user is a member of a VO; management of extended certificate attributes to assign "roles" to a particular access by a user; flexible definition of the mapping of user certificates to accounts and ACLs (access control lists) on a site; and policy (including blacklist) enforcement points at the site (processing and storage) services themselves.
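The mapping of certificates to local accounts, combined with a site-level blacklist, can be illustrated with a small sketch. It assumes the simplest grid-mapfile style layout (a quoted certificate subject followed by one local account name). The subjects, account names, and the functions `parse_gridmap` and `authorize` are hypothetical, and role-based mapping from VOMS attributes is not modeled.

```python
import shlex

def parse_gridmap(text):
    """Parse lines of the form: "certificate subject" local_account.

    Simplification: exactly one account per subject is assumed.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honors the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed line: {line!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping

def authorize(subject, mapping, banned):
    """Return the local account for a subject, or None if access is denied."""
    if subject in banned:
        return None  # site blacklist takes precedence over any mapping
    return mapping.get(subject)

# Hypothetical entries, for illustration only.
EXAMPLE_GRIDMAP = '''
# subject                                          local account
"/DC=org/DC=example/OU=People/CN=Alice Analyst" uscms01
"/DC=org/DC=example/OU=People/CN=Bob Builder" osgvo02
'''
```

A site service would call `authorize` with the subject taken from the presented certificate and reject the request when the result is `None`.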


Jobs, data, and storage

Job Management and Execution

OSG sites present interfaces that allow remotely submitted jobs to be accepted, queued and executed locally. The priority and policies of execution are controlled both by the VO and by the site itself. VO policies are defined through "roles" given to the user through the VOMS service. Site policies and priorities are defined by mapping the user and their roles to specific accounts used to submit the job to the batch queue. OSG supports the Condor-G job submission client, which interfaces to either the pre-web-services or the web-services Globus GRAM interface at the executing site. Job managers at the back end of the GRAM gatekeeper support job execution by local Condor, LSF, PBS, or SGE batch systems.
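As a rough illustration of the Condor-G path described above, the sketch below generates a submit description that targets a pre-web-services GRAM gatekeeper (Condor's `gt2` grid type). The gatekeeper host, file names, and the helper `condor_g_submit` are assumptions made for the example; a real site publishes its own gatekeeper contact string.

```python
# Batch systems named in the text as GRAM job manager back ends.
JOB_MANAGERS = ("condor", "lsf", "pbs", "sge")

def condor_g_submit(gatekeeper, jobmanager, executable, arguments=""):
    """Return the text of a Condor-G submit description file.

    gatekeeper: host name of the site's GRAM gatekeeper (hypothetical here).
    jobmanager: local batch system behind the gatekeeper.
    """
    if jobmanager not in JOB_MANAGERS:
        raise ValueError(f"unknown job manager: {jobmanager}")
    lines = [
        "universe = grid",
        # gt2 selects the pre-web-services GRAM interface
        f"grid_resource = gt2 {gatekeeper}/jobmanager-{jobmanager}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(condor_g_submit("gatekeeper.example.edu", "pbs", "/bin/hostname"))
```

The generated text would be saved to a file and handed to `condor_submit` by a user holding a valid proxy certificate; which account the job runs under at the site is decided by the site's mapping, not by the submit file.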

Data Transport, Storage and Access

Many of the OSG physics user communities need to transport large file-based data sets and have high application-level data I/O requirements. The data transport, access, and storage implementations on OSG take account of these needs. OSG relies on the GridFTP protocol for the raw transport of data, using Globus GridFTP in all cases except where interfaces to storage management systems (rather than file systems) dictate individual implementations. The community has been heavily involved in the early testing of new versions of Globus GridFTP, as well as in defining needed changes to the GridFTP protocol.
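A GridFTP transfer is typically driven from the command line with the Globus `globus-url-copy` client. The sketch below only assembles the argument list for such a transfer, including the client's `-p` option for parallel data streams; it does not run anything. The host, paths, and the helper `gridftp_copy_command` are hypothetical.

```python
from urllib.parse import urlparse

# URL schemes this sketch accepts: GridFTP servers and local files.
ALLOWED_SCHEMES = ("gsiftp", "file")

def gridftp_copy_command(source, destination, parallel_streams=1):
    """Build the argv list for a globus-url-copy transfer."""
    for url in (source, destination):
        if urlparse(url).scheme not in ALLOWED_SCHEMES:
            raise ValueError(f"unsupported URL: {url}")
    if parallel_streams < 1:
        raise ValueError("parallel_streams must be at least 1")
    command = ["globus-url-copy"]
    if parallel_streams > 1:
        # -p requests several parallel data streams for one transfer
        command += ["-p", str(parallel_streams)]
    command += [source, destination]
    return command

if __name__ == "__main__":
    argv = gridftp_copy_command(
        "gsiftp://se.example.edu/data/run1.root",  # hypothetical storage host
        "file:///tmp/run1.root",
        parallel_streams=4,
    )
    print(" ".join(argv))
```

Returning a list rather than a shell string keeps the result safe to pass to `subprocess.run` without quoting concerns.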

Storage Resource Management

OSG supports the Storage Resource Management (SRM) interface to storage resources. SRM enables management of space and data transfers in order to prevent unexpected errors due to running out of space, to prevent overload of the GridFTP services, and to provide capabilities for pre-staging, pinning and retention of data files. OSG currently provides reference implementations of two storage systems: the LBNL Disk Berkeley Storage Manager (BeStMan) [34] and dCache [35].

In addition, because functionality to support space reservation and sharing is not yet available through grid interfaces, OSG defines a set of environment variables that a site must implement and that a VO can rely on to locate available space, space shared between all nodes on a compute cluster, and high-performance I/O disk caches.
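From a job's point of view, this convention amounts to looking up a site-provided variable rather than hard-coding a path. The sketch below shows the idea; the variable names `OSG_DATA` and `OSG_WN_TMP` and the helper `storage_path` are assumptions chosen for illustration and are not taken from the text above, so the actual names should be checked against the OSG site documentation.

```python
import os

# Assumed variable names, for illustration only.
ASSUMED_VARIABLES = {
    "shared": "OSG_DATA",     # space shared between all nodes of a cluster
    "scratch": "OSG_WN_TMP",  # local scratch space on the worker node
}

def storage_path(purpose, environ=None):
    """Return the directory a site advertises for the given purpose.

    environ defaults to the process environment; passing a dict makes
    the function easy to exercise outside a real site.
    """
    env = os.environ if environ is None else environ
    if purpose not in ASSUMED_VARIABLES:
        raise ValueError(f"unknown purpose: {purpose}")
    name = ASSUMED_VARIABLES[purpose]
    path = env.get(name)
    if not path:
        # Fail loudly instead of silently writing to an arbitrary location.
        raise LookupError(f"site does not define {name}")
    return path

if __name__ == "__main__":
    fake_site = {"OSG_DATA": "/shared/data", "OSG_WN_TMP": "/scratch"}
    print(storage_path("scratch", fake_site))
```

Because the job asks the environment where to write, the same application can run unchanged at sites with different storage layouts.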


Gateways to other facilities and grids

We are seeing rapid growth in the interest in and deployment of shared computational infrastructures at the local and regional level. We are also seeing rapid growth in the number of research communities that need to move data and jobs between heterogeneous facilities and to build integrated community computational systems across high performance computing (HPC) facilities and more traditional computing clusters.

OSG provides interfaces to these HPC facilities to support these use cases. The OSG also federates with other large infrastructures, notably the TeraGrid and EGEE, by providing gateways between them and OSG and by supporting groups that submit jobs across and move data between them. For example, the OSG collects information from its resources and publishes it in the format needed by the EGEE. The CMS VO "Resource Broker" job dispatcher then submits jobs transparently across EGEE and OSG resources.


Participating in the OSG

New organizations contribute to OSG by providing resources, using the infrastructure, working with the communities to provide software, participating in training and documentation activities, and/or participating in the security, troubleshooting, or other OSG activity areas. The overhead of participation is low, and the benefit follows the principle that "what you get out depends on what you put in". An organization registers with the Grid Operations Center and provides contact and planned usage information. The OSG staff then helps to interface the resources to the OSG and provides support for the VOs' usage.


Training on the OSG

The OSG Education and Training [29] program provides training for student users, researchers and educators of the OSG and for site administrators. At the core of the student education program are the Workshops [30], organized by OSG and its partners. These grid schools give advanced undergraduate and graduate students a basic foundation in distributed computing and provide valuable hands-on training in distributed and grid computing techniques. The schools introduce essential skills that students in the natural and applied sciences, engineering and computer science will need to conduct and support scientific analysis in grid computing environments.

See OSG's Research Highlights [31] for more details.


Bibliography

[1] SURAgrid Web site (http://www.sura.org/suragrid)
[2] SURAgrid-IBM partnership (http://www.sura.org/news/docs/IBMSURAgrid.doc)
[3] SURAgrid User Management and PKI Bridge Certification Authority (https://www.pki.virginia.edu/nmi-bridge)
[4] TIGRE (http://tigreportal.hipcat.net)
[5] Virtual Data Toolkit (http://vdt.cs.wisc.edu/)
[6] SURAgrid Server Stack website (http://omnius.hpcc.ttu.edu/SURAgrid_wiki/ServerStack)
[7] Globus Toolkit 4.0 (http://www.globus.org/toolkit/docs/4.0/)
[8] GSI OpenSSH (http://grid.ncsa.uiuc.edu/ssh/)
[9] UberFTP (http://dims.ncsa.uiuc.edu/set/uberftp/)
[10] MyProxy (http://grid.ncsa.uiuc.edu/myproxy/)
[11] Condor-G (http://www.cs.wisc.edu/condor/condorg/)
[12] VDT supported operating systems (http://vdt.cs.wisc.edu/releases/1.6.1/requirements.html)
[13] pacman (http://www.archlinux.org/pacman/)
[14] SURAgrid PKI Bridge Certification Authority and User Management System (https://www.pki.virginia.edu/sura-bridge/scl/)
[15] grid-mapfile section (https://www.pki.virginia.edu/nmi-bridge/scl/#gridmapfile)
[16] International Grid Trust Federation (http://gridpma.org)
[17] GSI version of SSH (http://grid.ncsa.uiuc.edu/ssh/)
[18] Step 6: Install the GSI-OpenSSH Server (http://grid.ncsa.uiuc.edu/ssh/install.html#install_server)
[19] SURAgrid Support e-mail list (mailto:suragrid-support@sura.org)
[20] VDT Support page (http://vdt.cs.wisc.edu/support.html)
[21] OSG website (http://www.opensciencegrid.org)
[22] Members of the OSG Consortium (http://www.opensciencegrid.org/About/Who_is_the_Open_Science_Grid%3F/Consortium_Members)
[23] OSG partners (http://www.opensciencegrid.org/About/Who_is_the_Open_Science_Grid%3F/Partners)
[24] OSG Researcher fields (http://www.opensciencegrid.org/Science_on_the_OSG/Research_Highlights)
[25] Grid3 (http://www.ivdgl.org/grid2003/)
[26] Virtual Data Toolkit (VDT) (http://vdt.cs.wisc.edu//index.html)
[27] NSF Middleware Institute (NMI) (http://www.nsf-middleware.org/default.aspx)
[28] OSG@Work (http://twiki.grid.iu.edu/twiki/bin/view)
[29] OSG Education and Training (http://twiki.grid.iu.edu/twiki/bin/view/Education/WebHome)
[30] OSG Workshops (http://twiki.grid.iu.edu/twiki/bin/view/Education/GridWorkshops)
[31] OSG Research Highlights (http://www.opensciencegrid.org/Science_on_the_OSG/Research_Highlights)
[32] SRM collaboration working group (http://sdm.lbl.gov/srm-wg)
[33] Storage Group (OSG) (https://twiki.grid.iu.edu/twiki/bin/view/Storage)
[34] BeStMan (http://datagrid.lbl.gov/bestman)
[35] dCache (http://www.dcache.org)

© 2006-8, Southeastern Universities Research Association
Sponsored by SURA, TATRC (No. W81XWH-06-1-0419), OSG, and iVDGL
Updated September, 2007