Related Topics

Over the course of cookbook development, we collected material that is connected to the grid topic though not necessarily in the context of version one of this cookbook. We present it here for your reading as time permits.


Networks and grids

Grids are predicated on the existence of persistent network connections between grid nodes. The grid concept of sharing resources, like storage and computing cycles, via the network is analogous to the idea of sharing information via the Web. Neither the Grid nor the Web would be developing without a broadly deployed, reliable, worldwide network. Ubiquitous high performance networks are often a requirement for high performance computing on the grid, but slower network connections can accommodate some types of applications, particularly if network parameters are considered in resource selection and scheduling.

In a grid, networks serve as the virtual bus for the distributed collection of resources that are orchestrated through grid middleware, and they are central to the efficient, effective operation of the entity as a whole. Because of this, understanding networks and how they interact with grid systems is an important part of developing, deploying and managing a grid infrastructure.


General concepts

Network Reference Model and Terminology

Networks have their own terminology and we first introduce some of the important concepts and terms.

The Open Systems Interconnection (OSI [1]) reference model provides a layered abstract description of the computer/network communication model. TCP/IP is the network protocol that enables today's Internet. Although TCP/IP is not a strict implementation of the OSI model (for instance, some applications can extend their functionality beyond the application layer), it is useful to consider network functionality in terms of the model, and its implications for grid operation and performance. Briefly the seven layers of the OSI model, from "lowest" or most fundamental to "highest" are:

  1. Physical Layer (example: Optical fiber)
  2. Data Link Layer (example: Ethernet)
  3. Network Layer (example: Internet Protocol, or IP, in a TCP/IP network)
  4. Transport Layer (example: Transmission Control Protocol, or TCP, in a TCP/IP network)
  5. Session Layer (example: NetBIOS, named pipes)
  6. Presentation Layer (example: ASCII or MPEG encoding)
  7. Application Layer (example: GridFTP)

All layers can have impact on the end-to-end performance of applications, but layers 1-4 are typically associated most closely with the network.

Details of the OSI Reference Model

Basic network functionality involves the transmission of information, or data, from a source to a destination using some addressing scheme. The information is sent by some application (OSI layer 7) using some encoding (layer 6), perhaps within some session (layer 5) and delivered to the "network" (layers 1-4) which, from the application view, "transports" the information to the destination. This section focuses on layers 1 through 4 and considers details as related to IP (Internet Protocol) networks, as the underlying network for grids as covered in this Cookbook.

Transport Layer (4)

At the transport layer a protocol is chosen to manage the delivery of the source information to the destination. There are a number of possible services that can exist at this level, although none of them are required:

  • Connection oriented
  • Ordered delivery
  • Reliable delivery
  • Flow control
  • Ports

The transmitted data is broken into "packets" of potentially varying sizes by the selected transport protocol. Two typical choices are TCP [2] (Transmission Control Protocol) and UDP [3] (User Datagram Protocol). TCP guarantees delivery of packets of information in the order that they were sent. UDP is a lighter weight alternative that doesn't provide any guarantees of delivery or ordering but is faster and more efficient than TCP. Because of lack of feedback inherent in UDP, however, there is no way for UDP traffic to "fairly" share network bandwidth which is a primary concern of TCP.
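The difference between the two transport choices can be seen from the socket API itself. The following is a minimal sketch, assuming a host with a working loopback interface; the function names and the loopback address are illustrative choices, not part of any grid toolkit. The UDP version sends a single datagram with no connection set-up, while the TCP version must establish a connection first and then reads an ordered byte stream until the sender closes it.

```python
import socket

LOOPBACK = "127.0.0.1"  # assumption: tests run on one host over loopback


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram; UDP needs no connection and gives no delivery guarantee."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind((LOOPBACK, 0))      # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # a lost datagram would otherwise block forever
        sender.sendto(payload, receiver.getsockname())
        data, _addr = receiver.recvfrom(65535)
        return data
    finally:
        sender.close()
        receiver.close()


def tcp_roundtrip(payload: bytes) -> bytes:
    """Send bytes over a TCP connection; the stream is ordered and reliable."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind((LOOPBACK, 0))
        listener.listen(1)
        client.connect(listener.getsockname())   # connection set-up happens here
        server, _addr = listener.accept()
        with server:
            # Single-threaded sketch: keep the payload small enough to fit
            # in the socket buffers, or sendall() could block.
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)      # signal end of stream
            chunks = []
            while True:
                chunk = server.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        return b"".join(chunks)
    finally:
        client.close()
        listener.close()
```

On loopback both calls normally return the payload unchanged; across a real network only the TCP version would retransmit lost data.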

There are some hardware devices that operate at layer 4. For example web server load balancing devices are used to distribute web page requests amongst many possible servers depending upon their current loading.

Network Layer (3)

Both UDP and TCP (and other transport protocols) rely on the network level (layer 3) to address and transmit information. The network layer addresses messages and translates logical addresses into physical ones. It is responsible for the end-to-end delivery of packets.

One of the most broadly used protocols at the network layer is IP (Internet Protocol) of which IPv4 [4] is the most widespread. The packet structure of an IPv4 packet is shown in Figure NG-1.

Figure NG-1. The format of an IPv4 packet (See RFC 3514 [5] for definition of the "Evil Bit")

The typical layer 3 device on networks is the router (and also "layer-3 switches"). The details of the packet structure are explained in RFC 791 [6]. Some important details about the header:

  • Version is the version of the internet header.
  • IHL is the length of the internet header in 32 bit words (minimum of 5 as shown in Figure NG-1 above).
  • Type of Service is used to indicate abstract parameters of the quality of service desired or more recently to indicate ECN (See RFC 3168).
  • Total Length is 16 bits defining the entire packet size (including header and data) in bytes.

For further details see the Wikipedia entry on IPv4.
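As an illustration of the header fields listed above, the following sketch unpacks the fixed 20-byte portion of an IPv4 header as laid out in RFC 791. The function name and the sample bytes are made up for this example, and the header checksum is read but not verified.

```python
import struct


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of an IPv4 header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, protocol, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    ihl = ver_ihl & 0x0F                      # header length in 32-bit words
    return {
        "version": ver_ihl >> 4,
        "ihl": ihl,
        "header_bytes": ihl * 4,
        "type_of_service": tos,
        "total_length": total_length,         # header plus data, in bytes
        "ttl": ttl,
        "protocol": protocol,                 # 6 = TCP, 17 = UDP
        "source": ".".join(str(b) for b in src),
        "destination": ".".join(str(b) for b in dst),
    }


# Hypothetical sample header: a 40-byte TCP packet from 192.168.0.1 to 10.0.0.2.
sample = bytes.fromhex("450000281c46400040060000c0a800010a000002")
header = parse_ipv4_header(sample)
```

With the minimum IHL of 5, the header occupies 5 × 4 = 20 bytes, so a Total Length of 40 leaves 20 bytes for the transport-layer segment.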

Data Link Layer (2)

The data link layer (layer 2) manages the node-to-node (or hop-to-hop) delivery of frames. One typical example of a layer 2 protocol is Ethernet. The framing of Ethernet data is shown in Figure NG-2. The higher-level layers are encapsulated inside the layer 2 framing. The switch is a typical layer 2 device.

Figure NG-2. Details of an Ethernet Type II frame format.

Good general explanations of the Ethernet [7] and the Data link layer [8] can be found in Wikipedia.
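The Ethernet Type II layout can be sketched in a few lines. The example below assumes a frame as it is typically handed to software, that is, without the preamble and usually without the trailing frame check sequence; the function name and the sample addresses are illustrative only.

```python
import struct


def parse_ethernet_header(frame: bytes) -> dict:
    """Split an Ethernet II frame into destination, source, EtherType and payload."""
    if len(frame) < 14:
        raise ValueError("an Ethernet II header is 14 bytes long")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def fmt(mac: bytes) -> str:
        return ":".join(f"{b:02x}" for b in mac)

    return {
        "destination": fmt(dst),
        "source": fmt(src),
        "ethertype": ethertype,      # 0x0800 indicates an IPv4 packet inside
        "payload": frame[14:],       # the encapsulated higher-layer data
    }


# Hypothetical frame: made-up MAC addresses followed by a short payload.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
parsed = parse_ethernet_header(frame)
```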

Physical Layer (1)

The physical layer denotes the actual physical cabling, wireless electromagnetic connection or optically modulated carrier that transports bits in the network.

This layer deals with contention resolution, flow control, initiation and termination of connections and conversion between digital data representations.

Typical devices are Ethernet hubs, optical transponders, wireless access points, etc.

Summarizing

Application data destined for a remote network location undergoes a number of steps before being transmitted on the physical layer. Figure NG-3 shows how fragmented data from an application is encapsulated by various headers and trailers, and at various levels in the OSI network model.

Figure NG-3. OSI data encapsulation.

The figure above shows how application data is progressively encapsulated at each level for transmission on the physical layer. This encapsulation has implications for how networks perform that we will discuss in following sections.
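The cost of this encapsulation can be estimated with simple arithmetic. The sketch below assumes TCP over IPv4 over Ethernet with no header options (20-byte TCP header, 20-byte IPv4 header, 14-byte Ethernet header and 4-byte frame check sequence) and ignores Ethernet's minimum frame padding; the function name is hypothetical.

```python
ETH_HEADER = 14   # destination MAC, source MAC, EtherType
ETH_FCS = 4       # frame check sequence trailer
IP_HEADER = 20    # IPv4 header without options
TCP_HEADER = 20   # TCP header without options


def frame_sizes(data_len: int, mtu: int = 1500) -> list:
    """Return the size of each Ethernet frame needed to carry data_len bytes."""
    max_segment = mtu - IP_HEADER - TCP_HEADER   # application bytes per packet
    sizes = []
    remaining = data_len
    while remaining > 0:
        segment = min(remaining, max_segment)
        sizes.append(segment + TCP_HEADER + IP_HEADER + ETH_HEADER + ETH_FCS)
        remaining -= segment
    return sizes


frames = frame_sizes(4000)
efficiency = 4000 / sum(frames)   # fraction of transmitted bytes that is application data
```

For a 4000-byte block and a 1500-byte MTU this yields three frames, each carrying 58 bytes of headers and trailers in addition to the application data.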

Some important functional considerations

Now that we have an overview of how networks deliver data between source and destination we can discuss some of the important functional considerations that impact how the network performs.

Typically applications have widely varying amounts of data that need to be sent across the network. However, the underlying network layers impose their own constraints on the data block sizes they can transport per packet. This means that, depending upon the amount of data to be sent, multiple packets or frames may be required for a given application data block. One potential way to optimize traffic flow is to maximize the size of the data content of each packet, which is discussed below.

Frame/Packet Size and Rate

Ethernet (layer 2) typically has a constraint on the maximum frame size that can be transmitted. The limit is denoted as the MTU (Maximum Transmission Unit) and is typically 1500 bytes. This has an implication for data transmission. If a large file is being sent across the network there will be many Ethernet frames required to transport the data. Each frame can require the receiving NIC (network interface card) to issue an interrupt to the local processor so it can move the received data. The rate of interrupts depends upon the data arrival speed (network bandwidth). At slower NIC speeds (Ethernet = 10 Mbits/sec or Fast Ethernet 100 Mbits/sec) this is not a daunting challenge for modern processors. However at higher speeds (Gigabit or 10 Gigabit Ethernet) the load can bring even the fastest current processors to their knees and reduce the overall network throughput achievable to well below the "wire-speed".
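To give a feel for the numbers, the following sketch estimates how many full-size frames per second arrive on a saturated link, which is an upper bound on the interrupt rate if every frame raises one interrupt. It assumes standard Ethernet per-frame overhead on the wire (14-byte header, 4-byte frame check sequence, 8-byte preamble and 12-byte inter-frame gap); the function name is illustrative.

```python
# Bytes on the wire that accompany every frame's MTU-sized payload:
# header (14) + FCS (4) + preamble (8) + inter-frame gap (12).
WIRE_OVERHEAD = 14 + 4 + 8 + 12


def frames_per_second(link_bps: float, mtu: int = 1500) -> float:
    """Maximum rate of full-size frames on a link running at link_bps bits/sec."""
    bits_per_frame = (mtu + WIRE_OVERHEAD) * 8
    return link_bps / bits_per_frame


for name, speed in [("Ethernet", 10e6), ("Fast Ethernet", 100e6),
                    ("Gigabit", 1e9), ("10 Gigabit", 10e9)]:
    print(f"{name:14s} {frames_per_second(speed):12.0f} frames/sec (MTU 1500)")

print(f"Gigabit, 9000-byte jumbo frames: {frames_per_second(1e9, 9000):.0f} frames/sec")
```

At Gigabit speed with a 1500-byte MTU this is roughly 81,000 frames per second, while a 9000-byte jumbo frame reduces the figure to under 14,000, which is why larger frames and interrupt coalescing help.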

There exist a number of means of overcoming high-speed limitations:

  1. Increase the layer-2 frame size (Jumbo Frames — Wikipedia [9] and Wareonearth [10].)
  2. Use options in the NIC/drivers to coalesce multiple frames into a single interrupt to the processor (typically controlled by 'ethtool' [11], see the ethtool -C options)
  3. Use NIC options to offload packet processing from the CPU (again, see ethtool options -G and -K)
  4. Tune your system TCP stack for the type of connections you need to optimize (see Enabling High Performance Data Transfers [12] or TCP Tuning Guide [13] for details)
  5. Purchase newer NICs that have significant improvements in reducing the load on the processor.

Network Stacks and TCP

Another consideration is the use of TCP across long, high bandwidth networks. TCP was designed when typical high-speed network connections were Ethernet (10 Mbits/sec) and typical networks were "local" and only rarely regional or national in size. It is perhaps understandable that this protocol does not scale well to gigabit and beyond network speeds over national or international distance scales. Tuning the TCP network stack parameters can help ameliorate poor WAN performance for TCP applications and is often required to achieve any kind of reasonable WAN performance.
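The scaling problem can be made concrete with the bandwidth-delay product: TCP can have at most one window of unacknowledged data in flight per round trip, so throughput is bounded by window size divided by round-trip time. The sketch below uses illustrative figures (a 1 Gbit/sec path with a 100 ms round trip, and the 65,535-byte limit of the 16-bit TCP window field when window scaling is not in use); the function names are hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep a path of this speed and delay full."""
    return bandwidth_bps * rtt_seconds / 8


def max_tcp_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput with at most one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds


# Illustrative path: 1 Gbit/sec with a 100 ms round-trip time.
needed_window = bandwidth_delay_product_bytes(1e9, 0.100)   # about 12.5 million bytes
unscaled_limit = max_tcp_throughput_bps(65535, 0.100)       # about 5.2 Mbit/sec
```

Under these assumptions an untuned 64 KB window uses well under 1% of the path, which is why stack tuning matters so much more over a WAN than over a LAN.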

TCP, because of its widespread use, plays an important role in how well many networked applications function. Most operating systems "out of the box" have their TCP implementations tuned for LANs. This is partly historical (most applications used to be "local") and partly to minimize resource consumption (tuning network stacks for WAN performance can significantly increase memory consumption because all network connections may use more resources).

Newer operating systems are doing better at having default configurations which are better optimized for networked use. Windows XP/2003 and Linux (2.6.9+ kernels) have significantly improved in their network tunings. Linux now supports "autotuning" of stacks and allows selecting different variants of TCP (2.6.12+ kernels).
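One way to see what a host's stack provides by default is to ask a freshly created socket for its buffer sizes and then request a larger receive buffer. This is a minimal sketch using the standard socket options SO_RCVBUF and SO_SNDBUF; the function name and the 4 MB request are arbitrary choices for illustration, and the operating system may grant a different value than the one requested because it applies its own limits.

```python
import socket


def probe_socket_buffers(requested_bytes: int = 4 * 1024 * 1024) -> dict:
    """Report default TCP socket buffer sizes and the result of asking for more."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        default_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
        default_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        # Ask for a larger receive buffer; the OS decides what is actually granted.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return {
        "default_receive": default_rcv,
        "default_send": default_snd,
        "requested_receive": requested_bytes,
        "granted_receive": granted_rcv,
    }


if __name__ == "__main__":
    for key, value in probe_socket_buffers().items():
        print(f"{key:18s} {value:>10d} bytes")
```

Comparing the granted value with the bandwidth-delay product of the paths you care about gives a quick indication of whether system-wide limits need to be raised.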

Hosts and Network Performance

A last functional consideration involves the hosts themselves. Users often blame "the network" for poor application performance when the problem does not originate with the network at all. Applications that use the network must be considered as an end-to-end system, and problems can arise in many different areas: poorly designed applications (especially how such applications interact with the network itself), high host CPU load, low free host memory, deficient or mis-configured host storage subsystems, buggy or badly designed device drivers, out-of-date firmware, poorly tuned network stacks, badly designed or defective NICs, or faulty cabling. Ruling out all these issues on the local host still doesn't necessarily implicate the network, since those same issues may be affecting the remote host.

Of course there are times when the network is the problem. Congested or mis-configured local-area, campus, regional or backbone networks cause significant disruption to networked applications.

This is important when considering network tuning and monitoring for grids. Managing and optimizing networks for grid use requires careful attention to the whole "end-to-end" problem and not just the network in isolation. Being able to determine whether the problem is at either end or in the network in the middle is critical to resolving it quickly.

Network components and operation

  • NICs and Hosts – Network interface cards (NICs) provide hosts with access to the network. Typically these cards encode their information via Ethernet at layer 2 (either wired or wireless) with speeds from 10-10000 Mbits/sec. The wired versions typically come in copper or fiber (optical) variants.
  • Switches – These are the "layer 2" devices with 2 or more ports responsible for interconnecting multiple network devices (hosts, switches, etc). Switches "learn" the hardware addresses (MAC for Ethernet) of the devices connected to each of their ports and can switch layer 2 frames coming in one port to the "correct" destination port. Newer switches are typically "non-blocking", meaning that all their ports allow wire-speed, full-duplex interconnections (each port can "talk" to a partner port full-duplex at the same time as all other port pairs are doing this).
  • Routers – These are the "layer 3" devices with 2 or more ports responsible for "routing" (determining the best path for) IP packets across the network. Routers are aware of the various networks connected to their ports and can route incoming IP packets to the correct destination port.
  • Optical Components – Many newer switches and routers utilize optical rather than electrical interconnects. Instead of copper cabling connecting to a port, fiber carrying modulated light is used to transmit information. Light pulses can propagate further without degradation compared to electrical signaling on copper cables. Since almost all existing switches and routers utilize electronic components internally to fulfill their roles, network information encoded in modulated light must be translated into electrical signals for processing (and then perhaps converted back to light for transmission). The process is referred to as OEO (Optical-Electrical-Optical). Gigabit speed optical components are called GBICs (Gigabit Interface Converters) and typical 10 Gigabit components are called Xenpaks. Both come in a variety of physical layer interfaces (single-mode, multi-mode, extended reach, short range, long range, etc.). Note that there are also optical "switches" which can connect light from a source fiber to a destination fiber through the use of small mirrors on the millisecond timescale.
  • Monitoring – Monitoring is an often neglected but vital component of networks and their operation. Being able to track and measure various network data is critical for problem diagnosis and localization, resource planning and network management. Broadly speaking "monitoring" can include tracking network switch/router configurations, port bandwidth utilization and errors, system logging information (syslog from switches/hosts/routers), network "flow" information and statistics and endhost network usage and errors. There is no uniform "end-to-end" monitoring system for networks that is deployed but there are a number of projects working on providing a lot of this capability: PerfSONAR, MonALISA and others. Also SNMP (Simple Network Management Protocol) is available to provide a standard way of accessing much of the information about the network and the devices which comprise it.
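The address "learning" described for switches above can be sketched as a small table that maps each source MAC address to the port it was last seen on. This is a simplified model for illustration only (no table aging, VLANs or broadcast handling), and the class and method names are hypothetical.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy model of a layer 2 switch that learns which MAC lives on which port."""

    def __init__(self, num_ports: int) -> None:
        self.num_ports = num_ports
        self.mac_table: Dict[str, int] = {}   # MAC address -> port number

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Process one frame and return the list of ports it is sent out on."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the incoming one.
            return [p for p in range(self.num_ports) if p != in_port]
        if out_port == in_port:
            return []                          # destination is on the same segment
        return [out_port]


switch = LearningSwitch(num_ports=4)
first = switch.receive(0, "aa:aa", "bb:bb")    # bb:bb unknown, so the frame is flooded
reply = switch.receive(1, "bb:bb", "aa:aa")    # aa:aa was learned on port 0
second = switch.receive(0, "aa:aa", "bb:bb")   # bb:bb is now known on port 1
```

After the first exchange, traffic between the two hosts only touches their own ports, which is what allows other port pairs to communicate at the same time.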

Measurement and monitoring

If you don't measure, you don't know.

Grids consist of many hardware and software components, any of which can break or misbehave. Monitoring and measuring at least portions of the network connections between grid nodes is necessary for reliable operation and support. This section discusses some of the tools available for this.

Whether your concern is monitoring the network itself or the user experience, monitoring from diverse locations is essential to identifying problems. Layers 1 through 3 can be monitored using typical network monitoring and measurement tools. However, we also need to look into the application layer to understand what users are actually trying to do. In addition to getting information to the network manager (ideally before a user would notice the problem), monitoring that can cross layers greatly benefits the task of root cause analysis.

We describe the family of network monitoring approaches in two categories, passive monitors and active monitors. [18]

  • Passive Monitors

    Passive monitors don't add traffic to the network; they just provide a view of what goes by. The major advantage of this of course is that no extra load is generated on the network and servers from the use of the monitoring device itself. If the monitoring is done in enough detail, however, user-perceived performance for network activities such as TCP connections, DNS lookups and file transfers can be gauged. This is significant because, while many of the active approaches (described later) claim to provide a measure of the user experience, they do not involve measurement of actual user activities.

    The disadvantage of passive monitoring is that it becomes more and more difficult to monitor correctly as the volume of data on the network increases. The movement away from true broadcast networks to switching further complicates this situation in that more monitoring points are required in order to "see" all of the traffic. Another problem is the increasing use of encryption, which can hide the actual application details that we want to monitor.

    Passive monitoring generally relies on a promiscuous mode tap that can see all network traffic. This is the classic Remote MONitoring [19] (RMON) approach and can be found in commercial products like TrafficDirector as well as the current GOAT and many other publicly available tools and appliances such as NTOP. These tools are typically deployed at one or more locations on a network (e.g. border gateway, one per subnet). The data is gathered and often brought back to a central server for correlation and analysis.

    In addition to the dedicated monitoring device, there are a number of passive client-based tools that have been developed. These tools focus on the network performance experienced by a single user. A passive monitor, installed on the user's computer, watches network applications as they are being used and reports the performance to a central collection point. From the perspective of the network and service manager this is ideal, as all of the users become "free" network probes. Of course, nothing is ever really free and some performance degradation is likely to be obvious to the user. The more successful attempts at this have worked to limit the pain. Some examples in this arena are a commercial product called FirstSense and NETI@home [20], an open source package from Georgia Institute of Technology.

  • Active Monitors

    Unlike passive monitors, active monitors will generate traffic to perform a measurement. This includes traditional network tests like ping [22] and traceroute [23] but also application tests like file transfers and DNS lookups. The primary advantages of this approach are that it is somewhat easier to implement than the passive scanner and that it is possible for the network administrator to see a problem even before a user would see it. For instance, we can discover that the mail server went down at 4am and get it back up and running before users ever notice there is a problem. The primary disadvantages of active monitoring are the additional load on both network and servers and the fact that we don't actually observe the real user experience but something designed to look like a user. The techniques used can be divided into two groups: real tests and synthetic tests.

    A Real Test Active Monitor is a probe that sits out on the network, either in a dedicated box or on a user's computer, and performs operations with an on-line, production server. This tests not only the network performance but also the complete end-to-end service. The goal is to get as close to the real user's experience as possible. If the probe can do a DNS lookup or get a DHCP lease, then there's a good chance that the user can too. Tools available for use in real test monitoring include the publicly available Nagios and commercial tools from Micromuse.

    An Active Synthetic Test is very similar to a real test in that it performs some real application, such as a file transfer. However, this is not done to the production server but to a collection of dedicated performance testing boxes. There are several of these in the Internet today. Tools such as AMP and PingER are in this category, along with many others. The Iperf tool is often used in this manner. The Ganymede tool is a commercial offering that operates in this way.

    Most implementations of active monitors will break the test down into components. For instance, a web server measurement will include timings for DNS lookup, TCP connection and then detailed application transaction timings (complete order, process credit card, etc.)
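Breaking a measurement into components can be sketched as follows: the name lookup and the TCP connection are timed separately, so that a slow result can be attributed to the right step. This is a minimal illustration, not the method of any tool named in this section; the function name and the returned keys are hypothetical, and application-level timings would be added in the same way.

```python
import socket
import time


def time_connection_steps(host: str, port: int, timeout: float = 5.0) -> dict:
    """Time name resolution and TCP connection set-up as separate components."""
    start = time.perf_counter()
    candidates = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    resolved = time.perf_counter()

    family, socktype, proto, _canonname, sockaddr = candidates[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        connect_start = time.perf_counter()
        sock.connect(sockaddr)               # completes the TCP handshake
        connected = time.perf_counter()

    return {
        "address": sockaddr[0],
        "lookup_seconds": resolved - start,
        "connect_seconds": connected - connect_start,
    }
```

A call such as time_connection_steps("www.example.org", 80) (a placeholder host) would generate real traffic to that server, which is exactly the trade-off of active monitoring described above.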

    Active monitoring tools include some simple things like:

    • ping: The source site sends a packet of some size to a destination site. The source site measures how long it takes the destination site to return the packet and determines if there was any data loss in the returned packet. ping is very helpful in telling if the destination site is down, unreachable for some reason, or if the network between is causing delays or transmission issues of any kind. You can try the command ping internet2.edu from a Unix workstation to see how it works; other operating systems typically offer the same or a similar command. (Note: Some system administrators disable ping acknowledgements from their machines if they feel the network traffic it generates is unbearable.)
    • traceroute: The traceroute tool does just that - it traces the route between source and destination IP sites. The segment between each two sites along the path is called a "hop". Traceroute will list each hop IP address (and hostname if available) along with three sample test times. Viewing the route and the test times can be very informative. For example, you may find that the route taken is not what you expected which can indicate a network outage in your usual route. You may find serious delays (which will be noted with asterisks or test timeouts.) And you may find many more hops than you expected to see denoting possible rerouting or path problems. Try a traceroute sura.org to see how this cookbook material reaches you!
    • iperf: iperf measures TCP and UDP bandwidth performance. NLANR's Iperf [24] tool reports bandwidth, delay jitter, and datagram loss.
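The datagram loss and delay jitter figures reported for UDP tests can be illustrated with a small calculation over per-datagram timestamps. The sketch below uses the smoothed interarrival jitter estimator defined for RTP in RFC 3550; whether a particular Iperf version computes jitter in exactly this way is an assumption here, and the function name and record layout are hypothetical.

```python
from typing import List, Tuple


def summarize_datagrams(received: List[Tuple[float, float]], sent_count: int) -> dict:
    """Compute loss and smoothed jitter from (send_time, receive_time) pairs.

    `received` holds one pair per datagram that arrived, in arrival order.
    """
    jitter = 0.0
    previous_transit = None
    for send_time, receive_time in received:
        transit = receive_time - send_time
        if previous_transit is not None:
            delta = abs(transit - previous_transit)
            jitter += (delta - jitter) / 16     # RFC 3550 running estimate
        previous_transit = transit
    lost = sent_count - len(received)
    return {
        "lost": lost,
        "loss_fraction": lost / sent_count if sent_count else 0.0,
        "jitter_seconds": jitter,
    }


# Hypothetical sample: five datagrams sent, four arrive, one with extra delay.
sample = [(0.00, 0.010), (0.02, 0.030), (0.04, 0.054), (0.08, 0.090)]
stats = summarize_datagrams(sample, sent_count=5)
```

Only differences in transit time enter the jitter estimate, so the sender and receiver clocks do not need to agree on absolute time.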

    Internet2 has also developed several advanced tools for network measurement:

    • owamp: One-Way Active Measurement Protocol (OWAMP) [25] is a command line client application and a policy daemon used to determine one way latencies between hosts and is an implementation of the standard of the same name. The one-way measurements performed by OWAMP help to determine the direction of the congestion (note that the route there is not always the same as the route back.)
    • bwctl: Bandwidth Test Controller (BWCTL) [26] is a command line client application and a scheduling and policy daemon for using Iperf. BWCTL does things like arrange Iperf tests between different servers on different systems, request and reserve specific types of tests, and streamline multipoint tests via configuration options for administrators.
    • ndt: Network Diagnostic Tool (NDT) [27] is a client/server program that provides network configuration and performance testing to a user's desktop or laptop computer. NDT will look for things like duplex mismatch conditions on Ethernet/FastEthernet links, incorrectly set TCP buffers in the user’s computer, or problems with the local network infrastructure. It generates a multi-level series of plain language messages, suitable for novice users, and detailed test results, suitable for a network engineer. Both are available to the user, and test results may be easily emailed to the appropriate administrator to assist in the problem resolution phase.

Of course, the real value of all of this monitoring is limited unless there is adequate work on the data gathering, correlation and reporting tools. This is where the real analysis is done to determine first whether or not a problem exists and then who to contact to get it resolved.

Other examples of active monitors include:

  • Popular monitoring tools

    MRTG: The Multi Router Traffic Grapher (MRTG) [28] is a tool to monitor the traffic load on network links. MRTG generates HTML pages containing PNG images which provide a LIVE visual representation of this traffic.



    Figure NG-4. A sample MRTG output graph.
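    Traffic graphs of this kind are typically derived from interface octet counters read via SNMP at regular intervals. The sketch below shows the underlying arithmetic under that assumption, including the wrap-around of a 32-bit counter; it is not MRTG's own code, and the function name and sample values are hypothetical.

```python
def link_rate_bps(previous_octets: int, current_octets: int,
                  interval_seconds: float, counter_bits: int = 32) -> float:
    """Average bit rate between two readings of an interface octet counter."""
    modulus = 2 ** counter_bits
    # Modular subtraction handles a counter that wrapped once between readings.
    delta_octets = (current_octets - previous_octets) % modulus
    return delta_octets * 8 / interval_seconds


# Hypothetical readings taken 10 seconds apart.
normal = link_rate_bps(1_000, 2_000, 10)            # 1000 octets in 10 s
wrapped = link_rate_bps(2**32 - 100, 400, 5)        # counter wrapped past zero
```

On fast links a 32-bit counter can wrap more than once between polls, in which case the computed rate is too low; shorter polling intervals or 64-bit counters avoid this.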


    Cacti: Cacti, the Complete RRDTool-based Graphing Solution [34] is a very versatile, easy-to-manage tool designed to harness the power of RRDTool's data storage and graphing functionality. It supports a plugin architecture and has been demonstrated to scale to monitoring thousands of hosts. End-systems, network devices and many other types of information can be tracked via Cacti. There are plugins to alert based upon threshold and system events as well as the ability to gather and track MAC address locations in complicated switching environments.


    Figure NG-5. Cacti Dual Pane Tree View.


    OpenView: HP OpenView Network Node Manager Smart Plug-in for IP Multicast [29] is designed specifically to manage the multicast environment. OpenView will:
    • Automatically discover IP multicast routing topology relationships
    • Proactively monitor device health and measure IP multicast traffic flow
    • Rapidly generate alarms based on multicast activity
    • Quickly isolate and fix multicast faults through built-in diagnostic capability.

  • Tools for monitoring clusters and servers

    ganglia: The Ganglia Monitoring System [30] is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization.

    Figure NG-6. Several real-time graphical reports from Ganglia.
    (http://lindir.ics.muni.cz/ganglia/?c=skurut%20cluster&m=&r=hour&s=descending&hc=4).

  • Comprehensive tools

    MonALISA: MONitoring Agents using a Large Integrated Services Architecture (MonALISA) [31] has been developed by Caltech and its partners with the support of the U.S. CMS software and computing program. The framework is based on Dynamic Distributed Service Architecture and is able to provide complete monitoring, control and global optimization services for complex systems. MonALISA provides:
    • Distributed Registration and Discovery for Services and Applications.
    • Monitoring all aspects of complex systems.
    • System information for computer nodes and clusters.
    • Network information (traffic, flows, connectivity, topology) for WAN and LAN.
    • Monitoring the performance of Applications, Jobs or services.
    • End User Systems, and End-To-End performance measurements.
    • Can interact with any other services to provide in near real-time customized information based on monitoring information.
    • Secure, remote administration for services and applications.
    • Agents to supervise applications, to restart or reconfigure them, and to notify other services when certain conditions are detected.
    • The Agent system can be used to develop higher level decision services, implemented as a distributed network of communicating agents, to perform global optimization tasks.
    • Graphical User Interfaces to visualize complex information.
    • Global monitoring repositories for distributed Virtual Organizations.


    Figure NG-7. Several real-time graph examples from MonALISA global statistics
    (http://monalisa.cacr.caltech.edu/monalisa__Looking_Glass.htm)

    PerfSONAR: PERFormance Service Oriented Network monitoring ARchitecture (PerfSONAR) [32] has three contexts: it is a consortium, a protocol, and a set of code. For our purposes, the code is the most interesting of the three: its services act as an intermediate layer between the performance measurement tools and the diagnostic or visualization applications. Major perfSONAR services include:
    • Measurement Point Service: Creates and/or publishes monitoring information related to active and passive measurements.
    • Measurement Archive Service: Stores and publishes monitoring information retrieved from Measurement Point Services.
    • Lookup Service: Registers all participating services and their capabilities.
    • Authentication Service: Manages domain-level access to services via tokens.
    • Transformation Service: Offers custom data manipulation of existing archived measurements.
    • Resource Protector Service: Manages granular details regarding system resource consumption.
    • Topology Service: Offers topological information on networks.

    An example of PerfSONAR use can be seen at the ESnet PerfSONAR Traceroute Visualizer [33].


Manpower requirements

Grid system administration and manpower requirements of a campus-wide grid (Texas Tech University example)

Clear definition of operational policies and procedures provides a foundation for the successful support of a production grid. The examples below from TechGrid, the campus grid of Texas Tech University, illustrate the level of detail that is considered in some key areas of policy and administration. These policies evolve as needed to support increasing usage and infrastructure development.

This section shall outline the administration requirements of a campus-wide grid, how a grid works, and policies/procedures of TechGrid with respect to events that are grid related in nature such as grid infrastructure failures, planned maintenance, and planned reimaging of given 'zones'.

Zone Administrator duties (2 hours a week):

Responsibilities:

1.     Installing nodes: A script has been provided.

2.     Uninstalling nodes: A script has been provided.

3.     Configuring nodes: A script has been provided.

4.     Testing nodes: A script has been provided.

5.     Reimaging: This is a standard ATLC function; however nodes need to be uninstalled before reimaging takes place.

Campus Grid Administrator duties (20 hours a week):

Responsibilities:

1.     Maintain the Bootstrap server: create scripts to monitor Grid usage and failures.

2.     Train Zone Administrators.

3.     Write scripts to add functionality to the Grid.

4.     Train users.

5.     Find more resources to add to the Grid. 

6.     Help Zone Administrators with Grid related issues.

7.     Help students/researchers Grid-enable their code.

8.     Installing nodes: Install nodes into new Grid zones.

9.     Uninstalling nodes: Help new Grid zones uninstall at first reimaging.

10.    Configuring nodes: Help new Grid zones configure their nodes.

11.    Testing nodes: Create scripts that test the availability of a node.

Emergency procedures:

1.     Grid Maintenance: If the Campus Administrator knows when the Grid will go down with enough warning, then the Grid can be gracefully unmounted using an elegant shutdown script that will unmount each individual node in the Grid without affecting quality of services for the end users. Zone Administrators will be told of the Grid shutdown in advance. Grid Zone Administrators will be asked to reset their compute nodes at the end of the day to reactivate the Grid on those nodes.

2.     Emergency Shutdown: If an emergency shutdown is required, then the Grid will be shut down without dismounting worker nodes. This case is rare since this type of failure is caused by circumstances beyond Grid Administration control, such as power or chiller failure at Reese Center. Emails will be sent to Zone Administrators.

3.     Grid Failures: If the Grid goes down without warning, then the next step will be to disable Grid system services on each machine (This is a rare occurrence).  It is the zone administrator's duty to inform the Campus Grid Administrator when issues like this arise so that a remedy can be applied immediately.

In review, the current policies that are in effect:

1.     Grid nodes cannot be used during the day.

2.     Grid nodes cannot be used at any time if processing load is higher than 50%.

3.     Grid nodes cannot be used at any time if anyone is logged into them locally or remotely.

4.     Jobs will cease automatically if a user logs in or if the wall clock time of the Grid node displays any time between 7:00AM and 8:00PM.
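These policies translate directly into an availability check of the kind a node-testing script might perform. The following is a sketch only: it is not the TechGrid script, the function name and parameters are hypothetical, and it assumes that "during the day" means the 7:00AM to 8:00PM window named in policy 4.

```python
from datetime import time as clock_time

DAY_START = clock_time(7, 0)     # 7:00AM, from policy 4
DAY_END = clock_time(20, 0)      # 8:00PM, from policy 4
MAX_LOAD = 0.50                  # 50% processing load, from policy 2


def node_available(now: clock_time, processing_load: float, logged_in_users: int) -> bool:
    """Return True only if every policy allows Grid jobs on this node right now."""
    if DAY_START <= now <= DAY_END:
        return False             # policies 1 and 4: no Grid use during the day
    if processing_load > MAX_LOAD:
        return False             # policy 2: node is already busy
    if logged_in_users > 0:
        return False             # policy 3: someone is logged in locally or remotely
    return True
```

For example, an idle node with nobody logged in is available at 11:00PM but not at noon, and a running job would be stopped as soon as any of the three conditions stops holding.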

Contact

Jerry Perez, Texas Tech University.

URL: http://www.hpcc.ttu.edu/techgrid.html [40]


Bibliography

[1] OSI Model — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/OSI_model)
[2] Transmission Control Protocol — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Transmission_Control_Protocol)
[3] User Datagram Protocol — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/User_Datagram_Protocol)
[4] IPv4 — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/IPv4)
[5] The Security Flag in the IPv4 Header (http://www.ietf.org/rfc/rfc3514.txt)
[6] RFC 791 (http://tools.ietf.org/html/rfc791)
[7] Ethernet — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Ethernet)
[8] Data_link_layer — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Data_link_layer)
[9] Jumbo Frames — Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Jumbo_Frames)
[10] Gigabit Ethernet Jumbo Frames (http://sd.wareonearth.com/~phil/jumbo.html)
[11] ethtool (http://sourceforge.net/projects/gkernel)
[12] Enabling High Performance Data Transfers (http://www.psc.edu/networking/projects/tcptune/)
[13] TCP Tuning Guide (http://dsd.lbl.gov/TCP-tuning/TCP-tuning.html)
[18] Taxonomy of Network and Service Monitoring Approaches (http://www.rnoc.gatech.edu/cpr/taxonomy.html)
[22] ping, From Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Ping)
[23] traceroute, From Wikipedia, the free encyclopedia (http://en.wikipedia.org/wiki/Traceroute)
[24] Iperf — The TCP/UDP Bandwidth Measurement Tool (http://dast.nlanr.net/Projects/Iperf/)
[25] One-Way-Ping (OWAMP) (http://e2epi.internet2.edu/owamp/)
[26] Bandwidth Test Controller (BWCTL) (http://e2epi.internet2.edu/bwctl/)
[27] Network Diagnostic Tool (NDT) (http://e2epi.internet2.edu/ndt/)
[28] Tobi Oetiker's MRTG — The Multi Router Traffic Grapher (http://oss.oetiker.ch/mrtg/)
[29] HP OpenView Network Node Manager Smart Plug-in for IP Multicast (http://www.openview.hp.com/products/mcast/)
[30] Ganglia Monitoring System (http://ganglia.sourceforge.net/)
[31] Monalisa — Monitoring the Grid since 2001 (http://monalisa.cacr.caltech.edu/monalisa.htm)
[32] PERFormance Service Oriented Network monitoring ARchitecture (http://wiki.perfsonar.net/jra1-wiki/index.php/PerfSONAR_About)
[33] ESnet PerfSONAR Traceroute Visualizer (https://performance.es.net/cgi-bin/level0/perfsonar-trace.cgi)
[34] Cacti, the Complete RRDTool-based Graphing Solution (http://cacti.net/)
[40] Texas Tech TechGrid (http://www.hpcc.ttu.edu/techgrid.html)

© 2006-8, Southeastern Universities Research Association
Sponsored by SURA, TATRC (No. W81XWH-06-1-0419), OSG, and iVDGL
Updated September, 2007