Sun Microsystems
Chapter 1

Introduction to the N1™ Grid Engine 6 Software

This chapter provides background information about the system of networked computer hosts that run the N1™ Grid Engine 6 software (grid engine system). This chapter includes the following topics:

  • A brief explanation of grid computing

  • A description of each of the important components of the product

  • A detailed list of client commands that are available to users and administrators

  • An overview of QMON, the grid engine system graphical user interface

What Is Grid Computing?

A grid is a collection of computing resources that perform tasks. In its simplest form, a grid appears to users as a large system that provides a single point of access to powerful distributed resources. In its more complex form, which is explained later in this section, a grid can provide many access points to users. In all cases, users treat the grid as a single computational resource. Resource management software such as N1 Grid Engine 6 software (grid engine software) accepts jobs submitted by users. The software uses resource management policies to schedule jobs to be run on appropriate systems in the grid. Users can submit millions of jobs at a time without being concerned about where the jobs run.

No two grids are alike. One size does not fit all situations. The following three key classes of grids exist, which scale from single systems to supercomputer-class compute farms that use thousands of processors:

  • Cluster grids are the simplest. Cluster grids are made up of a set of computer hosts that work together. A cluster grid provides a single point of access to users in a single project or a single department.

  • Campus grids enable multiple projects or departments within an organization to share computing resources. Organizations can use campus grids to handle a variety of tasks, from cyclical business processes to rendering, data mining, and more.

  • Global grids are a collection of campus grids that cross organizational boundaries to create very large virtual systems. Users have access to compute power that far exceeds resources that are available within their own organization.

Figure 1-1 shows the three classes of grids. In the cluster grid, a user's job is handled by only one of the systems within the cluster. However, the user's cluster grid might be part of the more complex campus grid, and the campus grid might in turn be part of the largest global grid. In such cases, the user's job can be handled by any member execution host that is located anywhere in the world.

Figure 1-1 Three Classes of Grids

N1 Grid Engine 6 software, the newest version of Sun's resource management solution, provides the power and flexibility required for campus grids. The product is useful for existing cluster grids because it facilitates a smooth transition to creating a campus grid. The grid engine system effects this transition by consolidating all existing cluster grids on the campus. In addition, the grid engine system is a good start for an enterprise campus that makes the move to the grid computing model for the first time.

The grid engine software orchestrates the delivery of computational power that is based on enterprise resource policies set by the organization's technical and management staff. The grid engine system uses these policies to examine the available computational resources within the campus grid. The system gathers these resources and then allocates and delivers resources automatically, optimizing usage across the campus grid.

To enable cooperation within the campus grid, project owners who use the grid must do the following:

  • Negotiate policies

  • Have flexibility in the policies for manual overrides for unique project requirements

  • Have the policies automatically monitored and enforced

The grid engine software can mediate among the entitlements of a multitude of departments and projects that are competing for computational resources.

Managing Workload by Managing Resources and Policies

The grid engine system is an advanced resource management tool for heterogeneous distributed computing environments. Workload management means that the use of shared resources is controlled to best achieve an enterprise's goals such as productivity, timeliness, level-of-service, and so forth. Workload management is accomplished through managing resources and administering policies. Sites configure the system to maximize usage and throughput, while the system supports varying levels of timeliness and importance. Job deadlines are instances of timeliness. Job priority and user share are instances of importance.

The grid engine software provides advanced resource management and policy administration for UNIX environments that are composed of multiple shared resources. The grid engine system is superior to standard load management tools with respect to the following major capabilities:

  • Innovative dynamic scheduling and resource management that allows grid engine software to enforce site-specific management policies.

  • Dynamic collection of performance data to provide the scheduler with up-to-the-minute job level resource consumption and system load information.

  • Availability of enhanced security by way of Certificate Security Protocol (CSP)-based encryption. Instead of transferring messages in clear text, the messages in this more secure system are encrypted with a secret key.

  • High-level policy administration for the definition and implementation of enterprise goals such as productivity, timeliness, and level-of-service.

The grid engine software provides users with the means to submit computationally demanding tasks to the grid for transparent distribution of the associated workload. Users can submit batch jobs, interactive jobs, and parallel jobs to the grid.

The product also supports checkpointing programs. Depending on load demand, checkpointing jobs migrate from workstation to workstation without user intervention.

For the administrator, the software provides comprehensive tools for monitoring and controlling jobs.

How the System Operates

The grid engine system does all of the following:

  • Accepts jobs from the outside world. Jobs are users' requests for computer resources.

  • Puts jobs in a holding area until the jobs can be run.

  • Sends jobs from the holding area to an execution device.

  • Manages running jobs.

  • Logs the record of job execution when the jobs are finished.
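The five steps above can be pictured as a small state machine. The following Python sketch is only an illustration under stated assumptions: the names `Job`, `ToyGrid`, `accept`, `dispatch`, and `finish` are hypothetical and are not part of the grid engine software, and the first-come, first-served order used here stands in for the policy-driven scheduling that the real system performs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    """A user's request for computer resources (hypothetical model)."""
    name: str
    state: str = "submitted"


@dataclass
class ToyGrid:
    """Illustrative stand-in for the five steps; not the product's design."""
    holding_area: deque = field(default_factory=deque)
    running: list = field(default_factory=list)
    accounting_log: list = field(default_factory=list)

    def accept(self, job: Job) -> None:
        # Accept the job from the outside world and hold it until it can run.
        job.state = "pending"
        self.holding_area.append(job)

    def dispatch(self) -> Optional[Job]:
        # Send the oldest held job to an execution device (assumed FIFO order).
        if not self.holding_area:
            return None
        job = self.holding_area.popleft()
        job.state = "running"
        self.running.append(job)
        return job

    def finish(self, job: Job) -> None:
        # Stop managing the running job and log the record of its execution.
        self.running.remove(job)
        job.state = "finished"
        self.accounting_log.append((job.name, job.state))


grid = ToyGrid()
grid.accept(Job("render-frame-1"))
grid.accept(Job("render-frame-2"))
first = grid.dispatch()
grid.finish(first)
print(grid.accounting_log)  # [('render-frame-1', 'finished')]
```

The second job stays in the holding area until `dispatch` is called again, which mirrors the idea that jobs wait until they can be run.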

Matching Resources to Requests

As an analogy, imagine a large "money-center" bank in one of the world's capital cities. In the bank's lobby are dozens of customers waiting to be served. Each customer has different requirements. One customer wants to withdraw a small amount of money from his account. Arriving just after him is another customer, who has an appointment with one of the bank's investment specialists. She wants advice before she undertakes a complicated venture. Another customer in front of the first two customers wants to apply for a large loan, as do the eight customers in front of her.

Different customers with different needs require different types of service and different levels of service from the bank. Perhaps the bank on this particular day has many employees who can handle the one customer's simple withdrawal of money from his account. But at the same time the bank has only one or two loan officers available to help the many loan applicants. On another day, the situation might be reversed.

The effect is that customers must wait for service unnecessarily. Many of the customers could receive immediate service if only their needs were immediately recognized and then matched to available resources.

If the grid engine system were the bank manager, the service would be organized differently.

  • On entering the bank lobby, customers would be asked to declare their name, their affiliations, and their service needs.

  • Each customer's time of arrival would be recorded.

  • Based on the information that the customers provided in the lobby, the bank would serve the following customers:

    • Customers whose needs match suitable and immediately available resources

    • Customers whose requirements have the highest priority

    • Customers who were waiting in the lobby for the longest time

  • In a "grid engine system bank," one bank employee might be able to help several customers at the same time. The grid engine system would try to assign new customers to the least-loaded and most-suitable bank employee.

  • As bank manager, the grid engine system would allow the bank to define service policies. Typical service policies might be the following:

    • To provide preferential service to commercial customers because those customers generate more profit

    • To make sure a certain customer group is served well, because those customers have received bad service so far

    • To ensure that customers with an appointment get a timely response

    • To prefer a certain customer on direct demand of a bank executive

  • Such policies would be implemented, monitored, and adjusted automatically by a grid engine system manager. Customers with preferential access would be served sooner. Such customers would receive more attention from employees, whose assistance those customers must share with other customers. The grid engine system manager would recognize if the customers do not make progress. The manager would immediately respond by adjusting service levels in order to comply with the bank's service policies.
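The matching idea in the bank analogy can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler that the grid engine system uses. The names `Customer`, `Employee`, and `serve` are hypothetical, and the way the three criteria are combined (priority first, then waiting time, with only suitable and available employees considered) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    need: str        # declared service need, for example "loan"
    priority: int    # higher means more important (assumed scale)
    arrival: int     # recorded time of arrival; smaller means earlier


@dataclass
class Employee:
    name: str
    skills: set
    capacity: int    # customers the employee can help at the same time
    load: int = 0


def serve(customers, employees):
    """Return (customer, employee) name pairs for everyone served now."""
    assignments = []
    # Highest priority first; among equals, the longest-waiting customer.
    for c in sorted(customers, key=lambda c: (-c.priority, c.arrival)):
        suitable = [e for e in employees
                    if c.need in e.skills and e.load < e.capacity]
        if not suitable:
            continue  # no matching resource is free, so the customer keeps waiting
        best = min(suitable, key=lambda e: e.load)  # least-loaded suitable employee
        best.load += 1
        assignments.append((c.name, best.name))
    return assignments


staff = [Employee("teller", {"withdrawal"}, capacity=2),
         Employee("loan officer", {"loan"}, capacity=1)]
lobby = [Customer("Ann", "loan", priority=1, arrival=0),
         Customer("Bo", "withdrawal", priority=1, arrival=1),
         Customer("Cy", "loan", priority=5, arrival=2)]
print(serve(lobby, staff))  # [('Cy', 'loan officer'), ('Bo', 'teller')]
```

In this run, Bo is served immediately even though Ann arrived first, because a teller is free while the only loan officer is busy with the higher-priority customer. That is the point of the analogy: a request is served as soon as its needs can be matched to an available resource.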
