Sun Microsystems

Queue Sorting

The following means are provided to determine the order in which the grid engine system attempts to fill up queues:

  • Load reporting. Administrators can select which load parameters are used to compare the load status of hosts and their queue instances. The wide range of standard load parameters that are available, and an interface for extending this set with site-specific load sensors, are described in Load Parameters.

  • Load scaling. Load reports from different hosts can be normalized to reflect a comparable situation. See Configuring Execution Hosts With QMON.

  • Load adjustment. The grid engine software can be configured to automatically correct the last reported load as jobs are dispatched to hosts. The corrected load represents the expected increase in the load situation caused by recently started jobs. This artificial increase of load can be automatically reduced as the load impact of these jobs takes effect.

  • Sequence number. Queues can be sorted following a strict sequence.
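
The following Python sketch illustrates how load scaling, load adjustment, and sequence numbers could interact when queue instances are ordered. The class, the field names, the scaling factors, and the per-job adjustment of 0.5 are hypothetical and chosen only for illustration. The tie-breaking rules are also an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass
class QueueInstance:
    name: str
    seq_no: int            # administrator-assigned sequence number
    reported_load: float   # last load value reported by the host
    scale: float = 1.0     # hypothetical load scaling factor for the host
    recent_jobs: int = 0   # jobs dispatched since the last load report

def effective_load(q, adjustment_per_job=0.5):
    """Scaled load plus an artificial increase for recently started jobs."""
    return q.reported_load * q.scale + q.recent_jobs * adjustment_per_job

def sort_by_load(queues):
    # Lowest effective load first; the sequence number breaks ties (assumption).
    return sorted(queues, key=lambda q: (effective_load(q), q.seq_no))

def sort_by_seq_no(queues):
    # Strict sequence first; the effective load breaks ties (assumption).
    return sorted(queues, key=lambda q: (q.seq_no, effective_load(q)))

queues = [
    QueueInstance("all.q@hostA", seq_no=10, reported_load=0.5, recent_jobs=2),
    QueueInstance("all.q@hostB", seq_no=20, reported_load=1.0),
    QueueInstance("all.q@hostC", seq_no=30, reported_load=4.0, scale=0.25),
]

print([q.name for q in sort_by_load(queues)])
print([q.name for q in sort_by_seq_no(queues)])
```

In this sketch, hostA reports the lowest raw load but is ranked last by load, because the two recently dispatched jobs raise its effective load above that of the other hosts.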

Job Sorting

Before the grid engine system starts to dispatch jobs, the jobs are brought into priority order, highest priority first. The system then attempts to find suitable resources for the jobs in priority sequence.

Without any administrator influence, the order is first-in-first-out (FIFO). The administrator has the following means to control the job order:

  • Ticket-based job priority. Jobs are always treated according to their relative importance as defined by the number of tickets that the jobs have. Pending jobs are sorted in ticket order. Any change that the administrator applies to the ticket policy also changes the sorting order.

  • Urgency-based job priority. Jobs can have an urgency value that determines their relative importance. Pending jobs are sorted according to their urgency value. Any change applied to the urgency policy also changes the sorting order.

  • POSIX priority. You can use the -p option to the qsub command to implement site-specific priority policies. The -p option accepts a priority value in the range from -1023 to 1024. The higher the number, the higher the priority. The default priority for jobs is zero.

  • Maximum number of user or user group jobs. You can restrict the maximum number of jobs that a user or a UNIX user group can run concurrently. This restriction influences the sorting order of the pending job list, because the jobs of users who have not exceeded their limit are given preference.

For each priority type, a weighting factor can be specified. This weighting factor determines the degree to which each type of priority affects overall job priority. To make it easier to control the range of values for each priority type, normalized values are used instead of the raw ticket values, urgency values, and POSIX priority values.

The following formula expresses how a job's overall priority value is determined:

job_priority = weight_urgency * normalized_urgency_value + 
weight_ticket * normalized_ticket_value + 
weight_POSIX_priority * normalized_POSIX_priority_value
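
The formula can be sketched in Python as follows. The function names are hypothetical. The linear mapping onto the range 0.0 to 1.0 is an assumption about how normalization might work, and the default weights are the scheduler configuration values quoted at the end of this section.

```python
def normalize(value, lowest, highest):
    """Map a raw value linearly onto the range 0.0 to 1.0 (assumed scheme)."""
    if highest == lowest:
        return 0.5  # assumption: no spread between jobs, use a neutral midpoint
    return (value - lowest) / (highest - lowest)

def job_priority(urgency, tickets, posix_priority, *,
                 urgency_range, ticket_range,
                 weight_urgency=0.1, weight_ticket=0.01,
                 weight_posix_priority=1.0):
    """Weighted sum of the three normalized priority types."""
    n_urgency = normalize(urgency, *urgency_range)
    n_ticket = normalize(tickets, *ticket_range)
    n_posix = normalize(posix_priority, -1023, 1024)  # range of qsub -p
    return (weight_urgency * n_urgency
            + weight_ticket * n_ticket
            + weight_posix_priority * n_posix)

# Two hypothetical jobs that differ only in their POSIX priority.
ranges = dict(urgency_range=(0, 5000), ticket_range=(0, 1000))
default_job = job_priority(1000, 200, 0, **ranges)
boosted_job = job_priority(1000, 200, 100, **ranges)
print(boosted_job > default_job)
```

With these weights, the POSIX priority dominates the result, the urgency contributes one tenth as much, and the tickets contribute one hundredth as much.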

You can use the qstat command to monitor job priorities:

  • Use qstat -pri to monitor job priorities overall, including POSIX priority.

  • Use qstat -ext to monitor job priorities based on the ticket policy.

  • Use qstat -urg to monitor job priorities based on the urgency policy.

  • Use qstat -pri to diagnose job priority issues when the urgency policy, ticket-based policies, and -p <priority> are used concurrently.

  • Use qstat -explain to diagnose various queue instance-based error conditions.

About the Urgency Policy

The urgency policy defines an urgency value for each job. The urgency value is derived from the sum of three contributions:

  • Resource requirement contribution

  • Waiting time contribution

  • Deadline contribution

The resource requirement contribution is derived from the sum of all hard resource requests, one addend for each request.

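If the resource request is of the type numeric, the resource request addend is the product of the following three elements:

  • The resource's urgency value as defined in the complex

  • The assumed slot allocation of the job

  • The per-slot request as specified by the -l option of the qsub command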

If the resource request is of the type string, the resource request addend is the resource's urgency value as defined in the complex.

The waiting time contribution is the product of the job's waiting time, in seconds, and the waiting-weight value specified in the Policy Configuration dialog box.

The deadline contribution is zero for jobs without a deadline. For jobs with a deadline, the deadline contribution is the weight-deadline value, which is defined in the Policy Configuration dialog box, divided by the free time, in seconds, until the deadline initiation time.
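
The three contributions can be combined as in the following Python sketch. All function names and sample numbers are hypothetical. The sketch assumes that a numeric addend is the product of the resource's urgency value, the assumed slot allocation, and the per-slot request, and it clamps the remaining time to one second to avoid a division by zero, which is an assumption of this sketch and not documented behavior.

```python
def resource_addend(urgency_value, slots=1, per_slot_request=1.0, numeric=True):
    """One addend of the resource requirement contribution."""
    if numeric:
        return urgency_value * slots * per_slot_request
    return urgency_value  # string resources contribute their urgency value as is

def waiting_time_contribution(waiting_seconds, waiting_weight):
    return waiting_seconds * waiting_weight

def deadline_contribution(weight_deadline, seconds_until_deadline=None):
    if seconds_until_deadline is None:
        return 0.0  # job has no deadline
    # Assumption: clamp to 1 second so that a reached deadline cannot divide by zero.
    return weight_deadline / max(seconds_until_deadline, 1)

def urgency(resource_addends, waiting, deadline):
    return sum(resource_addends) + waiting + deadline

# Hypothetical job: 4 slots, 1 license per slot (urgency 1000), plus one
# string resource (urgency 50); waiting 600 s; deadline in 7200 s.
addends = [
    resource_addend(1000, slots=4, per_slot_request=1.0),
    resource_addend(50, numeric=False),
]
value = urgency(
    addends,
    waiting_time_contribution(600, waiting_weight=0.5),
    deadline_contribution(3600000, seconds_until_deadline=7200),
)
print(value)  # 4000 + 50 + 300 + 500 = 4850.0
```

Because the deadline contribution divides by the remaining time, it grows as the deadline approaches, while the waiting time contribution grows linearly for as long as the job is pending.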

For information about configuring the urgency policy, see Configuring the Urgency Policy.

Resource Reservation and Backfilling

Resource reservation enables you to reserve system resources for specified pending jobs. When you reserve resources for a job, those resources are blocked from being used by jobs with lower priority.

Jobs can reserve resources depending on criteria such as resource requirements, job priority, waiting time, resource sharing entitlements, and so forth. The scheduler enforces reservations in such a way that jobs with the highest priority get the earliest possible resource assignment. This avoids such well-known problems as "job starvation".

You can use resource reservation to guarantee that resources are dedicated to jobs in job-priority order.

Consider the following example. Job A is a large pending job, possibly parallel, that requires a large amount of a particular resource. A stream of smaller jobs B(i) require a smaller amount of the same resource. Without resource reservation, a resource assignment for job A cannot be guaranteed, assuming that the stream of B(i) jobs does not stop. The resource cannot be guaranteed even though job A has a higher priority than the B(i) jobs.

With resource reservation, job A gets a reservation that blocks the lower priority jobs B(i). Resources are guaranteed to be available for job A as soon as possible.

Backfilling enables a lower-priority job to use resources that are blocked due to a resource reservation. Backfilling works only if there is a runnable job whose prospective run time is short enough to allow the blocked resource to be used without interfering with the original reservation.

In the example described earlier, a job C, of very short duration, could use backfilling to start before job A.
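
The backfilling condition can be expressed as a simple check, shown here as a minimal Python sketch. The function and its parameters are hypothetical, and the model considers only a single resource and a single reservation.

```python
def can_backfill(now, job_runtime, job_request, free_amount, reservation_start):
    """Return True if the job fits into the gap before the reservation begins."""
    fits_resource = job_request <= free_amount
    finishes_in_time = now + job_runtime <= reservation_start
    return fits_resource and finishes_in_time

# Hypothetical situation: the reservation for job A begins in 30 seconds,
# and one unit of the resource is currently free.
print(can_backfill(now=0, job_runtime=20, job_request=1,
                   free_amount=1, reservation_start=30))  # True: job C fits
print(can_backfill(now=0, job_runtime=31, job_request=1,
                   free_amount=1, reservation_start=30))  # False: overlaps
```

The check depends on the prospective run time of the candidate job, so a job whose run time is unknown or too long cannot be backfilled in this model.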

Because resource reservation causes the scheduler to look ahead, using resource reservation affects system performance. In a small cluster, the effect on performance is negligible when there are only a few pending jobs. In larger clusters, however, and in clusters with many pending jobs, the effect on performance might be significant.

To offset this potential performance degradation, you can limit the overall number of resource reservations that can be made during a scheduling interval. You can limit resource reservation in two ways:

  • To limit the absolute number of reservations that can be made during a scheduling interval, set the Maximum Reservation parameter on the Scheduler Configuration dialog box. For example, if you set Maximum Reservation to 20, no more than 20 reservations can be made within an interval.

  • To limit reservation scheduling to only those jobs that are important, use the -R y option of the qsub command. In the example described earlier, there is no need to schedule B(i) job reservations just for the sake of guaranteeing the resource reservation for job A. Job A is the only job that you need to submit with the -R y option.

You can configure the scheduler to monitor how it is influenced by resource reservation. When you monitor the scheduler, information about each scheduling run is recorded in the file sge-root/cell/common/schedule.

The following example shows what schedule monitoring does. Assume that the following sequence of jobs is submitted to a cluster where the global license consumable resource is limited to 5 licenses:

qsub -N L4_RR -R y -l h_rt=30,license=4 -p 100 $SGE_ROOT/examples/jobs/sleeper.sh 20
qsub -N L5_RR -R y -l h_rt=30,license=5        $SGE_ROOT/examples/jobs/sleeper.sh 20
qsub -N L1_RR -R y -l h_rt=31,license=1        $SGE_ROOT/examples/jobs/sleeper.sh 20

Assume that the default priority settings in the scheduler configuration are being used:

weight_priority          1.000000
weight_urgency           0.100000
weight_ticket            0.010000
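
To see how priority order and reservation could interact for these three jobs, the following Python sketch assigns each job, in priority order, the earliest start time at which enough licenses are free for its whole requested run time (h_rt). This is a simplified model, not the scheduler's actual algorithm. It assumes that all three jobs are pending at time 0, that the POSIX priority decides the order because of its weight, and that jobs of equal priority are handled in submission order.

```python
TOTAL_LICENSES = 5

# (name, POSIX priority, h_rt in seconds, licenses), in submission order
jobs = [
    ("L4_RR", 100, 30, 4),
    ("L5_RR", 0, 30, 5),
    ("L1_RR", 0, 31, 1),
]

def licenses_in_use(schedule, t):
    """Licenses held at time t by jobs already placed in the schedule."""
    return sum(lic for start, end, lic in schedule.values() if start <= t < end)

def fits(schedule, start, runtime, licenses):
    """Check that the request fits for the whole interval [start, start + runtime)."""
    # Usage only rises at job start times, so checking those points suffices.
    points = [start] + [s for s, _, _ in schedule.values()
                        if start < s < start + runtime]
    return all(licenses_in_use(schedule, t) + licenses <= TOTAL_LICENSES
               for t in points)

def plan(jobs):
    schedule = {}
    # Stable sort: highest POSIX priority first, submission order for ties.
    for name, _, runtime, licenses in sorted(jobs, key=lambda j: -j[1]):
        # Candidate start times: now, or whenever an already placed job ends.
        candidates = sorted({0} | {end for _, end, _ in schedule.values()})
        start = next(t for t in candidates if fits(schedule, t, runtime, licenses))
        schedule[name] = (start, start + runtime, licenses)
    return schedule

for name, (start, end, lic) in plan(jobs).items():
    print(f"{name}: start={start}s end={end}s licenses={lic}")
```

In this model, L4_RR starts immediately, L5_RR receives a reservation that begins after 30 seconds, and L1_RR has to wait until 60 seconds: one license is free at time 0, but the requested run time of 31 seconds would overlap the reservation for L5_RR by one second, so L1_RR cannot be backfilled.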
