- administration host
Hosts that have permission to carry out administrative activity for the grid engine system.
- access list
A list of users
and UNIX groups who are permitted or denied access to a resource such as a queue or
a host. Users and groups can belong to multiple access lists, and the same access
lists can be used in various contexts.
- array job
A job made up of a range of independent, identical tasks. Each task is similar
to a separate job. The tasks of an array job differ from one another only in their
unique task identifiers, which are integers.
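As an illustration of how one array job fans out into tasks that differ only in their identifiers, here is a minimal Python sketch. The function names `array_task_ids` and `run_task`, the first/last/step range form, and the idea of using the identifier to pick an input are assumptions made for this example. They are not part of the grid engine software.

```python
def array_task_ids(first, last, step=1):
    """Return the integer task identifiers for a range first..last with a step.

    Hypothetical helper: the range form is an assumption for illustration.
    """
    if first < 1 or last < first or step < 1:
        raise ValueError("expected 1 <= first <= last and step >= 1")
    return list(range(first, last + 1, step))


def run_task(task_id, inputs):
    """Every task runs the same code; only task_id selects what it works on."""
    return inputs[task_id - 1].upper()


if __name__ == "__main__":
    inputs = ["alpha", "beta", "gamma", "delta"]
    for task_id in array_task_ids(1, 4):
        print(task_id, run_task(task_id, inputs))
```

Because the tasks are independent, they could run in any order or concurrently without changing the result.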
- campus grid
A grid that enables multiple
projects or departments within an organization to share computing resources.
- cell
A separate cluster with
a separate configuration and a separate master machine. Cells can be used to loosely
couple separate administrative units.
- checkpointing
A procedure that saves the execution status of a job into a checkpoint, so that
the job can be aborted and resumed later without losing information or
completed work. The process is called migration if the
checkpoint is moved to another host before execution resumes.
- checkpointing environment
A grid engine system configuration
entity that defines events, interfaces, and actions that are associated with a certain
method of checkpointing.
- cluster
A collection of machines,
called hosts, on which grid engine system functions occur.
- cluster grid
The simplest form
of a grid, consisting of computer hosts working together to provide
a single point of access to users in a single project or department.
- cluster queue
A container for a class
of jobs that are allowed to run concurrently. A queue determines certain job attributes,
for example, whether it can be migrated. Throughout its lifetime, a running job is
associated with its queue. Association with a queue affects some of the things that
can happen to a job. For example, if a queue is suspended, all jobs associated with
that queue are also suspended.
- complex
A set of resource
attribute definitions that can be associated with a queue, a host, or the entire cluster.
- department
A list of users
and groups who are treated alike in the functional and override scheduling policies
of the grid engine system. Users and groups can belong to only one department.
- entitlement
The same as share. The amount of resources that a certain job, user, user group,
or project is planned to consume.
- execution host
A system that has permission to run grid engine system jobs. Execution hosts
host queue instances and run the execution daemon sge_execd.
- functional policy
A policy that
assigns specific levels of importance to jobs, users, user groups, and projects. For
instance, through the functional policy, a high-priority project and all its jobs
can receive a higher resource share than a low-priority project.
- global grid
A collection of campus grids
that cross organizational boundaries to create very large virtual systems.
- grid
A collection of computing
resources that perform tasks. Users treat the grid as a single computational
resource.
- group
A UNIX group.
- hard resource requirements
The resources
that must be allocated before a job can be started. Contrast with soft resource
requirements.
- host
A system on which grid engine system functions
occur.
- job
A request from a user
for computational resources from the grid.
- batch job
A batch job is a UNIX
shell script that can be run without user intervention and does not require access
to a terminal.
- interactive job
An interactive
job is a session started with the commands qrsh, qsh,
or qlogin, which open an xterm window for
user interaction or provide the equivalent of a remote login session.
- job class
A set of jobs that are equivalent in some
sense and treated similarly. A job class is defined by the identical requirements
of the corresponding jobs and by the characteristics of the queues that are suitable
for those jobs.
- manager
A user who can manipulate
all aspects of the grid engine software. The superusers of the master host and of any other
machine that is declared to be an administration host have manager privileges. Manager
privileges can be assigned to nonroot user accounts as well.
- migration
The process of
moving a checkpointing job from one host to another before execution of the job resumes.
- operator
Users who can run the same commands as managers but cannot change the
configuration. Operators are expected to maintain day-to-day operation.
- override policy
A policy commonly used to override the automated resource entitlement management
of the functional and share-based policies. The cluster administrator can modify
the automated policy implementation by assigning override tickets to jobs, users,
user groups, and projects.
- owner
Users who can suspend
or resume, and disable or enable, the queues they own. Typically, users are owners
of the queue instances that reside on their workstations.
- parallel environment
A grid engine system configuration
that defines the necessary interfaces for the grid engine software to correctly handle parallel
jobs.
- parallel job
A job that is made
up of more than one closely correlated task. Tasks can be distributed across multiple
hosts. Parallel jobs usually use communication tools such as shared memory or message
passing (MPI, PVM) to synchronize and correlate tasks.
- policy
A set of rules and
configurations that the administrator can use to define the behavior of the grid engine system.
Policies are implemented automatically by the system.
- priority
The relative level
of importance of a job compared to others.
- project
A grid engine system project.
- resource
A computational device consumed or occupied by running jobs. Typical examples
are memory, CPU, I/O bandwidth, file space, and software licenses.
- master host
The master host is central to the overall
cluster activity. It runs the master daemon sge_qmaster and the
scheduler daemon sge_schedd. By default, the master host is also
an administration host and a submit host.
- share
The same as entitlement. The amount of resources that a certain job, user,
user group, or project is planned to consume.
- share-based policy
A policy that allows the entitlements of users and projects, and of arbitrary
groups of them, to be defined hierarchically. An enterprise, for instance, can be
subdivided into divisions, departments, projects active in the departments, user
groups working on those projects, and users in those user groups. The share-based
hierarchy is called a share-tree. Once a share-tree is defined, the grid engine
software implements its entitlement distribution automatically.
- share-tree
The hierarchical
definition of a share-based policy.
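To make the hierarchical idea concrete, the following Python sketch splits one unit of entitlement down a small tree in proportion to each node's shares. It is a simplified model under stated assumptions: the nested-dictionary layout, the node names, and the function `leaf_entitlements` are hypothetical, and the sketch does not describe how the grid engine software itself computes entitlements.

```python
from fractions import Fraction


def leaf_entitlements(node, entitlement=Fraction(1), prefix=""):
    """Split an entitlement among the leaves of a share hierarchy.

    Each child receives the parent's entitlement multiplied by its own
    shares divided by the total shares of all its siblings.
    """
    path = prefix + "/" + node["name"]
    children = node.get("children", [])
    if not children:
        return {path: entitlement}
    total = sum(child["shares"] for child in children)
    result = {}
    for child in children:
        part = entitlement * Fraction(child["shares"], total)
        result.update(leaf_entitlements(child, part, path))
    return result


# Hypothetical hierarchy: an enterprise with two divisions, one with two users.
tree = {
    "name": "enterprise",
    "shares": 1,
    "children": [
        {
            "name": "division_a",
            "shares": 3,
            "children": [
                {"name": "alice", "shares": 1},
                {"name": "bob", "shares": 1},
            ],
        },
        {"name": "division_b", "shares": 1},
    ],
}

if __name__ == "__main__":
    for path, part in leaf_entitlements(tree).items():
        print(path, part)
```

Exact fractions are used so that the leaf entitlements always sum to the amount handed to the root.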
- soft resource requirements
Resources that
a job needs but that do not have to be allocated before a job can be started. Allocated
to a job on an as-available basis. Contrast with hard resource requirements.
- submit host
Submit hosts are used only to submit and control batch jobs. In particular, a user who is logged in to a submit host
can submit jobs using qsub, can monitor job status using qstat, and can use the grid engine system's OSF/1 Motif graphical user interface QMON.
- suspension
The process of
holding a running job but keeping it on the execution host (in contrast to checkpointing,
where the job is aborted). A suspended job still consumes some resources, such as
swap memory or file space.
- ticket
A generic unit for resource share definition. The more tickets that a job, user,
project, or other component has, the more important it is. If a job has twice as
many tickets as another job, for example, that job is entitled to twice the
resource consumption.
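The proportional relationship between tickets and entitlement can be sketched in a few lines of Python. The function name `entitlement_shares` and the job names are hypothetical, and the sketch assumes that entitlement is simply each holder's fraction of all tickets.

```python
from fractions import Fraction


def entitlement_shares(tickets):
    """Map each ticket holder to its fraction of the total ticket count."""
    total = sum(tickets.values())
    if total <= 0:
        raise ValueError("total number of tickets must be positive")
    return {name: Fraction(count, total) for name, count in tickets.items()}


if __name__ == "__main__":
    shares = entitlement_shares({"job_a": 200, "job_b": 100})
    # job_a holds twice as many tickets, so its share is twice that of job_b.
    print(shares["job_a"], shares["job_b"])
```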
- usage
Another term for "resources
consumed." Usage is determined by an administrator-configurable weighted sum
of CPU time consumed, memory occupied over time, and amount of I/O performed.
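The weighted sum described above can be written out as a short Python sketch. The function name `weighted_usage`, the parameter names, and the example weights are assumptions chosen for illustration. The actual weights are whatever the administrator configures.

```python
def weighted_usage(cpu, mem, io, w_cpu, w_mem, w_io):
    """Combine CPU time, memory occupied over time, and I/O into one figure.

    The three weights stand in for the administrator-configurable factors.
    """
    return w_cpu * cpu + w_mem * mem + w_io * io


if __name__ == "__main__":
    # Hypothetical measurements and weights.
    print(weighted_usage(cpu=100.0, mem=50.0, io=10.0,
                         w_cpu=0.5, w_mem=0.25, w_io=0.25))
```

With these example weights, CPU time contributes twice as much per unit as memory or I/O.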
- users
People who can submit
jobs to the grid and run them if they have a valid login ID on at least one submit
host and one execution host.
- userset
Either an access list or a department.