Sun Microsystems

Ask the grid engine system administrator which of the available parallel environment interfaces are best suited to your types of parallel jobs.

You can specify resource requirements along with your parallel environment request. Specifying resource requirements further restricts the set of eligible queues for the parallel environment interface to those queues that fit the requirements. See "Defining Resource Requirements" in N1 Grid Engine 6 User's Guide.

For example, assume that you run the following command:

% qsub -pe mpi 1,2,4,8 -l nastran,arch=osf nastran.par

The queues that are suitable for this job are the queues that the parallel environment configuration associates with the parallel environment interface mpi. Suitable queues must also satisfy the resource requirements specified by the qsub -l option.


Note - The parallel environment interface facility is highly configurable. In particular, the administrator can configure the parallel environment startup and stop procedures to support site-specific needs. See the sge_pe(5) man page for details. Use the -v and -V options of qsub to pass information from the user who submits the job to the startup and stop procedures. Both options export environment variables: -v exports the variables that you name, and -V exports the entire current environment. If you are unsure, ask the administrator whether you are required to export certain environment variables.
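The following sketch shows how a variable might be exported with the -v option. It only prints the qsub command line instead of submitting a job. The parallel environment name pvm, the script name, and the fallback path /opt/pvm3 are assumptions made for illustration, not values from this guide.

```shell
#!/bin/sh
# Print the qsub command line that would export PVM_ROOT to the
# parallel environment startup and stop procedures.
# "pvm" as a parallel environment name and /opt/pvm3 are hypothetical.
submit_cmd() {
    slots=$1     # slot count or range, for example 4 or 2-8
    script=$2    # job script to submit
    echo "qsub -pe pvm $slots -v PVM_ROOT=${PVM_ROOT:-/opt/pvm3} $script"
}

# Dry run: show the command instead of submitting it.
submit_cmd 4 job.sh
```

To export the whole environment instead of a single variable, the -v PVM_ROOT=... part would be replaced by -V.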


Configuring Parallel Environments From the Command Line

Type the qconf command with appropriate options:

qconf options

The following options are available:

  • qconf -ap pe-name

    The -ap option (add parallel environment) displays an editor containing a parallel environment configuration template. The editor is either the default vi editor or an editor defined by the EDITOR environment variable. pe-name specifies the name of the parallel environment. The name is already provided in the corresponding field of the template. Configure the parallel environment by changing the template and saving to disk. See the sge_pe(5) man page for a detailed description of the template entries to change.

  • qconf -Ap filename

    The -Ap option (add parallel environment from file) parses the specified file filename and adds the new parallel environment configuration.

    The file must have the format of the parallel environment configuration template.

  • qconf -dp pe-name

    The -dp option (delete parallel environment) deletes the specified parallel environment.

  • qconf -mp pe-name

    The -mp option (modify parallel environment) displays an editor containing the specified parallel environment as a configuration template. The editor is either the default vi editor or an editor defined by the EDITOR environment variable. Modify the parallel environment by changing the template and saving to disk. See the sge_pe(5) man page for a detailed description of the template entries to change.

  • qconf -Mp filename

    The -Mp option (modify parallel environment from file) parses the specified file filename and modifies the existing parallel environment configuration.

    The file must have the format of the parallel environment configuration template.

  • qconf -sp pe-name

    The -sp option (show parallel environment) prints the configuration of the specified parallel environment to standard output.

  • qconf -spl

    The -spl option (show parallel environment list) lists the names of all currently configured parallel environments.
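As a rough sketch of the file-based options, the following script writes a configuration file and passes it to qconf -Ap. The field names are taken from the sge_pe(5) template. Every value, the name mype, and the temporary file location are assumptions chosen for illustration. The script calls qconf only if the command is installed.

```shell
#!/bin/sh
# Write a parallel environment configuration file for use with qconf -Ap.
# Field names follow sge_pe(5); every value here is an illustrative assumption.
write_pe_file() {
    pe_name=$1
    out=$2
    cat > "$out" <<EOF
pe_name            $pe_name
slots              8
user_lists         NONE
xuser_lists        NONE
start_proc_args    /bin/true
stop_proc_args     /bin/true
allocation_rule    \$pe_slots
control_slaves     FALSE
job_is_first_task  TRUE
urgency_slots      min
EOF
}

pe_file=${TMPDIR:-/tmp}/mype.$$
write_pe_file mype "$pe_file"

# Register the environment only where qconf is installed, then display it.
if command -v qconf >/dev/null 2>&1; then
    qconf -Ap "$pe_file" && qconf -sp mype
else
    echo "qconf not found; wrote $pe_file only"
fi
```

The same file, after editing, could be passed to qconf -Mp to modify the existing configuration.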

Parallel Environment Startup Procedure

The grid engine system starts the parallel environment by using the exec system call to invoke a startup procedure. The name of the startup executable and the parameters passed to this executable are configurable from within the grid engine system.

An example of such a startup procedure for the PVM environment is contained in the distribution tree of the grid engine system. The startup procedure is made up of a shell script and a C program that is invoked by the shell script. The shell script uses the C program to start up PVM cleanly. All other required operations are handled by the shell script.

The shell script is located under sge-root/pvm/startpvm.sh. The C program file is located under sge-root/pvm/src/start_pvm.c.


Note - The startup procedure could have been a single C program. The use of a shell script enables easier customization of the sample startup procedure.


The example script startpvm.sh requires the following three arguments:

  • The path of a host file generated by grid engine software, containing the names of the hosts from which PVM is to be started

  • The host on which the startpvm.sh procedure is invoked

  • The path of the PVM root directory, usually contained in the PVM_ROOT environment variable

These parameters can be passed to the startup script as described in Configuring Parallel Environments With QMON. The parameters are among the parameters that the grid engine system provides to parallel environment startup and stop scripts at runtime. For example, the required host file is generated by the grid engine system. To pass the name of this file to the startup procedure, use the special parameter name $pe_hostfile in the parallel environment configuration. A description of all available parameters is provided in the sge_pe(5) man page.

The host file has the following format:

  • Each line of the file refers to a queue on which parallel processes are to run.

  • The first entry of each line specifies the host name of the queue.

  • The second entry specifies the number of parallel processes to run in this queue.

  • The third entry denotes the queue.

  • The fourth entry denotes a processor range to use in case of a multiprocessor machine.

This file format is generated by the grid engine system and is fixed. Parallel environments that need a different file format must translate the host file within the startup procedure. PVM is an example of such a parallel environment. See the startpvm.sh file.
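The kind of translation that a startup procedure might perform can be sketched as follows. The function reads a host file in the four-column format described above and prints one host name per parallel process. This flat target format is a hypothetical example and is not the format that PVM or startpvm.sh uses. The host names, the queue name, and the UNDEFINED placeholder in the sample file are also assumptions.

```shell
#!/bin/sh
# Expand a grid engine host file into a flat list with one host name per
# parallel process. The flat target format is a hypothetical example, not
# the format that PVM or startpvm.sh uses.
expand_hostfile() {
    # Columns: host name, process count, queue, processor range.
    awk 'NF >= 2 { for (i = 0; i < $2; i++) print $1 }' "$1"
}

# Sample host file; host and queue names are made up for illustration.
sample=${TMPDIR:-/tmp}/pe_hostfile.$$
cat > "$sample" <<'EOF'
node1 2 big.q UNDEFINED
node2 1 big.q UNDEFINED
EOF

expand_hostfile "$sample"
rm -f "$sample"
```

For the sample file, the function prints node1 twice and node2 once, each on its own line.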

When the grid engine system starts the parallel environment startup procedure, the startup procedure launches the parallel environment. The startup procedure should exit with a zero exit status. If the exit status of the startup procedure is not zero, grid engine software reports an error and does not start the parallel job.
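The exit status convention can be illustrated with a minimal skeleton. It takes the same three arguments as the startpvm.sh example, but it is not the shipped script: the function name start_pe and the checks are assumptions, and the actual launch step is left as a placeholder.

```shell
#!/bin/sh
# Skeleton of a startup procedure with the same three arguments as the
# startpvm.sh example. This is not the shipped script; the launch step is
# left as a placeholder.
start_pe() {
    [ $# -eq 3 ] || { echo "usage: start_pe hostfile host root-dir" >&2; return 2; }
    hostfile=$1
    host=$2
    root=$3

    # Any nonzero status makes the grid engine system refuse to start the job.
    [ -r "$hostfile" ] || { echo "cannot read host file: $hostfile" >&2; return 1; }
    [ -d "$root" ] || { echo "no such directory: $root" >&2; return 1; }

    # Placeholder: launch the parallel environment here and return its status.
    echo "starting parallel environment on $host"
    return 0
}

# In a real script the last line would be:  start_pe "$@"; exit $?
```

Returning early with a nonzero status on each failed check keeps a half-configured environment from being reported as started.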


Note - Test any startup procedure from the command line first, without using the grid engine system. Doing so helps you find errors that are hard to trace after the procedure is integrated into the grid engine system framework.
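A manual test might look like the following sketch. The helper runs a procedure with whatever arguments you supply and reports the exit status that the grid engine system would see. The helper name try_procedure is hypothetical.

```shell
#!/bin/sh
# Run a startup or stop procedure by hand and report its exit status.
# try_procedure is a hypothetical helper, not part of the grid engine system.
try_procedure() {
    script=$1
    shift
    rc=0
    "$script" "$@" || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "ok: $script exited with status 0"
    else
        echo "FAILED: $script exited with status $rc"
    fi
    return "$rc"
}
```

For the PVM example, you could write a small host file by hand and then run try_procedure sge-root/pvm/startpvm.sh with the path of that file, the local host name, and the PVM root directory as arguments.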


Termination of the Parallel Environment

When a parallel job finishes or is aborted, for example, by qdel, a procedure to halt the parallel environment is called. The definition and semantics of this procedure are similar to those of the startup procedure. Like the startup procedure, the stop procedure is defined in the parallel environment configuration. See, for example, Configuring Parallel Environments With QMON.

The purpose of the stop procedure is to shut down the parallel environment and to reap all associated processes.


Note - If the stop procedure fails to clean up parallel environment processes, those processes can be left running. The grid engine system might have no information about processes that run under parallel environment control, and therefore cannot clean up these processes itself. The grid engine software does clean up the processes directly associated with the job script that the system has launched.


The distribution tree of the grid engine system also contains an example of a stop procedure for the PVM parallel environment. This example resides under sge-root/pvm/stoppvm.sh. It takes the following two arguments:

  • The path to the host file generated by the grid engine system

  • The name of the host on which the stop procedure is started

Similar to the startup procedure, the stop procedure is expected to return a zero exit status on success and a nonzero exit status on failure.
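A stop procedure with this interface could be structured along the following lines. This is a sketch, not the shipped stoppvm.sh. The names stop_pe and cleanup_host are hypothetical, and cleanup_host only prints a message where a real procedure would shut down the processes on the given host.

```shell
#!/bin/sh
# Skeleton of a stop procedure with the same two arguments as the stoppvm.sh
# example. This is not the shipped script. cleanup_host is a hypothetical
# hook; replace its body with the real per-host shutdown command.
cleanup_host() {
    echo "cleaning up on $1"
}

stop_pe() {
    hostfile=$1
    host=$2
    [ -r "$hostfile" ] || { echo "cannot read host file: $hostfile" >&2; return 1; }

    echo "stopping parallel environment from $host"
    rc=0
    # The first column of each host file line is the host name.
    for h in $(awk '{ print $1 }' "$hostfile" | sort -u); do
        cleanup_host "$h" || rc=1
    done
    # Nonzero if the cleanup failed on any host.
    return "$rc"
}
```

Continuing after a failed host, instead of stopping at the first error, gives the procedure a chance to reap processes on the remaining hosts while still reporting the failure.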


Note - Test any stop procedure from the command line first, without using the grid engine system. Doing so helps you find errors that are hard to trace after the procedure is integrated into the grid engine system framework.

