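Chapter 6
Verifying the Installation
The verification phase includes the following tasks: verifying that the daemons are running on the master host, verifying that the daemons are running on the execution hosts, and running simple commands.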
Verifying the Installation
To ensure that the grid engine system daemons are running, look for the sge_qmaster and sge_schedd daemons on the master host, and then for the sge_execd daemon on the execution hosts. After you have verified that the daemons are running, you can run simple commands and prepare to submit jobs.
Note - If no cell name was specified during installation, the value of cell is default.
On BSD-based UNIX systems, type the following command.
% ps -ax | grep sge
On systems running a UNIX System 5-based operating system (such as the Solaris Operating System), type the following command.
% ps -ef | grep sge
Verify that the daemons are running by looking through the output for sge strings that are similar to the following examples.
Specifically, you should see that the sge_qmaster daemon and the sge_schedd daemon are running.
On a BSD-based UNIX system, you should see output such as the following example.
14676 p1 S < 4:47 /gridware/sge/bin/solaris/sge_qmaster
14678 p1 S < 9:22 /gridware/sge/bin/solaris/sge_schedd
On a UNIX System 5-based system, you should see output such as the following example.
root 439 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_qmaster
root 446 1 0 Jun 2 ? 3:37 /gridware/sge/bin/solaris/sge_schedd
If you do not see the appropriate strings, one or more daemons that are required on the master host are not running.
Look in the file sge-root/cell/common/act_qmaster to confirm that you are on the master host.
Restart the daemons by hand.
To start the master host daemons, sge_qmaster and sge_schedd:
# sge-root/cell/common/sgemaster start
Continue the verification process.
After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
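The master host check above can be scripted. The following is a minimal sketch, not part of the grid engine software: the function names missing_daemons and on_master_host are hypothetical, and the sketch assumes a grep that supports the -w (whole word) option. The first function reads ps output on standard input and prints each daemon that is missing. The second compares the host name recorded in the act_qmaster file with the local host name, on the assumption that comparing only the part before the first dot is good enough for your site.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with grid engine.

# missing_daemons NAME...: read `ps` output on stdin and print every NAME
# that does not appear in it as a whole word.
# Returns 0 if all names were found, 1 otherwise.
missing_daemons() {
    ps_output=$(cat)    # capture stdin once so it can be searched per daemon
    rc=0
    for daemon in "$@"; do
        # -w avoids matching the "grep sge" process itself or longer names
        if ! printf '%s\n' "$ps_output" | grep -w "$daemon" >/dev/null; then
            echo "$daemon"
            rc=1
        fi
    done
    return "$rc"
}

# on_master_host FILE [NAME]: succeed if the host recorded in FILE (the
# act_qmaster file) matches NAME (default: the output of `hostname`).
# Assumption: only the part before the first dot is compared, because one
# side may be fully qualified and the other not.
on_master_host() {
    recorded=$(head -n 1 "$1") || return 2
    current=${2:-$(hostname)}
    [ "${recorded%%.*}" = "${current%%.*}" ]
}
```

For example, on a UNIX System 5-based master host you might run the following, which prints nothing when both daemons are running. The SGE_ROOT and SGE_CELL variables are assumed to be set in your environment, for example by sourcing one of the settings files.

```shell
ps -ef | missing_daemons sge_qmaster sge_schedd
on_master_host "$SGE_ROOT/${SGE_CELL:-default}/common/act_qmaster" || echo "not the master host"
```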
Log in to the execution hosts on which you ran the execution host installation procedure.
Verify that the daemons are running by typing one of the following commands, depending on the operating system you are running.
On BSD-based UNIX systems, type the following command.
% ps -ax | grep sge
On systems running a UNIX System 5-based operating system (such as the Solaris Operating System), type the following command.
% ps -ef | grep sge
Verify the daemons are running by looking for the sge_execd string in the output.
Specifically, you should see that the sge_execd daemon is running.
On a BSD-based UNIX system, you should see output such as the following example.
14688 p1 S < 4:27 /gridware/sge/bin/solaris/sge_execd
On a UNIX System 5-based system, such as the Solaris Operating System, you should see output such as the following example.
root 171 1 0 Jun 22 ? 7:11 /gridware/sge/bin/solaris/sge_execd
If you do not see similar output, the daemon required on the execution host is not running. Restart the daemon by hand.
# sge-root/cell/common/sgeexecd start
Continue the verification process.
After you have verified that the master host and the execution host daemons are running, continue the verification process. See How to Run Simple Commands.
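If you have many execution hosts, logging in to each one by hand becomes tedious. The following sketch repeats the sge_execd check across a list of hosts. It is an illustration under stated assumptions, not a grid engine tool: it assumes that you can run commands remotely without a password prompt, that grep supports the -w option, and that the REMOTE_SH variable (a hypothetical knob introduced here, defaulting to ssh) names the remote command runner. The function names are also hypothetical.

```shell
#!/bin/sh
# REMOTE_SH is a hypothetical setting, not a grid engine variable: the
# command used to run something on another host.
: "${REMOTE_SH:=ssh}"

# execd_running HOST: succeed if sge_execd appears in the process list of HOST.
execd_running() {
    # Try the System 5 form of ps first, then fall back to the BSD form.
    $REMOTE_SH "$1" 'ps -ef 2>/dev/null || ps -ax' </dev/null |
        grep -w sge_execd >/dev/null
}

# check_execd_hosts HOST...: print one status line per host.
# Returns 1 if sge_execd is missing on at least one host, 0 otherwise.
check_execd_hosts() {
    rc=0
    for host in "$@"; do
        if execd_running "$host"; then
            echo "$host: sge_execd running"
        else
            echo "$host: sge_execd NOT running"
            rc=1
        fi
    done
    return "$rc"
}
```

A call such as check_execd_hosts host1 host2 (placeholder host names) reports each host in turn. On any host reported as not running, restart the daemon by hand as described above.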
If the necessary daemons are running on both the master host and the execution hosts, the grid engine software should be operational. Check by issuing a trial command.
Log in to either the master host or another administrative host.
Make sure that your standard search path includes sge-root/bin.
From the command line, type the following command.
% qconf -sconf
This qconf command displays the current global cluster configuration (see "Basic Cluster Configuration" in N1 Grid Engine 6 Administration Guide).
If this command fails, the most likely cause is that your SGE_ROOT environment variable is not set correctly. If SGE_ROOT is correct, check the port settings as follows.
Check whether the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set in the script files, sge-root/cell/common/settings.csh or sge-root/cell/common/settings.sh.
Note - If no cell name was specified during installation, the value of cell is default.
If so, make sure that the environment variables SGE_EXECD_PORT and SGE_QMASTER_PORT are set to the correct values before you try the command again.
If not, verify whether your NIS services map contains entries for sge_qmaster and sge_execd.
If the SGE_EXECD_PORT and SGE_QMASTER_PORT variables are not used in these files, then the services database (for example, /etc/services or the NIS services map) on the machine from which you run the command must provide entries for both sge_qmaster and sge_execd. If an entry is missing, add it to the machine's services database, giving it the same value as is configured on the master host.
Retry the qconf command.
Try to submit test jobs.
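The port checks in this procedure can be sketched as a small script. This is an illustration only: the function names service_port and effective_port are hypothetical, the sketch assumes an awk that accepts the -v option, and it reads a local services file rather than querying NIS. It reports the port that a command would use for a given service, taking the environment variable if it is set and otherwise falling back to the services file.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with grid engine.

# service_port NAME [FILE]: print the port registered for NAME in a
# services-style file (default /etc/services). Fails if there is no entry.
service_port() {
    awk -v name="$1" '
        { sub(/#.*/, "") }                      # strip comments
        $1 == name { split($2, a, "/"); print a[1]; found = 1; exit }
        END { exit !found }                     # nonzero status when missing
    ' "${2:-/etc/services}"
}

# effective_port VAR NAME [FILE]: print the value of environment variable
# VAR (SGE_QMASTER_PORT or SGE_EXECD_PORT) if it is set and non-empty;
# otherwise look up NAME in the services file.
effective_port() {
    eval "value=\${$1:-}"
    if [ -n "$value" ]; then
        echo "$value"
    else
        service_port "$2" "${3:-/etc/services}"
    fi
}
```

For example, run the following on the machine where qconf failed and compare the results with the values configured on the master host. If your site uses NIS and your system provides the getent command, getent services sge_qmaster is one way to query the services map instead of the local file.

```shell
effective_port SGE_QMASTER_PORT sge_qmaster || echo "no port found for sge_qmaster"
effective_port SGE_EXECD_PORT sge_execd || echo "no port found for sge_execd"
```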