Chapter 4

Monitoring and Controlling Jobs and Queues

After you submit jobs, you need to monitor and control them. This chapter provides background information about monitoring and controlling jobs and queues, as well as instructions for how to do these tasks. The chapter also includes information about job checkpointing.

This chapter includes instructions for the following tasks:

Monitoring and Controlling Jobs

You can monitor and control submitted jobs in three ways:
The following sections describe each of these methods.

Monitoring and Controlling Jobs With QMON

QMON provides the Job Control dialog box, which is specifically designed for controlling jobs. To monitor and control your submitted jobs, click the Job Control button in the QMON Main Control window. The Job Control dialog box appears.

The Job Control dialog box has three tabs: Running Jobs, Pending Jobs (jobs that are waiting to be dispatched to an appropriate resource), and recently Finished Jobs. The Submit button provides a link to the Submit Job dialog box.

The Job Control dialog box enables you to monitor all running, pending, and finished jobs that are known to the system. You can also use this dialog box to manage jobs: you can change a job's priority, and you can suspend, resume, and cancel jobs.

In its default format, the Job Control dialog box displays the following columns for each running and pending job:
You can change the default display by customizing the format. See Customizing the Job Control Display for details.

Refreshing the Job Control Display

To keep the displayed information up-to-date, QMON uses a polling scheme to retrieve the status of the jobs from sge_qmaster. Click Refresh to force an update of the Job Control display.

Selecting Jobs

You can select jobs with the following mouse and key combinations:
You can also use a filter to select the jobs that you want to display. See Filtering the Job List for details.

Managing Jobs

You can use the buttons at the right of the dialog box to manage selected jobs in the following ways:
Only the job owner or grid engine managers and operators can suspend and resume jobs, delete jobs, hold back jobs, modify job priority, and modify jobs. See Managers, Operators, and Owners.

Only running jobs can be suspended or resumed. Only pending jobs can be rescheduled, held back, or modified, whether in priority or in other attributes.

Suspending a job sends the signal SIGSTOP to the process group of the job with the UNIX kill command. SIGSTOP halts the job, which then no longer consumes CPU time. Resuming the job sends the signal SIGCONT, which lets the job continue running. See the kill(1) man page for your system for more information on signalling processes.

Note - You can force the suspension, resumption, and deletion of jobs. That is, you can register these actions with sge_qmaster without notifying the sge_execd that controls the jobs. Forcing is useful when the corresponding sge_execd is unreachable, for example, because of network problems. Select the Force check box for this purpose.
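
The SIGSTOP and SIGCONT mechanism described above can be tried outside the grid engine system. The following Python sketch is only an illustration under stated assumptions: it does not call any grid engine component, the function names (suspend_job, resume_job, demo_suspend_resume) are hypothetical, and a short-lived child process stands in for a running job. It assumes a UNIX system where the child is placed in its own process group.

```python
import os
import signal
import subprocess
import sys


def suspend_job(pgid: int) -> None:
    """Send SIGSTOP to every process in the group (hypothetical helper)."""
    os.killpg(pgid, signal.SIGSTOP)


def resume_job(pgid: int) -> None:
    """Send SIGCONT to every process in the group (hypothetical helper)."""
    os.killpg(pgid, signal.SIGCONT)


def demo_suspend_resume() -> tuple[bool, bool]:
    """Suspend and resume a stand-in job; report whether each step was observed."""
    # start_new_session=True gives the child its own session and process
    # group, so the signals below do not reach this script itself.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=True,
    )
    pgid = os.getpgid(child.pid)
    try:
        suspend_job(pgid)
        # WUNTRACED makes waitpid report a stopped (not only a terminated) child.
        _, status = os.waitpid(child.pid, os.WUNTRACED)
        stopped = os.WIFSTOPPED(status)

        resume_job(pgid)
        # WCONTINUED makes waitpid report a child resumed by SIGCONT.
        _, status = os.waitpid(child.pid, os.WCONTINUED)
        continued = os.WIFCONTINUED(status)
    finally:
        # Clean up the stand-in job regardless of the outcome.
        os.killpg(pgid, signal.SIGKILL)
        child.wait()
    return stopped, continued


if __name__ == "__main__":
    print(demo_suspend_resume())
```

Signalling the whole process group, rather than a single process ID, matters because a job script usually starts further processes; stopping only the parent would leave its children running and consuming CPU time.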