Chapter 7
Thresholds
A threshold is a limit (high or low) placed on a specific monitored metric. When the limit is exceeded for more than a specified number of consecutive sampling intervals (the threshold's tolerance), the threshold is crossed.
For example, you could set a threshold of 5% maximum CPU time for system processes on all nodes, and give the threshold a tolerance of 3. Then, if a node had more than 5% of its CPU time used for system processes for more than 3 consecutive sampling intervals, that threshold would be crossed.
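The tolerance rule in this example can be sketched as a short shell function. This is only an illustration of the counting logic described above, not Performance Manager's own code: the function name first_crossing is hypothetical, and the sketch assumes a high threshold and whole-number samples.

```shell
#!/bin/sh
# Hypothetical helper, not part of Performance Manager.
# Usage: first_crossing LIMIT TOLERANCE SAMPLE...
# Prints the 1-based position of the sample at which a high threshold is
# first crossed, or "none" if the run of consecutive samples above LIMIT
# never grows longer than TOLERANCE.
first_crossing() {
    limit=$1
    tolerance=$2
    shift 2
    run=0      # consecutive samples above the limit so far
    index=0    # 1-based position of the current sample
    for sample in "$@"; do
        index=$((index + 1))
        if [ "$sample" -gt "$limit" ]; then
            run=$((run + 1))
        else
            run=0          # any sample at or below the limit resets the run
        fi
        if [ "$run" -gt "$tolerance" ]; then
            echo "$index"
            return 0
        fi
    done
    echo "none"
}

# 5% limit with a tolerance of 3: the fourth consecutive sample above 5
# is the fifth sample overall, so this prints 5.
first_crossing 5 3 4 6 7 8 9 3
```

With a tolerance of 3, three consecutive samples above the limit are permitted, and the fourth one crosses the threshold.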
You can set thresholds to notify you when they are crossed. The Threshold Notifications dialog box is the default method of notification and provides you with detailed information.
Caution:
Executing resource-intensive commands when a threshold is crossed causes the system load to increase. The increased load can cause more frequent threshold crossings, and in some cases, the threshold crossings are due solely to command execution. This can result in an excessive and continually growing system load.
To avoid this situation, increase the tolerance for the expression being monitored. The command does not execute until the limit has been exceeded for more consecutive sampling intervals than the tolerance permits.
Some other examples of thresholds:
- A node's I/O Queue exceeds a dozen processes for more than 10 consecutive sampling intervals.
- A node's Disk Transfers exceed 25/second for more than 5 consecutive sampling intervals.
- A node's Total Bad IP Packets exceed zero in any sampling interval.
When a threshold is crossed, the following occurs:
- The event is logged (written to the Performance Manager log file: /var/opt/pm/log/pmgr_gui.log).
- A command (if specified) is run. Performance Manager has a number of built-in commands, but it is also extensible: you or your system administrator can create your own commands. A command can do anything from sending you mail about the problem to taking steps to fix it.
The session window displays threshold data along with monitoring data. The displays are managed in the same way, and the type is designated at the beginning of the title bar with a D for displays and a T for thresholds.
Threshold Notifications
The Threshold Notifications dialog box has a list view of threshold activity and a reporting window for information on selected thresholds. There are three action buttons:
- Back -- Returns you to the previous threshold.
- Next -- Moves to the next threshold.
- Display -- Switches to the display mode.
Setting Thresholds
Follow this procedure to set a threshold:
- Select a node, cluster, or group in the main window's node area.
- Click on the Threshold button in the work area.
- Select a metric category.
- Select the specific metrics for monitoring from the list.
- Set the value of the threshold.
- Set the rearm point. The rearm point occurs when the metric drops a specified amount below the threshold. If it recrosses the threshold after rearming, another alert will be sent.
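The interaction between the threshold value and the rearm point can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not Performance Manager's implementation: the function name threshold_events is hypothetical, the threshold is a high limit with whole-number samples, tolerance is ignored for brevity, and rearming is assumed to happen when a sample is at or below the rearm value.

```shell
#!/bin/sh
# Hypothetical helper, not part of Performance Manager.
# Usage: threshold_events LIMIT REARM SAMPLE...
# Prints "<position> crossed" when an armed threshold is exceeded and
# "<position> rearmed" when the metric later falls to REARM or below.
threshold_events() {
    limit=$1
    rearm=$2
    shift 2
    armed=1    # 1 while the threshold can still raise an alert
    index=0    # 1-based position of the current sample
    for sample in "$@"; do
        index=$((index + 1))
        if [ "$armed" -eq 1 ] && [ "$sample" -gt "$limit" ]; then
            echo "$index crossed"
            armed=0        # no further alerts until the threshold rearms
        elif [ "$armed" -eq 0 ] && [ "$sample" -le "$rearm" ]; then
            echo "$index rearmed"
            armed=1
        fi
    done
}

# Limit of 25 with a rearm value of 20 (5 below the limit).
# Prints: "2 crossed", "5 rearmed", "6 crossed" on separate lines.
threshold_events 25 20 22 27 30 23 19 26
```

In this run the third and fourth samples (30 and 23) produce no new alert, because the metric has not yet dropped to the rearm value.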
Selecting the More button for a specific metric opens another dialog box for advanced settings (notification methods and additional information).
These are the metric categories displayed by default in the threshold work area:
CPU Thresholds
You can set thresholds for the following CPU metrics:
- Average Job Loads over Last 5 Seconds
- Average Job Loads over Last 30 Seconds
- Average Job Loads over Last 60 Seconds
- Percentage of CPU Time in User State
- Percentage of CPU Time in System State
- Percentage of CPU Time in Idle State
System Thresholds
You can set thresholds for the following system metrics:
- Rate of Device Interrupts
Processes Thresholds
You can set thresholds for the following processes metrics:
- Percentage of CPU Use by Top Processes
- Percentage of CPU Use by Top Users
Buffer Cache Thresholds
You can set thresholds for the following buffer cache metrics:
- Percentage of Read Misses
Network Thresholds
You can set thresholds for the following network metrics:
- Percentage of Timeouts for Calls
- Rate of IP Datagrams Discarded
- Rate of Ethernet Collisions
- Percentage of Erroneous Outbound Packets
- Percentage of Erroneous Inbound Packets
File System Thresholds
You can set thresholds for the following file system metrics:
- Percentage of Available File Space
- Percentage of Free Inodes
Memory Thresholds
You can set thresholds for the following memory metrics:
- Percentage of Free Paging Memory
- Rate of Processes Swapped Out
- Percentage of Free Swap Space
AdvFS Thresholds
You can set thresholds for the following AdvFS metrics:
- Percentage of Free Space in Fileset
- Percentage of Free Space in AdvFS Domains
- Percentage of Free Space in Domain Volume
- Percentage of Free Space in Domain
TruCluster Thresholds
You can set thresholds for the following TruCluster metrics:
Environmental Thresholds
You can set thresholds for the following environmental metrics:
Advanced Threshold (More...) Dialog Box
The Advanced Threshold (More...) dialog box has two sections. Use them for these tasks:
Threshold Notification Methods
- Choose one or more notification methods by selecting their checkboxes:
  - Threshold Notification Dialog Box (default selection). This displays a dialog box on your screen when a threshold is crossed.
  - Send Email to: Type an address in this field.
  - Execute: Command. Set the Execute toggle, choose Command to open a pull-down list of command categories, then choose a command from the submenu to open a command execution dialog box.
- Use the Notification Message text entry field to create your own notification message.
Additional Threshold Information
- Set the tolerance for this threshold. This is the number of consecutive threshold crossings permitted before a violation is reported.
- Set the interval for this threshold. This is the sampling rate, or the time between samples.
Click on OK to save the settings and return to the main window, click on Reset to return the settings to their defaults, or click on Cancel to close the dialog box without saving the settings.
Threshold Environment Variables
Performance Manager sets these environment variables internally so that commands you create can retrieve threshold information. For example, the ./var/opt/pm/Smscripts/pm_mailer script uses this information to send detailed mail about the crossed threshold. Your own shell script can access these values by placing the $ symbol in front of the variable name, for example, $PMTHRESH_DESCRIPTION. These variables are helpful for creating your own logging script that tracks thresholds and rearms of Performance Manager's metrics.
Environment variable | Description
PMTHRESH_DESCRIPTION | Description of the expression in the database.
PMTHRESH_CURRENT_VALUE | Value that triggered the threshold.
PMTHRESH_THRESHOLD_VALUE | Value that had to be passed to trigger the threshold.
PMTHRESH_NODE | Node on which the triggered threshold was detected.
PMTHRESH_USER_MESSAGE | User message from the Advanced Threshold dialog box.
PMTHRESH_UPDATE_TIME | The update time value from the triggered expression.
PMTHRESH_REARM_VALUE | The value at which the threshold will be rearmed.
PMTHRESH_TOLERANCE_VALUE | The tolerance of the triggers.
PMTHRESH_STATE | A string, either crossed or rearmed, corresponding to the triggered event.
PMTHRESH_INSTANCE | Additional information about the triggered threshold, such as which file system or CPU crossed.
PMTHRESH_OPERATOR | Greater than or less than the threshold value.
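A logging script of the kind described above might look like the following sketch. The PMTHRESH_* names come from the table. Everything else is an assumption made for illustration: the function name format_threshold_event, the PM_EVENT_LOG variable, its default path, and the layout of the log line. The exact format of each variable's value is not documented here, so the script treats all of them as opaque strings.

```shell
#!/bin/sh
# Hypothetical logging command for use with the Execute notification method.
# PM_EVENT_LOG and its default path are invented for this sketch.

# Print one line describing the threshold event found in the environment.
# Variables that are unset are replaced by "unknown" or an empty string.
format_threshold_event() {
    printf 'node=%s state=%s metric="%s" instance="%s" value=%s operator="%s" threshold=%s rearm=%s tolerance=%s\n' \
        "${PMTHRESH_NODE:-unknown}" \
        "${PMTHRESH_STATE:-unknown}" \
        "${PMTHRESH_DESCRIPTION:-}" \
        "${PMTHRESH_INSTANCE:-}" \
        "${PMTHRESH_CURRENT_VALUE:-}" \
        "${PMTHRESH_OPERATOR:-}" \
        "${PMTHRESH_THRESHOLD_VALUE:-}" \
        "${PMTHRESH_REARM_VALUE:-}" \
        "${PMTHRESH_TOLERANCE_VALUE:-}"
}

# Append a timestamped line only when a threshold state was passed in,
# so running the script by hand without PMTHRESH_STATE writes nothing.
if [ -n "${PMTHRESH_STATE:-}" ]; then
    log_file=${PM_EVENT_LOG:-/tmp/pm_threshold_events.log}
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$(format_threshold_event)" >> "$log_file"
fi
```

Because PMTHRESH_STATE distinguishes crossed from rearmed events, a single script like this can record both kinds of event in one log.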