Chapter 8
Rules
This chapter covers the following topics:
Detection of alarm conditions and subsequent triggering of actions is the basic function of the Sun Management Center framework program. Alarm conditions are determined by:
Both mechanisms achieve the same purpose. To provide comprehensive alarm capabilities, Sun Management Center supports both mechanisms. However, this chapter focuses only on rule evaluation.
Rules can be considered an arbitrarily complex type of alarm check, which normally depends on other objects (that is, a rule is usually associated with a derived object). The agent rules are implemented through an extension of alarm checking of derived objects. Thus, rules can be treated as another alarm check mechanism.
Currently no industry standard exists for rule syntax; thus, rules must be introduced into the agent based on how they are written. However, there is a consistent convention for specifying which rule is to be fired in the agent configuration file (module configuration file format). Storage of any state or persistent data required by a rule is provided in the object that invokes the rule.
This section covers the rules agent infrastructure.
Rules and derived objects are inherently related. For the Sun Management Center framework, a rule is implemented in a derived object; consequently, there is a one-to-one relationship between a rule and a derived object.
Each rule must be named as:
r<n>
For example: r231 represents rule 231.
Module designers can create custom rules that use a wide variety of alarm criteria. These custom rules can examine the value of the node to which the rule is attached, or the value or status of a different node. A special category of rules, referred to as log rules, can be triggered to fire whenever a message matching a specified regular expression appears in a log file.
Rules usually use parameters that are stored in separate files where they can be customized on a per-machine basis by site administrators. Certain rule parameters can be declared editable by end users through the Sun Management Center console.
Because a rule is considered a complex alarm check, it is natural to extend the existing agent alarm checking mechanism to encompass rules. A qualifier, <alarmRules>, specifies a particular rule for a given node. This variable is normally assigned to a node in the module Model file.
alarmRules = r231
Whenever a rule is used, no other alarm check is allowed. If the agent detects both an alarmChecks qualifier and a valid rule in the <alarmRules> variable for a given node, only <alarmRules> is used in status determination; the alarmChecks are ignored.
When an agent encounters <alarmRules>, it invokes the rule directly using a ruleFire procedure, which is described later in this section.
Note - A log rule, described later in this chapter, can also be invoked by the file scan service through ruleFire, after the rule has been subscribed to that service (see "Rules Attributes" for a description of ruleFire).
If the node is a vector (that is, it can have more than one row), there is no change in the preceding discussion. However, internally, the rule specified in <alarmRules> is available to each row in the vector node and any data stored in slices is distinct for each rowName.
The program provides module-specific rules, general rules (base rules), and rules created by clients.
Tcl rules associated with a particular module are placed within the file:
<module><-subspec>-d.rul
Parameter definitions and message text definitions for such rules go in the associated files:
<module><-subspec>-ruleinit-d.x
<module><-subspec>-ruletext-d.x
Any custom module rule must be made available within the context of the node that requires it. Here is an example of what you enter into the model file to achieve this:
_rules = {
    [ use PROC ]
    [ source solaris-example-d.rul ]
}
system = {
    [ use MANAGED-OBJECT ]
    consoleUser = {
        [ use STRINGRULE MANAGED-PROPERTY _rules ]
        alarmRules = rUsrChk
    }
}
In this example, the rUsrChk rule is associated with the consoleUser node. This rule determines the state of the consoleUser node.
Rules that are more general in nature, and that can be used in many different modules, should be placed in the base-rules.rul file. To reduce overhead, only those rules that must be globally accessible should be made base rules.
Parameter definitions and message text definitions for such rules go in the associated base-ruleinit-d.x and base-ruletext-d.x files. Rules placed in these files are automatically available within the context of all nodes and do not have to be sourced explicitly.
An example of how to attach a rule that is defined in base-rules.rul is:
system = {
    [ use MANAGED-OBJECT ]
    consoleUser = {
        [ use STRINGRULE MANAGED-PROPERTY ]
        alarmRules = rGenRule
    }
}
Note - Base text messages are not loaded into a _rules node; these messages are added directly to the appropriate file.
In the future, clients may wish to create their own custom rules. Such rules should be placed in the file:
user-rules.rul
Parameter definitions and message text definitions for such rules go in the associated files:
user-ruleinit-d.x
user-ruletext-d.x
Placing client rules in these separate files allows the client rules to be saved readily across new code releases.
Note - The purpose of these files is to segregate possible client customizations from the base distributed code. The wider issues of code management, configuration, and distribution are outside the scope of this document.
As in the case of base-rules.rul, rules placed in user-rules.rul are automatically available within the context of all nodes and do not have to be explicitly sourced. They are globally accessible.
Because a rule is connected to a node through the <alarmRules> variable, it is obvious that a rule must be associated with a node. However, some special circumstances require attention:
Each of these cases is described in the following sections.
An example of this multiple-rule requirement can be taken from the ConfigReader rules. Rules rcr4u209, rcr4u212, and rcr4u300 all apply to memory SIMMs. In this case, if a module hierarchy has a node for a particular SIMM (for example, J3201) that has a leaf node for its status, these three rules cannot all be associated with the leaf node's <alarmRules> variable because only one rule is allowed.
The solution is to create three more leaf nodes as inferiors of J3201, with one rule per node. By doing so, hierarchical summarization of status up to the SIMM node is handled by the existing agent status propagation mechanism.
Alternatively, if possible, the three rules can be redesigned and collapsed into a single rule.
Several existing rules (for example, rule rknrd105) do not drive any alarms (that is, they do not affect the alarm status of any node). Instead, they simply generate events. Normally, a rule is attached to the node that it alarms. In this case, a node must be created specifically for hosting the rule.
It would be wise to collect all such orphan nodes together into a single rules-only module.
This section describes the Tcl and C/C++ compiled rules.
Note - Wherever method names are mentioned, the method should be implemented in two forms: as a Tcl/TOE method and as a C/C++ function.
Rules have four types of variables (also referred to as attributes or parameters in this description):
Except for the first case of temporary variables, all other variables must be stored in a TOE slice and accessed through the getRuleParm/setRuleParm methods (see "Rule Functions" for the Tcl/TOE implementation). Each node must have its own copy of whatever slices it requires whenever node-specific data is created:
Another slice, rulemsg, must be created once for each loaded module, under the node _rules. This slice must contain a key-value pair for every text message.
Note - The rulemsg key name does NOT require a <ruleId> prefix, and normally should NOT have a prefix, since messages can apply to multiple rules.
TABLE 8-2 Rule Message Key
rulemsg key | Usage
<msgId> | Contains a string defining the specified message (for example, ir209msg)
Sun Management Center has no restriction on what data or how much data is saved by a rule between invocations. The name and usage of such data is strictly rule-specific (that is, up to the rule designer), and is accessed through the getRuleParm/setRuleParm methods (see "Rule Functions" for the Tcl/TOE implementation).
Note - The internal data is always available for every rule. It is maintained transparently by the underlying rule implementation. This data must never be modified by a rule designer.
The rule designer has read-only access to certain internal data as shown in the following table.
To generate alarms, a rule defines a CONDITION (in Tcl rules this is the Tcl script in the condition case) that is evaluated every time the rule is executed. If the condition is true, this is an active event.
An event is an alarm generated by a rule. An inactive event is equivalent to saying that there is no alarm generated for a particular rule. Events can transition through various states; the underlying rule determines these transitions. TABLE 8-4 lists the allowed event states and transitions.
The ruleFire procedure determines these states. The preceding section describes the state variables that persist between rule invocations.
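The relationship between the condition result and the action passed to a rule can be pictured with a small Tcl sketch. This is an illustration under stated assumptions, not the agent's ruleFire implementation: nextAction is a hypothetical helper, and the mapping it encodes (open on a newly true condition, continue while it stays true, close when it becomes false) is inferred from the action names used in this chapter. TABLE 8-4 remains the reference for the allowed states and transitions.

```tcl
# Hypothetical sketch only: how a ruleFire-style driver might pick the
# action to pass to a rule. The real transitions are defined by the agent.
#   active    - 1 if an event is currently open for this rule, else 0
#   condition - result (0 or 1) returned by the rule's condition case
proc nextAction {active condition} {
    if {$condition && !$active} { return open }
    if {$condition && $active}  { return continue }
    if {!$condition && $active} { return close }
    # Inactive and the condition is still false: nothing to do.
    return ""
}

foreach {active condition} {0 1  1 1  1 0  0 0} {
    puts "active=$active condition=$condition -> '[nextAction $active $condition]'"
}
```

The ack and fix actions are deliberately absent from this sketch because, as described in this section, they are user-initiated and do not currently cause a state transition.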
There are two special operations that can be performed on an active (that is, open or continue state) event; these operations are ack and fix. Presently, neither operation causes an event state transition; thus, if a rule detects either operation in effect, it executes the corresponding Tcl script specified in the rule, and returns with an empty string ("").
The ack operation signifies a user action to acknowledge an event; the event remains active.
The fix operation signifies a user action to manually repair a hardware-related event; whether this affects the event state is to be determined.
Event Status | Meaning
ok | Inactive event
info[-<qualifier>] | Informational event
warning[-<qualifier>] | Warning event
error[-<qualifier>] | Error event
Note - The mandatory portion of the return string (ok, info, warning, and error) must be used by the console to determine the icon.
The optional <qualifier> allows additional descriptive text to be appended. The qualifier can be used to differentiate events (for example, error-temp, error-parity). The qualifier is a maximum of eight characters.
Examples of valid return strings:
info
warning-rx
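The return-string format described above can be checked mechanically. The following Tcl sketch is illustrative only: validStatus is a hypothetical helper, not part of the Sun Management Center API. It assumes that a qualifier is between one and eight non-whitespace characters and that ok never carries a qualifier; the text states only the eight-character maximum.

```tcl
# Hypothetical helper (not part of the agent API): returns 1 if str is a
# well-formed rule return string, 0 otherwise.
# Assumptions: a qualifier is 1 to 8 non-whitespace characters, and
# "ok" is never followed by a qualifier.
proc validStatus {str} {
    return [regexp {^(ok|(info|warning|error)(-\S{1,8})?)$} $str]
}

foreach s {info warning-rx error-parity error-overtemperature critical} {
    puts [format "%-24s %d" $s [validStatus $s]]
}
```

A check like this could be run against a rule's return values during development, before the rule is loaded into an agent. The empty string that ack and fix return is intentionally not accepted here, since it is not an event status.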
This section lists the methods that Tcl rules can call. The following table summarizes all methods that rules can call directly.
Rules are usually implemented in the form of Tcl procedures. If necessary, for performance reasons, rules can also be created as C or C++ code.
As an example, an agent configuration file in module configuration file format loads Tcl procedures from file pfrules-d.prc:
#
# Load pfrules procedures
#
_procedures = {
    [ use PROC ]
    [ source pfrules-d.prc ]
}
Any rules written in C or C++ must be loaded as packages.
As an example, an agent configuration file in module configuration file format loads package pkgrules.so:
#
# Load rules package
#
[ requires package rules ]
The rules that actually are loaded into the agent must return a string indicating the detected rule state.
The actual rule to be invoked for a derived object must be assigned through the specification of the following refresh variables:
refreshTrigger = <node>[:<event>]
refreshCommand = r<n>
refreshService = _internal
The effect of the refreshCommand is to invoke the specified rule, which has already been loaded either as a Tcl procedure or a C/C++ package (see "Rule Syntax & Loading").
Function prototypes, typedefs, and so forth required to interface with TOE must be available in the header file sdk/include/toeInt.h. The convention for describing each function is to consider the first argument as arg1, the second argument as arg2, and so forth. The following TOE functions are listed to provide an indication of how a rule can access agent object data:
This section provides a guide to writing a Tcl rule by describing the following:
The principles in this section apply to C/C++ compiled rules as well, since the basic structure of a rule is simply a procedure with a switch statement to allow the appropriate code to be invoked to handle each case. A natural way to proceed is to port the Tcl methods introduced in this section to C/C++; these methods constitute the interface between a C/C++ rule and the agent. The C/C++ rules would be made available to the agent by creating Tcl packages.
Methods referenced in the example (for example, getRuleParm) are described in "Rule Functions".
All Tcl rule files must have the extension .rul (for example, config-reader-d.rul). A portion of a rule file for the running example in this section would look like this:
CODE EXAMPLE 8-2 Tcl rules File Format
# File: config-reader-d.rul
#
# configReader rules
#
proc rcr4u209 {action ruleId {rowName ""} {rowIndex 0} \
              {matchList ""}} {
    body of procedure goes here ...
}
proc rcr4u212 {action ruleId {rowName ""} {rowIndex 0} \
              {matchList ""}} {
    body of procedure goes here ...
}
proc rcr4u300 {action ruleId {rowName ""} {rowIndex 0} \
              {matchList ""}} {
    body of procedure goes here ...
}
The template in this section shows the minimum elements that must exist in every Tcl rule. Examples of all required files, based on the running example of ConfigReader rule rcr4u209, follow.
The major steps required to create a rule are listed in the section below.
Guidelines
_rule = {
    rule:rXXX-editparm = alarmThresh deadband
}
Note - The listed editable parameters do not have the rule identifier as a prefix.
- For the console, the displayed text describing the editable parameter must be internationalized. To internationalize, a special static parameter, keypath, is used to specify the path to the module's Properties file. Normally, the properties file must be in the Sun Management Center software proto tree under classes:
/com/sun/symon/base/modules/<module><-subspec>.properties
- For example, to internationalize the text for the editable parameters, include the following in the <module><-subspec>-ruleinit-d.x file:
rule:rXXX-keypath = "base.modules.<module>"
- The module properties file must contain entries for the internationalized text for the module's editable parameters like these:
editAtt.rXXX.alarmThresh=<internationalized text for threshold>
editAtt.rXXX.deadband=<internationalized text for deadband>
- You can also define datatypes for any of the editable parameters. The data types would then be enforced whenever an end-user modifies the parameters. Set an editable parameter datatype by including a line like the following in the <module><-subspec>-ruleinit-d.x file:
ruledatatype:rXXX-alarmThresh = "float"
- If no datatype is specified, the default is string. TABLE 8-8 lists the allowed datatypes:
TABLE 8-8 Datatypes
Allowed Datatype | Description
int | integer
uint | unsigned integer
float | floating point number
string | character string (this is the default)
Note - The rule datatype definitions are the same as the datatype definitions for the underlying OS.
editAtt.rXXX.desc=<internationalized text for rule description>
editAtt.rXXX.paramsdesc=<internationalized text for rule parameters description>
Note - The code for the condition case is used to determine if the underlying event is active; therefore, the last statement in the condition case shall evaluate to 0 or 1; this is returned to the caller (ruleFire).
The following code example includes a template.
CODE EXAMPLE 8-3 Template
# Rule: <ruleId>
# Purpose:
# Arguments:
#   action    - one of
#               {condition|init|open|continue|ack|fix|close}
#   ruleId    - rule identifier
#   rowName   - name of vector row (only valid for vector nodes); default=""
#   rowIndex  - index of vector row (only valid for vector nodes); default=0
#   matchList - set to matched substrings if this is a log rule and the
#               File Scan service is notifying the rule about matches found; default=""
#
# Notes:
#   The Tcl code for the various states may be empty, or the case left out
#   entirely, if there are no actions to be taken.
#
proc <ruleId> {action ruleId {rowName ""} {rowIndex 0} {matchList ""}} {
    # Set state transition actions
    # For a log rule, condition should always be true, since
    # the log rule executes only when triggered by callback from
    # the File Scan service upon matching the subscribed pattern
    switch $action {
        condition {
            # Make sure that the last statement in the condition Tcl script
            # evaluates to 0 for false, 1 for true; this is used to
            # determine if the underlying event is active (i.e., true) or not.
            < code to evaluate the condition >
            return {0 | 1}
        }
        init {
            # Call logSubscribe $ruleId <log file> <rowName> <pattern>
            # <callback function> if this is a log rule
            < code for init >
            return
        }
        open {
            # Call setRuleText <ruleId> <rowName> <pattern> <statusmsg>
            #
            # <callback> <option> if this is a log rule, there are two choices now:
            # (a) Call logEvent to log the event, and return "ok"
            # (b) Return a state-qualifier ("error-gt", for example). Do not call
            #     logEvent. The event will be logged through the normal open event
            #     mechanism. The event will remain open in this case.
            < code for open >
            return <event status >
        }
        continue {
            # If the event status is to remain unchanged on continuation, simply
            # return the previous event status, or return "".
            #
            # If the event status is to change (for example, perhaps the event is
            # being escalated to "error" from "warning"), then:
            #
            # (a) call setRuleText to set the message text for the new state
            # (b) return the new state.
            #
            # If there is a previous open event, and you return a new event state,
            # the previous event will close automatically.
            #
            # Do not call setRuleText unless you are returning a new event state also.
            #
            # Note that if this is a log rule, and you let a previous file match
            # remain in the open state, subsequent file matches will be sent
            # as "continue" rather than open. In such cases, call "logEvent"
            # to log the new event (if desired). You can call "closeEvent" to
            # explicitly close the previous open event (if desired).
            < code for continue >
            return <event status>
        }
        ack {
            < code for ack >
            return
        }
        fix {
            < code for fix >
            return
        }
        close {
            < code for close >
            return
        }
    }
}
When a module requires a rule, the appropriate rule file (for example, config-reader-d.rul) is sourced. Rule files must be located in the same directory as their corresponding module configuration files. The source command must be put in a container node (typically called _rules) in the module's Model file. The _rules node must be inherited by any node requiring access to the rules.
CODE EXAMPLE 8-4 Module Model File
# File: config-reader-models-d.x
#
# Node _rules will contain the TOE slices for the ConfigReader
# rule initialization data and text messages.
_rules = {
    [ source config-reader-d.rul ]
}
availability = {
    mediumDesc = Available
}
simm = {
    [ use MANAGED-OBJECT ]
    r209 = {
        [ use RULE _rules MANAGED-PROPERTY ]
        mediumDesc = Rule 209
        alarmRules = rcr4u209
    }
    r212 = {
        [ use RULE _rules MANAGED-PROPERTY ]
        mediumDesc = Rule 212
        alarmRules = rcr4u212
    }
    r300 = {
        [ use RULE _rules MANAGED-PROPERTY ]
        mediumDesc = Rule 300
        alarmRules = rcr4u300
    }
}
disk = {
}
cpu = {
}
[ load config-reader-ruleinit-d.x ]
[ load config-reader-ruletext-d.x ]
The Module Agent File
The ConfigReader module agent file, config-reader-d.x, might then look like this:
CODE EXAMPLE 8-5 Module Agent File
# File: config-reader-d.x
#
# ConfigReader module configuration file
#
[ use MANAGED-MODULE ]
[ load config-reader-m.x ]
[ requires template config-reader-models-d ]
availability = {
    refreshInterval = ...
    refreshService = ...
    refreshCommand = ...
}
memory = {
    SIMM(0) = {
        [ use templates.config-reader-models-d.simm ]
    }
    ...
    SIMM(31) = {
        [ use templates.config-reader-models-d.simm ]
    }
}
# loading alarmlimit defaults now.
# config-reader-ruleinit-d.x and config-reader-ruletext-d.x
# loaded in models file
[ load config-reader-d.def ]
The initial values for static and editable parameters are assigned in the <module><-subspec>-ruleinit-d.x file. Note that editable parameters are placed in the alarmlimit slice; all others go in the rule slice. When this file is loaded into an agent, the initialized parameters are available to be read by any rule attached to any node. If an end user modifies an editable parameter, a local slice is created for the affected node to contain the customized value; other nodes are not affected.
The datatype for an editable parameter can be specified in the <module><-subspec>-ruleinit-d.x file (see the example below). The default datatype is string.
Dynamic parameters are not initialized in the <module><-subspec>-ruleinit-d.x file. Such parameters are set directly within the rule logic, either as needed, or in the init action section. Dynamic parameters are stored in a local slice for the affected nodes; their values are not available to other nodes (even other nodes running the same rule).
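The lookup behavior described above (shared initial values, with node-local values taking precedence for the node that owns them) can be modeled in a few lines of plain Tcl. This is a conceptual sketch only: the agent stores these values in TOE slices and exposes them through getRuleParm/setRuleParm, whereas lookupParm, setLocalParm, and the node names simm0 and simm1 below are hypothetical stand-ins built on ordinary Tcl dictionaries.

```tcl
# Illustrative model only; the agent keeps these values in TOE slices.
# defaults: values loaded from the ruleinit file, shared by all nodes.
# local:    per-node values (edited or dynamic parameters).
set defaults [dict create rXXX-sample_thresh 0.10]
set local    [dict create]

# Hypothetical helper: prefer the node-local value, else the shared default.
proc lookupParm {node key} {
    global defaults local
    if {[dict exists $local $node $key]} {
        return [dict get $local $node $key]
    }
    return [dict get $defaults $key]
}

# Hypothetical helper: record a node-local value without touching defaults.
proc setLocalParm {node key value} {
    global local
    dict set local $node $key $value
}

# Customizing one node leaves every other node on the shared default.
setLocalParm simm0 rXXX-sample_thresh 0.25
puts "simm0: [lookupParm simm0 rXXX-sample_thresh]"
puts "simm1: [lookupParm simm1 rXXX-sample_thresh]"
```

Keeping overrides in a separate per-node structure is what allows one node to be customized while every other node running the same rule continues to see the value from the initialization file.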
A portion of the initialization file for the running example might be:
# File: config-reader-ruleinit-d.x
#
_rules = {
    rule:rcr4u209-group = hardware
    rule:rcr4u209-version = 0.1
    rule:rXXX-group = example
    rule:rXXX-version = 0.1
    rule:rXXX-editparm = "sample_thresh"
    alarmlimit:rXXX-sample_thresh = 0.10
    ruledatatype:rXXX-sample_thresh = "float"
    rule:rXXX-keypath = "base.modules.configReader"
}
Rules can use various text messages to convey status; these messages can be collected into a file for a particular module in order to centralize the messages. Note that messages can apply to more than one rule, so there need not be any rule identifier in the message name. Messages shall be assigned to the rulemsg slice.
The text messages shall be associated with node _rules.
A portion of the initialization file for the running example can be as follows:
# File: config-reader-ruletext-d.x
#
_rules = {
    rulemsg:rcr4u209msg = "%s: %s: Error. ECC Data bit %s was corrected"
    rulemsg:lowmsg = "has less than %s percent free space"
}
For every event that occurs, a rule can create two different descriptive messages:
English Status Message
For example, if the rule specifies a status message such as the following:
has less than 2 percent free space
the actual message in the Alarm Manager console or the hierarchy is:
muskoka Solaris /export1 Filesystem has less than 2 percent free space
where:
host = muskoka
module = Solaris
rowname = /export1
mediumDesc = Filesystem
The message is set within the rule logic in the open action. If it is not set, it defaults to something like:
muskoka Solaris /export1 Filesystem rcr4u209 error
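The way a status message is built up can be sketched in standalone Tcl. The sketch fills the %s placeholder of the lowmsg template with the standard Tcl format command, then prefixes the node context. The agent and console perform the prefixing themselves; composeStatus is a hypothetical helper, and joining the fields with single spaces is an assumption based on the example message shown in this section.

```tcl
# Hypothetical helper: builds the displayed message from the node context
# and the rule's status text. The exact assembly is an assumption based on
# the example message; the agent does this step itself.
proc composeStatus {host module rowName mediumDesc statusText} {
    return [join [list $host $module $rowName $mediumDesc $statusText] " "]
}

# Message template as stored in the rulemsg slice, filled in with format.
set lowmsg "has less than %s percent free space"
set text   [format $lowmsg 2]

puts [composeStatus muskoka Solaris /export1 Filesystem $text]
```

Within a real rule, only the status text is supplied by the rule designer, typically by passing the formatted string to setRuleText in the open action.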
Internationalized Status Message
It is anticipated that the English status message will be supplemented in the future with an internationalized version of the status message. This internationalized status message will consist of a series of keywords that will be translated appropriately by the displaying console.
The internationalized messages supplement, not replace, the English status messages. That is, both types of messages will have to be created explicitly by each rule. In the much longer term, the English status messages may eventually be phased out entirely, in favor of the internationalized message.
Currently, rule designers should not attempt to format internationalized status strings.
CODE EXAMPLE 8-6 Simple Rule
# Rule: rknrd402
# Purpose:
#   Checks if available swap space drops below 10% for two hours.
#   Storage of the last time CPU load was below 6 is maintained
#   between rule invocations. This parameter is initialized to
#   some date in the year 2001.
# Note:
#   This rule should be attached to KernelReader.mem
#
proc rknrd402 { action ruleId {rowName ""} {rowIndex 0} {matchList ""} } {
    set estatus info
    # Fire state transition actions
    switch $action {
        condition {
            set value [ getExternalValue { $ruleId KernelReader.mem.swap_avail "" } ]
            set swapb [ getExternalValue { $ruleId KernelReader.mem.swap_total "" } ]
            set cur [ getTime ]
            if { ($value/$swapb) > [ getRuleParm $ruleId swap_thresh 0.10 ] } {
                setRuleParm DYN $ruleId $rowName user_timestamp $cur
            }
            set 1st [ getRuleParm DYN $ruleId $rowName user_timestamp 999999999 ]
            return [expr { ($cur-$1st)>14400 }]
        }
        open {
            set msg [ getRuleMsg rknrd402msg ]
            setRuleText $ruleId $rowName $estatus $msg
            add_cpa SWAP $cur trim_cpa
            return $estatus
        }
    }
}
See coding of rule rcr4u209 in this document.
In addition to the rule rcr4u209 example, here is another:
CODE EXAMPLE 8-7 Log Rule
# Rule: rknrd106
# Purpose:
#   Check for no swap space left.
#
#
proc r106 { action ruleId {rowName ""} {rowIndex 0} {matchList ""} } {
    set estatus warning
    # Fire state transition actions
    switch $action {
        init {
            logSubscribe $ruleId /var/adm/messages "no swap space.*pid (\[0-9\]+)" \
                $rowName $ruleId
            return
        }
        open {
            set pid [ lindex $matchList 0 ]
            set rmsg [ getRuleMsg rknrd106msg ]
            set msg [ format "$rmsg" "$pid" ]
            setRuleText $ruleId $rowName $estatus $msg
            logEvent $ruleId $rowName $estatus
            return $estatus
        }
    }
}
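The pattern subscribed in the init case contains one capture group, and the open case reads that capture from matchList. The following standalone sketch uses only the standard Tcl regexp command to show what the capture yields for a sample line. The sample log line is invented for illustration, and the assumption that the File Scan service passes only the captured substrings (not the full match) in matchList is inferred from the example's use of lindex $matchList 0.

```tcl
# Invented sample line; real /var/adm/messages entries may differ.
set line "genunix: WARNING: no swap space to grow stack for pid 1234 (myapp)"

# Same pattern as the log rule, written in braces so that the square
# brackets need no backslash escapes.
set pattern {no swap space.*pid ([0-9]+)}

# Collect only the captured substrings, mimicking the assumed matchList.
# The "->" argument is just a variable name that receives the full match.
set matchList {}
if {[regexp $pattern $line -> pid]} {
    lappend matchList $pid
}

puts "pid from matchList: [lindex $matchList 0]"
```

Testing a pattern this way in a plain Tcl shell before subscribing it can save a round of agent reloads when a log rule fails to fire.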