Monitoring Cycle

Agent execution is controlled by the cron daemon on each server. The high-level steps of a monitoring cycle are as follows.

  1. Verify that the agent is idle.

    If the previous run of the agent has not finished, allow it to finish. Only one instance of the monitoring agent (/opt/SUNWstade/bin/rasagent) should be running at any one time.

  2. Load and execute all appropriate device modules, which are used to generate instrumentation reports and health-related events.

    The system generates an instrumentation report by probing the device for all relevant information and saving that information in a report stored in /var/opt/SUNWstade/DATA. The system then compares the report data to previous reports and evaluates the differences to determine whether health-related events need to be generated.

    Events are also created by relaying information found in log files. For example, every error and warning found in /var/adm/messages.t3 is translated into a Log Event without further analysis. Most events are generated because a rule or policy in the software concluded that a problem exists, but if the storage array reports a problem in the syslog file, an event is generated immediately.

  3. Send any generated health-related events to the master agent if they were generated by a slave agent, or send them to all interested parties if they were generated by the master agent.

    The master agent is responsible for generating its own events and collecting events from the slaves. Events can also be aggregated on the master before dissemination.

    Note -

    Aggregated events and events that require action by service personnel (known as actionable events) are also referred to as alarms.

  4. Store instrumentation reports for future comparison.

    Event logs are accessible from the Administration tab of the user interface. The Storage Automated Diagnostic Environment software updates the state database with the necessary statistics. Some events are generated only after a certain threshold is reached. For example, a switch port's CRC count going up by one is not sufficient to trigger an event.

    The Storage Automated Diagnostic Environment supports email thresholds that prevent the generation of multiple emails about the same component of the same device. By keeping track of the number of events already sent in a specified timeframe, the software can suppress redundant email alerts. Other (non-email) providers do not support this feature.

  5. Send the events and/or alarms to the interested parties.

    Events are sent only to those recipients that have been set up for notification. The types of events can be filtered so that only pertinent events are sent to each individual.


    Note -

    The email provider and the Sun Network Storage Command Center (NSCC, by way of the Sun Net Connect provider) receive notification of all events, if enabled.
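
The comparison of a new instrumentation report against the previous one, combined with a threshold check such as the one described for switch-port CRC counts, can be sketched as follows. This is a minimal illustration only, not the software's actual implementation: the function name compare_reports, the counter name crc_errors, the threshold value of 10, and the representation of a report as a dictionary of counters are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical thresholds: the minimum increase between two reports that
# is needed before an event is generated. Names and values are illustrative.
THRESHOLDS: Dict[str, int] = {"crc_errors": 10}


def compare_reports(previous: Dict[str, int],
                    current: Dict[str, int],
                    thresholds: Dict[str, int] = THRESHOLDS) -> List[str]:
    """Return event descriptions for counters whose increase since the
    previous report meets or exceeds the configured threshold."""
    events = []
    for counter, new_value in current.items():
        # A counter missing from the previous report is treated as zero.
        delta = new_value - previous.get(counter, 0)
        # Counters with no configured threshold trigger on any increase.
        limit = thresholds.get(counter, 1)
        if delta >= limit:
            events.append(f"{counter} increased by {delta} (threshold {limit})")
    return events
```

With these assumed values, a CRC count that moves from 5 to 6 produces no event, while a jump from 5 to 20 produces one.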

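The email threshold behavior, in which alerts about the same component of the same device are suppressed once enough have been sent within a timeframe, might look roughly like the sketch below. The class name EmailThrottle, its parameters max_emails and window_seconds, and the device and component names are hypothetical and chosen only for illustration; the software's real bookkeeping is not documented here.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class EmailThrottle:
    """Suppress repeated email alerts for one (device, component) pair."""

    def __init__(self, max_emails: int, window_seconds: float) -> None:
        self.max_emails = max_emails
        self.window_seconds = window_seconds
        # Timestamps of emails already sent, keyed by (device, component).
        self._sent: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def should_send(self, device: str, component: str,
                    now: Optional[float] = None) -> bool:
        """Return True and record the send if the limit is not yet reached."""
        now = time.time() if now is None else now
        history = self._sent[(device, component)]
        # Discard timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_emails:
            return False
        history.append(now)
        return True
```

A throttle created with max_emails=1 and window_seconds=3600 would allow one email per hour for each component, while alerts for a different component on the same device are counted separately.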

Related Topics