Embedded Support Partner ASCII console usage notes

The SGI Embedded Support Partner ASCII console provides access to the SGI Embedded Support Partner facilities for users of cursor-addressable, character-cell display devices (for example, vt100 terminals, vt100 emulators, or any other "curses-oriented" displays).

To operate the Embedded Support Partner user interface from such a display device, you must use the Lynx Web browser. The Lynx executable is expected to be installed in the /usr/local/bin directory. Refer to the Lynx documentation for instructions on installing the browser.

Because a graphics-based Web browser (Netscape) and an ASCII-based Web browser (Lynx) are used quite differently, anyone without previous Lynx experience is strongly advised to read the Lynx documentation on general usage and on intradocument and interdocument navigation.

Because the user interface is dynamic, it is essential to ensure that the HTML pages displayed by Lynx are current and have not been loaded from the Lynx cache.

To ensure this, follow a few simple rules:

  1. Use "x" (NOCACHE on the keymap) to activate links on a page, so that the target page is loaded from the server and not from the cache.
  2. If you used "Backspace" or "Delete" (HISTORY on the keymap) to display the history of visited pages, you must use "x" (NOCACHE) to return to the page that you selected.
  3. If you used the PREV_DOC key to return to the previous document and you need to refresh that page, press "Backspace" and then "x".

Tip: Press "k" to display the current key assignments.


SYSTEM INFORMATION > Introduction

The SYSTEM INFORMATION category provides information about the system on which the Single System Manager is running.

Use the commands in this category to display the following types of system information:

All reports in this category display general system information:

SYSTEM INFORMATION > Hardware

Use this command to display the hardware configuration of the system as it existed at a specific time on a specific date.

Hardware configuration information is available for the following systems:

If you are interested in hardware information for a specific date/time, enter the desired date/time in the appropriate field.

You must select a database that corresponds to the date that you specified.

The information is displayed in a hierarchical manner. If information is not available or not applicable, "N/A" is displayed.

The first column of the report table can include the symbols "[+]" and "[-]". Selecting the "[+]" symbol expands the table to display the subcomponents that make up the selected component. Selecting the "[-]" symbol collapses the subcomponent display.

The other columns of the table contain the following information:
       NAME             The name of the component
       LOCATION         The location of the component
       PART_NUMBER      The part number of the component
       SERIAL_NUMBER    The serial number of the component
       REVISION         The revision level of the component

SYSTEM INFORMATION > Software

Use this command to display the software configuration of the system and version information that existed at a specific time on a specific date.

If you are interested in software information for a specific date/time, enter the desired date/time in the appropriate field. Otherwise, the latest available information will be displayed.

You must select the database that corresponds to the date that you specified.

This report lists the software that was installed on the system at the time you specify. The installed software is listed 10 items per page. The ">" symbol lists the next 10 pages, and ">>" goes to the last page. The "<" symbol lists the previous 10 pages, and "<<" returns to the first page.

The report table provides the following information:

        NAME             The name of the software
        VERSION          The version number of the software
        INSTALL_DATE     The date on which the software was installed
        DESCRIPTION      A description of the software

SYSTEM INFORMATION > System Changes

Use this command to view any system changes that occurred within the range of dates that you specify.

If you do not specify a date, all system configuration changes are displayed.

You must select the database that corresponds to the dates that you specified.

System change information can be collected from only one database at a time.

The SGI Embedded Support Partner tracks the following types of system changes:

The software table describes all software changes that occurred during the period of time that you specified. The table provides the following information:

        NAME           The name of the software
        VERSION        The version number of the software 
        INSTALL_DATE   The date on which the software was installed
        DEINSTALL_DATE The date on which the software was deinstalled
        DESCRIPTION    A description of the software

The hardware table describes all hardware changes that occurred during the period of time that you specified. The table provides the following information:

        NAME           The name of the part
        LOCATION       The location of the part
        PART_NUMBER    The part number for the part
        SERIAL_NUMBER  The serial number of the part
        REVISION       The revision level of the part
        INSTALL_TIME   The date on which the part was installed
        DEINSTALL_TIME The date on which the part was deinstalled

The system changes table describes all system changes (for example, changes to the hostname or IP address) that occurred during the period of time that you specified.


SYSTEM INFORMATION > Part Changes

Use this command to view the transaction history of a part.

You must enter the component serial number. (If necessary, use the SYSTEM INFORMATION > Hardware command to locate a serial number.)

You must choose a database to view the history of the component whose serial number you entered above.

The report table lists the name of the component, the module number in which the component was installed, the part number of the component, the serial number of the module, the revision number of the part, and the slot number in which the component was installed.


SYSTEM INFORMATION > Events Registered

Use this command to view information about events that SGI Embedded Support Partner has registered.

Enter a range of dates for the events that you want to view. Then, choose the type of event information that you want to view. The following options are available:

    All System Events
    Specific System Event    
    System Events by Class   

All System Events

The report table provides the following information about events that were registered within the selected range of dates:

    Event Class               The class to which the event belongs
                              (for example, Availability)
                              
    Event Description         A brief description of the event
    
    Event ID                  The unique identification number assigned 
                              to this event. You can use this number to 
                              find this event via SYSLOG
                              
    First Occurrence          The date and time that the event first occurred
    
    Last Occurrence           The date and time that the event last occurred. 
                              If Number of Occurrences is 1, the time value of 
                              the First Occurrence and the time value of the 
                              Last Occurrence will be identical
                              
    Number of Occurrences     The number of times that the event occurred. 
                              This number corresponds to the number of events 
                              that must occur before registration begins. 
                              By default, this number is 1.

Specific System Event

Use this report to track a specific event that is associated with an actual or suspected system problem. Choose an event class from the list that appears.

Use this page to specify the event that you want to view. Choose the event from the list of events in the class that you have already specified.

The report table provides the following information about the event registrations between the selected range of dates:

    First Occurrence         The date and time that the event first occurred
    Last Occurrence          The date and time that the event last occurred. 
                             If Number of Occurrences is 1, the time value of 
                             the First Occurrence and the time value of the
                             Last Occurrence will be identical.
                          
    Number of Events         The number of times that the event occurred. 
                             This number corresponds to the number of events 
                             that must occur before registration begins. 
                             By default, this number is 1.

System Events by Class

Use this report when you need information about events that are associated with a specific class. For example, use Memory class to track various memory events. Choose the appropriate class for the event that you want to view.

The report table provides the following information about events that were registered between the selected range of dates:

    Event Description        A brief description of the event
    
    Event ID                 The unique identification number 
                             assigned to this event
                             
    First Event Occurrence   The date and time that the event first occurred
    
    Last Event Occurrence    The date and time that the event last occurred. 
                             If Number of Occurrences is 1, the time value of 
                             the First Occurrence and the time value of the 
                             Last Occurrence will be identical
                             
    Number of Events         The number of times that the event occurred. 
                             This number corresponds to the number of events 
                             that must occur before registration begins. 
                             By default, this number is 1.

SYSTEM INFORMATION > Actions Taken

Use this command to display information about actions that have been performed by SGI Embedded Support Partner.

Specify the range of dates for which you want to report actions taken. If you do not enter a date, this option defaults to the current date.

You must choose one of the two available types of reports:

     All Actions Taken     
     Actions Taken for a Specific Event 

All Actions Taken

This option displays the actions that the SGI Embedded Support Partner performed within the range of dates that you specified. The report table provides the following information about actions that were taken for all events between the selected range of dates:

    Event Class          The class to which the event belongs
                         (for example, Availability)
                                      
    Event Description    A brief description of the event
    
    Event ID             The unique identification number 
                         assigned to this event
                                      
    Action Description   A brief description of the action
    
    Action Taken         The action that SGI Embedded Support Partner 
                         performed in response to the event
                                      
    Time of Action       The date and time that SGI Embedded Support Partner 
                         performed the action

Actions Taken for a Specific Event

Use this option when you want to view actions taken for specific events. Choose an event class that contains the event that you want to select.

From the list of events, choose the event that you want to research.

The report table provides the following information about actions that were taken for the specified event between the selected range of dates:

    Action Description   A brief description of the action
    
    Action Taken         The action that the SGI Embedded Support Partner 
                         performed in response to the event
                            
    Time of Action       The date and time that SGI Embedded Support Partner 
                         performed the action


SYSTEM INFORMATION > Diagnostics Results

This command displays the results of the diagnostics that you run on the system.

You must specify the range of dates for which you want to view diagnostics results.

The top portion of the diagnostic report contains the information that pertains to the system from which you requested the report.

The diagnostics results table provides the following information for all diagnostics that were run on the system during the period of time that you specified:

    Diagnostic Name      The name of the diagnostic.
                         When multiple tests run as a group under
                         one program (for example, under SVP), the total
                         number of tests is indicated in parentheses next
                         to the name of the diagnostic:

                            SVP (86) means that 86 tests ran under
                                     the SVP program.

    Diagnostic Status    Diagnostic status can be PASS, FAIL, or COMPLETE.

                         PASS      indicates that the diagnostic completed
                                   successfully

                         FAIL      indicates that a failure occurred

                         COMPLETE  indicates that multiple tests ran, and
                                   one or more of them failed while others
                                   completed successfully

    Diagnostic Result
    Time                 The time when the diagnostic test completed.
                         When multiple tests run under one program, the
                         Diagnostic Result Time indicates the time when
                         the entire program completed.
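
As an illustration of how the three status values relate to one another, the following Python sketch derives a group status from individual test results. The function names are hypothetical, and the rule that a group in which every test fails is reported as FAIL is an assumption made for this example; neither is taken from the SGI Embedded Support Partner software.

```python
def group_status(results):
    """Return "PASS", "FAIL", or "COMPLETE" for a list of per-test results.

    results is a list of booleans (True means the test passed).
    Hypothetical helper, for illustration only.
    """
    if not results:
        raise ValueError("at least one test result is required")
    if all(results):
        return "PASS"
    if not any(results):
        # Assumption: when every test fails, the whole group is a failure.
        return "FAIL"
    # Mixed outcome: some tests failed and others completed successfully.
    return "COMPLETE"


def label(name, results):
    """Format the diagnostic name with the test count, for example "SVP (86)"."""
    if len(results) > 1:
        return f"{name} ({len(results)})"
    return name
```

With these helpers, a program that ran 86 tests of which one failed would be labeled with its test count and reported as COMPLETE.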


SYSTEM INFORMATION > Availability

This command displays system availability statistics. The upper portion of this page displays the total availability percentage and the mean time between interrupts (MTBI) in minutes.

You must specify the range of dates and the type of availability information that you want to view. Two types of availability information are currently available:

    Overall Availability
    Availability Events List

Overall Availability

The Overall Availability report aggregates the events for the given system. Events are grouped as either "Unscheduled" or "Service Action" (controlled shutdown) events and are further classified by category within these two groups. For each category, the report includes the count of events in that category, the total downtime (in minutes), the MTBI (in minutes), and the availability as a percentage. MTBI and availability per category are computed for the events within the category as applied to the entire time period of the report. Count, total downtime, MTBI, and availability are also displayed for each of the two groups and for the final total of all events.

The report also includes the average, least, and most uptimes and downtimes, the logging start time, and the duration of system uptime since the last boot.
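
The arithmetic behind the per-category figures can be sketched in Python as follows. The formulas used here (availability as the share of the report period not spent down, and MTBI as the report period divided by the event count) are assumptions based on the description above, not the documented SGI Embedded Support Partner computation. The function name and the sample values are hypothetical.

```python
def category_stats(period_minutes, downtimes):
    """Summarize one event category over the whole report period.

    period_minutes: length of the report period in minutes.
    downtimes: downtime in minutes for each event in the category.
    """
    if period_minutes <= 0:
        raise ValueError("the report period must be positive")
    count = len(downtimes)
    total_down = sum(downtimes)
    # Assumed formula: percentage of the period not lost to this category.
    availability = 100.0 * (period_minutes - total_down) / period_minutes
    # Assumed formula: report period divided by the number of interrupts.
    mtbi = period_minutes / count if count else None
    return {
        "count": count,
        "downtime": total_down,
        "mtbi": mtbi,
        "availability": availability,
    }


# Hypothetical 30-day report period with three unscheduled events.
stats = category_stats(30 * 24 * 60, [12, 30, 18])
print(stats)
```

Because every category is measured against the same report period, the per-category availability figures in this sketch are not expected to multiply or add up to the overall figure.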

The Overall Availability table summarizes the overall availability of the system:

Use the Event Availability Information link at the bottom of the page to access information about the individual availability events that the system has registered.

Event Availability Information

In the events list display, the fields shown are the Start Time (when the system was previously booted), the Incident Time (when the event occurred), the uptime and downtime in minutes, and a very brief description of the event type or cause of the event. The Summary displays the event information in more detail, including a complete event type description.

The report provides a summary of an event that includes the following information:

If a system panic occurs, this report also includes a brief summary of why the system panicked.


SETUP > Introduction

Embedded Support Partner is a configuration-driven system. From this section, you can set up SGI Embedded Support Partner to suit your specific needs. On the left is a menu of items organized in groups, each of which belongs to a specific component of SGI Embedded Support Partner. A brief description of these components is given below. Context-sensitive help is also available for all applicable menu items; to view it, select the 'Help' button in the top right-hand corner of the menu item. You can always view the current settings by selecting the 'View Current Setup' item for any of the components.
 

Use caution when changing any of the settings. If you are in doubt, read the help carefully before committing any changes. You can also refer to the SGI Embedded Support Partner User Guide for more information.

SETUP > Global > Server

This command configures the Web server that SGI Embedded Support Partner uses. Use this command to perform the following functions:

The upper portion of this page displays the following information:

    Server Identification      The name of the Web server software in use
    
    Server Version             The version level of the Web server 
                               software and its installation date
                               
    Server Port                The Web server connection port in use

The lower portion of this page displays the following selectable options:

    Server Access Permissions      Enables or restricts access 
                                   by external systems
                                   
    Name & Password Change         Enables you to change the current 
                                   username and password

Server Access Permissions

Use this page to specify which systems can access the SGI Embedded Support Partner Web server. Any change that you make to the server access list takes effect immediately.

You can specify an exact IP address or an IP address mask that uses wildcards (for example, 197.23.14.5, 135.*.*.5, or *.*.*.*).
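
One way to read such a mask is octet by octet, with "*" matching any value in that position. The Python sketch below illustrates that interpretation. The function names are hypothetical and the matching rule is an assumption, because the exact behavior of the server is not described here.

```python
def ip_matches(mask, address):
    """Return True if a dotted-quad address matches a mask such as 135.*.*.5."""
    mask_parts = mask.split(".")
    addr_parts = address.split(".")
    if len(mask_parts) != 4 or len(addr_parts) != 4:
        return False
    for m, a in zip(mask_parts, addr_parts):
        # "*" accepts any octet; anything else must match exactly.
        if m != "*" and m != a:
            return False
    return True


def is_allowed(access_list, address):
    """Check an address against every entry of a hypothetical access list."""
    return any(ip_matches(mask, address) for mask in access_list)
```

Under this reading, an access list that contains *.*.*.* admits every system, so it is worth checking the list for that entry before relying on the restrictions.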

User Name and Password Change

Use this page to change the current username or password that enables access to SGI Embedded Support Partner. Any change that you make to the username or password takes effect immediately.

The username and password must each contain between 1 and 128 characters. Characters such as "*", "&", and ":" are not allowed in the username and password strings.
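
The length and character rules can be checked before a value is submitted. The following Python sketch is only an illustration: the function name is hypothetical, and the set of forbidden characters contains just the three characters named above, although the real set may be larger.

```python
# Characters named in the text; the actual forbidden set may include others.
FORBIDDEN = set("*&:")


def is_valid_credential(value):
    """Check the 1 to 128 character limit and the forbidden characters."""
    if not 1 <= len(value) <= 128:
        return False
    return not any(ch in FORBIDDEN for ch in value)
```

The same check applies to both the username and the password, because the text gives identical rules for the two strings.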

The default username ("administrator") and the default password ("partner") must be changed immediately after installation.


SETUP > Global > Global Configuration

An event is an occurrence on the system that SGI Embedded Support Partner is monitoring. Examples of events include parity errors, disk-full conditions, nonmaskable interrupts (NMI), and even activities of the SGI Embedded Support Partner itself.

Use this page if you want to reset the following parameters for all events on the system.

Note: The Global Configuration settings override individual event settings.


SETUP > Events > View Current Setup

Because the number of events can be extensive, events are divided into sets called classes. This scheme simplifies the management of events, enables more efficient use of displays, and facilitates navigation within the program.

The following options are available:

View Event

Use this option to determine the current setting of an individual event. This option allows you to view:

View Event List

Use this option when you want to obtain a list of all events compatible with the SGI Embedded Support Partner. The report allows you to view:

View Classes

Use this option when you want to view all classes available on the system. The report allows you to view:


SETUP > Events > Update

Use this command to update (change settings for) an existing event. Only one event at a time can be updated using the ASCII console.

SETUP > Events > Update > Change Settings

1. Set the checkmark to enable registration of the chosen event with SGI Embedded Support Partner. Remove the checkmark to disable registration of the chosen event.

2. Enter the number of events that must occur before registration begins.

3. Select the Accept button to apply your changes.

4. Select the Change Action Settings link to change the action(s) that will be taken when the chosen event occurs.

5. Select the Return to Update > Select Event page link to select another event.

SETUP > Events > Update Actions

An event/action assignment defines the action that the SGI Embedded Support Partner performs when it registers a specific event. An event/action is a cause-and-effect relationship between an event and an ensuing action. Use this command to modify an event/action assignment; that is, to replace, add, or delete event/action assignments.

To update an event/action assignment:

1. Select the event for which you want to update the action assignment.

2. Select the Change Action Settings link on the SETUP > Events > Update > Change Settings page. The list of actions that are currently available is displayed.

3. Select the actions that you want to assign to the chosen event.

4. Select the Accept button to assign the selected actions.

5. Select the Return to Update Event page link to return to the SETUP > Events > Update > Change Settings page.


SETUP > Events > Add

Use this command to add new events for the SGI Embedded Support Partner to monitor.

    
    To add a new event:

      1. Using the list box provided, specify the existing class
         to which you want to add the new event

         OR

         Set the checkmark if you want to create a new class for this event,
         and enter the new class name in the next input field.

         Note: The checkmark must be removed in order to add
               the new event to an existing class.

      2. Enter a name for the new event

      3. Enter a description of the event to be shown in the interface

      4. Set the checkmark to enable the registration of this event with
         SGI Embedded Support Partner

      5. Enter the number of events that must occur before registration begins

      6. Press the Accept button to add the new event

         OR

         Press the Clear button to clear the fields and start from the beginning.

SETUP > Events > Delete Custom Events

Use this command to delete custom event(s) from the SGI Embedded Support Partner. All records and information associated with these classes/events will also be deleted. Empty classes will be automatically deleted.

    To select the event(s) to delete:

    Press the Show all custom events button
    to display the list of all custom events

    OR

    Choose the event class and
    press the Show custom events for selected class button
    to display the list of all custom events for the selected class.

    Set checkmarks for the event(s) that you want to delete.

    Press the Delete Selected Events button.

SETUP > Actions > View Current Setup

Use this command to view the current configuration of actions. The following options are available:

     View Action Setup               Displays the configuration information 
                                     for a specific action
                                     
     View Available Actions List     Displays a table of all actions 
                                     that are currently available

View Action Setup

Choose the action whose information you want to view.

This option allows you to view the following action information:

View Available Actions List

This report displays all actions that are currently available. The table includes the following information:


SETUP > Actions > Update

Use this command to update an existing action.

Select an action that you want to update. You can modify all of the action parameters, except the action description:

   Actual action command string     Specifies the command that the action
                                    executes

   A username to execute the action Specifies the user account that the SGI
   as (Default = nobody)            Embedded Support Partner uses to execute
                                    the command
                                          
                                          
   Action timeout                   Specifies the time period for which the 
                                    action can run without being killed.
                                    The value that you specify must be a 
                                    multiple of 5. (Default = 600 seconds)
                                          
   The number of times that         Specifies how many times the event must be
   the event must be registered     registered before the SGI Embedded Support 
   before an action will be taken   Partner performs this action 
    
    
   The number of retry times        Specifies the number of times that the SGI 
                                    Embedded Support Partner attempts to 
                                    execute the action before it stops.
                                    The value cannot exceed 23; however, it is 
                                    not recommended to set it greater than 4.
   For example: 
     action to run is               diagnostic
    
     username to execute an action  nobody
    
     action timeout                 3600
    
     the number of times that       5
     the event must be registered 
     before an action will be taken  
    
     the number of retry times      2

This diagnostic will run after the event is registered in the SGI Embedded Support Partner database 5 times. It will be executed with nobody privileges. If the diagnostic is still running after an hour (3600 seconds), it will be killed and restarted a second time (retry times = 2).
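
The numeric limits described above can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the function name is hypothetical, the default timeout of 600 seconds comes from the table above, and the requirements that the timeout be positive and that the retry count not be negative are assumptions that the text does not state.

```python
def check_action_settings(retries, timeout_seconds=600):
    """Return a list of error and warning messages for the two numeric settings."""
    messages = []
    # The text requires a multiple of 5; "positive" is an added assumption.
    if timeout_seconds <= 0 or timeout_seconds % 5 != 0:
        messages.append("error: timeout must be a positive multiple of 5")
    # The text sets 23 as the upper limit; the lower bound of 0 is assumed.
    if retries < 0 or retries > 23:
        messages.append("error: retry count must be between 0 and 23")
    elif retries > 4:
        messages.append("warning: retry counts above 4 are not recommended")
    return messages
```

For the example settings shown above (timeout 3600, retry times 2), the function returns an empty list, meaning that neither limit is violated.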


SETUP > Actions > Add

Use this command to add a new action. The following options are available:

   Action description               Provides a description of the action.
                                    Example: page to John Dow
                                    
   Action command string            Specifies the exact action command 
                                    to execute.
                                    Example: /usr/bin/espnotify -p 1234567
                                    
   Username to execute the action   Specifies the user account that the SGI
   as (default = nobody)            Embedded Support Partner uses to execute
                                    the command.
                                    
   Action timeout                   Specifies the time period for which the 
                                    action can run without being killed.
                                    The value that you specify must be a 
                                    multiple of 5. (Default = 600 seconds)
                                     
   The number of times an event     Specifies how many times the event must 
   must be registered before an     be registered before the SGI Embedded 
   action will be taken             Support Partner performs this action. 
            
   The number of retry times        Specifies the number of times that 
                                    the SGI Embedded Support Partner attempts 
                                    to execute the action before it stops.
                                    The value cannot exceed 23; however, 
                                    it is not recommended that you set it 
                                    greater than 4. 
   For example: 
     action to run is               diagnostic
    
     username to execute an action  nobody
    
     action timeout                 3600
    
     the number of times that       5
     the event must be registered 
     before an action will be taken  
    
     the number of retry times      2

This diagnostic will run after the event is registered in the SGI Embedded Support Partner database 5 times. It will be executed with nobody privileges. If the diagnostic is still running after an hour (3600 seconds), it will be killed and restarted a second time (retry times = 2).

Examples of notification options:

For more information regarding notification options, refer to the espnotify man page.

The following list includes the accepted user format strings and any action-specific options:

For example: /usr/bin/espnotify -D system_name.sgi.com:0.0 -c %D

This displays a window on the machine system_name.sgi.com. The window contains data that is significant to the event.


SETUP > Actions > Delete

Use this command to delete an action. Choose an action that you want to delete.

Note: The action will be deleted from the SGI Embedded Support Partner database. If this action is assigned to any events, the list of all affected events is displayed, and you can either cancel or proceed with the deletion. Press the Yes button to delete the action and remove it from all events to which it is assigned. To cancel the operation, return to the previous page.


SETUP > Paging

Use the espnotify action to deliver a text or numeric message to a pager by specifying the appropriate command-line options. For more information about espnotify, use the man espnotify command.

Paging must be configured before it can work properly. The SGI Embedded Support Partner provides a user interface for setting the required configuration parameters. All of the parameters are written to the /etc/qpage.cf file.

Paging requires that a modem be connected to the system to dial the paging service provider to deliver a page. The Modem/Admin section enables modem configuration. The Service section enables configuration of the parameters of the Paging Service Provider(s). Because the service provider normally identifies each individual pager by means of a pager ID (which does not have to be the pager Touch-tone number), a pager ID must be provided in order to deliver the page. The Pager section enables you to configure different pagers that are associated with the Service.


SETUP > Paging > View Current Setup

Use this command to display the current values of the paging parameters and the following types of information:


SETUP > Paging > Modem/Admin

You can configure the following Modem setup parameters:

Modem name
Specifies a unique name that the SGI Embedded Support Partner uses to identify a modem. Entering an existing modem name updates the entry for that modem. No spaces are allowed.

Modem device
Specifies the device to which the modem is connected (for example, /dev/ttya).

Modem initialization command
Specifies the command that the SGI Embedded Support Partner should use to initialize the modem before dialing the Service Provider. These initialization commands are modem-specific and are listed in your modem manual. For example, many paging services require that error correction be turned off on your modem. For some modems, this can be done by including &A0&K0&M0 in the modem initialization command.

You can configure the following Administration Setup parameters:

    Administrator's e-mail address     Specifies the e-mail address of
                                       the person to contact if paging
                                       fails to deliver a page

    The time interval for retrying     Specifies the amount of time that
                                       espnotify should wait between retries

SETUP > Paging > Service

Use this command to set up information about a paging service.

You can configure the following parameters:

Service name
Specifies the unique name that the SGI Embedded Support Partner uses to identify the paging service provider. Entering an existing service name updates the entry for that service. No spaces are allowed.

Device
Specifies the device (modem name) that the SGI Embedded Support Partner should use to dial the service provider. Use SETUP > Paging > Modem/Admin to set up any modems.

Maximum number of retries
Specifies the maximum number of times the SGI Embedded Support Partner should attempt to access this service before it quits trying.

Maximum length of the message
Specifies the maximum number of characters that can be sent using this service. This depends on your service provider.

Phone number of the paging service
Specifies the IXO/TAP telephone number of the Service Provider. Do not confuse your pager's Touch-tone telephone number with the service provider's IXO/TAP telephone number. They are never the same.

The telephone number should contain at least 7 digits and should not include any spaces, "-", or other symbols.
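
The format rule above can be checked before a number is entered. The following is a minimal sketch in Python, not part of the SGI Embedded Support Partner; the function name is hypothetical, and it assumes that "digits" means the ASCII characters 0 through 9.

```python
def is_valid_service_number(number: str) -> bool:
    """Check the documented format rule for the service phone number.

    Assumption: the number must consist only of ASCII digits (no spaces,
    "-", or other symbols) and must be at least 7 digits long.
    """
    return len(number) >= 7 and number.isascii() and number.isdigit()
```

A number such as "555-0100" is rejected because of the "-", and "12345" is rejected because it is shorter than 7 digits.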


SETUP > Paging > Pager

Use this command to set up a specific pager.

You can configure the following parameters:

    Pager Name       Specifies a unique name to identify this pager.

    Pager ID         Specifies the ID that is used by your paging
                     service provider to identify the pager.
                     The ID is not necessarily the touch-tone
                     phone number that you dial to access the pager.
                     Please contact your service provider to get
                     this information.

    Service Name     Specifies the paging service (service name) to which
                     espnotify should deliver the page for this pager.
                     Use SETUP > Paging > Service
                     to set up any paging services that you want to use.
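
The three paging sections refer to one another by name: a pager names its service, and a service names the modem it dials through. The following Python sketch illustrates only that relationship. The class names, field names, and sample values are hypothetical (the device and initialization string are the examples given earlier in this section), and the sketch does not reflect the actual layout of the /etc/qpage.cf file.

```python
from dataclasses import dataclass


@dataclass
class Modem:
    name: str          # unique, no spaces
    device: str        # for example, /dev/ttya
    init_command: str  # modem specific


@dataclass
class Service:
    name: str          # unique, no spaces
    modem_name: str    # the "Device" parameter: a configured modem name
    max_retries: int
    max_message_length: int
    phone_number: str  # IXO/TAP number of the service provider


@dataclass
class Pager:
    name: str
    pager_id: str      # assigned by the service provider
    service_name: str  # a configured service name


def resolve_route(pager, services, modems):
    """Follow pager -> service -> modem and return what is needed to page."""
    service = services[pager.service_name]
    modem = modems[service.modem_name]
    return modem.device, service.phone_number, pager.pager_id


# Hypothetical sample configuration.
modems = {"modem1": Modem("modem1", "/dev/ttya", "&A0&K0&M0")}
services = {"svc1": Service("svc1", "modem1", 3, 80, "18005550100")}
oncall = Pager("oncall", "1234567", "svc1")
```

Calling resolve_route(oncall, services, modems) on the sample data yields the modem device, the service phone number, and the pager ID. A lookup fails with KeyError if a pager refers to a service, or a service to a modem, that has not been set up.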

SETUP > Availability Monitoring

Availability Monitoring is a set of tools that collectively monitor and report the availability of systems and diagnose system crashes. The availability monitoring tools gather information from diagnostic programs such as ICRASH, FRU Analyzer, and SYSLOG, and identify the cause of system shutdowns. The system configuration information comes from configmon, hinv, and versions. The availability monitoring tools can report data to various locations based on the Availability MailList setting.


SETUP > Availability Monitoring > View Current Setup

Use this command to view the current values of the availability monitor parameters. It displays the following information:


SETUP > Availability Monitoring > Configuration

Use this command to set up the availability monitor component of the SGI Embedded Support Partner.

You can configure the following parameters:

    Automatic e-mail distribution                 (Enable or Disable)
    Specifies whether availability monitor should 
    automatically distribute reports by e-mail.
    
    Display of shutdown reason                    (Enable or Disable)
    Specifies whether availability monitor should 
    display the reason for a shutdown.
    
    Include HINV information into e-mail          (Yes or No)
    Specifies whether availability monitor should 
    include HINV information in the diagnostic e-mail 
    messages that it generates.
    
    Capturing of important system messages        (Enable or Disable)
    Specifies whether availability monitor should 
    capture important system messages.
    
    Start uptime daemon                           (Yes or No)
    Specifies whether availability monitor should 
    start the uptime daemon.
    
    Number of days between status updates         (0 - 300, default = 60)
    Availability monitor, using eventmond, periodically sends a status
    report if the system is up for an extended period of time. This value
    specifies the number of days after which a status report should be sent.

    Interval in seconds between uptime check      (User specified)
    Specifies the number of seconds that the event monitor
    should wait before it performs an uptime check on the system
    (default = 300 seconds).
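
The range and defaults of the two numeric parameters above can be expressed as a small check. This is a minimal Python sketch with hypothetical names. It assumes that both ends of the 0 - 300 range are allowed, and it says nothing about how the SGI Embedded Support Partner itself validates input.

```python
STATUS_DAYS_MIN = 0
STATUS_DAYS_MAX = 300
STATUS_DAYS_DEFAULT = 60      # documented default, in days
UPTIME_CHECK_DEFAULT = 300    # documented default, in seconds


def status_update_days(value=None):
    """Return the number of days between status updates.

    Falls back to the documented default when no value is given and
    rejects values outside the documented 0 - 300 range.
    """
    if value is None:
        return STATUS_DAYS_DEFAULT
    if not STATUS_DAYS_MIN <= value <= STATUS_DAYS_MAX:
        raise ValueError("days between status updates must be 0 - 300")
    return value
```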

SETUP > Availability Monitoring > e-mail List

Use this command to set up the e-mail lists for availability information reports.

You can set up e-mail lists for the following reports:

The availability report contains computed system availability metrics.

The diagnostic report includes all of the availability report data and diagnostic data for troubleshooting.


SETUP > Performance Monitoring > View Current Setup

All performance rules can be enabled or disabled through the user interface. Use this command to display the status of the performance rules.

The report table displays the following information:


SETUP > Performance Monitoring > Configuration

A set of rules is available for performance monitoring.

The table below provides a short description for each rule:

   cpu.context_switch          High aggregate context switch rate
Average number of context switches per CPU per second exceeded threshold over the past sample interval.

   cpu.excess_fpe              Possible high floating point exception rate
This predicate attempts to detect processes generating very large numbers of floating point exceptions (FPEs). Characteristic of this situation is heavy system time coupled with low system call rates (exceptions are delivered through the kernel to the process, taking some system time, but no system call is serviced on the application's behalf).

   cpu.load_average            High 1-minute load average
The current 1-minute load average is higher than the larger of min_load and (per_cpu_load times the number of CPUs). The load average measures the number of processes that are running, runnable, or soon to be runnable (i.e., in short-term sleep).
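
The comparison described above can be written out as follows. This is a minimal Python sketch of the condition as stated in the text, not the rule's actual implementation; the function name is hypothetical, and min_load and per_cpu_load are the rule parameters named above.

```python
def load_average_rule_fires(load_1min, ncpu, min_load, per_cpu_load):
    """True when the 1-minute load average exceeds the larger of
    min_load and per_cpu_load multiplied by the number of CPUs."""
    return load_1min > max(min_load, per_cpu_load * ncpu)
```

With illustrative values min_load = 3 and per_cpu_load = 2, a 2-CPU system has a limit of 4, whereas a 1-CPU system has a limit of 3 because min_load is then the larger value.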

   cpu.low_util                Low average processor utilization
The average processor utilization over all CPUs was below threshold percent during the last sample interval. This rule is effectively the opposite of cpu.util and is disabled by default - it is only useful in specialized environments where, for example, processing is batch oriented and low processor utilization is indicative of poor use of system resources. In such a situation the cpu.low_util rule should be enabled, and cpu.util disabled.

   cpu.syscall                 High aggregate system call rate
Average number of system calls per CPU per second exceeded threshold over the past sample interval.

   cpu.system                  Busy executing in system mode
Over the last sample interval, the average utilization per CPU was busy percent or more, and the ratio of system time to busy time exceeded threshold percent.
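
The two-part condition above can be sketched in Python as follows. The function and argument names are hypothetical; util_pct and system_pct are assumed to be the average per-CPU utilization and the system-mode portion of it, both in percent, over the sample interval.

```python
def cpu_system_rule_fires(util_pct, system_pct, busy, threshold):
    """True when the CPU is at least `busy` percent utilized and system
    time makes up more than `threshold` percent of the busy time."""
    if util_pct <= 0 or util_pct < busy:
        return False
    return 100.0 * system_pct / util_pct > threshold
```

The first test keeps the rule quiet on a lightly loaded system, where a high ratio of system time to busy time is not meaningful.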

   cpu.util                    High average processor utilization
The average processor utilization over all CPUs exceeded threshold percent during the last sample interval.

   craylink.node_cb_errs       CrayLink checkbit errors on Origin node
For some Origin 2000 node, at least one checkbit error was observed on the node (CrayLink) interface and/or the I/O interface in the last sample interval. Use the command
$ pminfo -f hinv.map.node
to discover the abbreviated PCP names of the installed nodes and their corresponding full names in the /hw file system.

   craylink.router_cb_errs     CrayLink checkbit errors on Origin router
For some CrayLink router port, at least one checkbit error was observed in the last sample interval. Use the command
$ pminfo -f hinv.map.routerport
to discover the abbreviated PCP names of the installed router ports and their corresponding full names in the /hw file system.

   filesys.buffer_cache        Low buffer cache read hit ratio
There is some filesystem read activity (at least min_lread Kbytes per second of logical reads), and the read hit ratio in the buffer cache is below threshold percent. Note: It is possible for the read hit ratio to be negative (more physical reads than logical reads) - this can be a result of:

   filesys.dnlc_miss           High directory name cache miss rate
With at least min_lookup directory name cache (DNLC) lookups per second being performed, threshold percent of lookups result in cache misses.

   filesys.filling             File system is filling up
The file system is at least threshold percent full, and the used space is growing at a rate that would fill the file system within lead_time.
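
The projection described above can be sketched as follows. This is a minimal Python illustration, not the rule's actual implementation. The function and argument names are hypothetical, lead_time is assumed to be expressed in seconds, and the growth rate is assumed to be constant.

```python
def filesys_filling_rule_fires(capacity, used, growth_per_sec,
                               threshold_pct, lead_time_sec):
    """True when the file system is at least threshold_pct percent full
    and, at the current growth rate, would be full within lead_time_sec.

    capacity, used, and growth_per_sec must share one unit (for example,
    Kbytes and Kbytes per second).
    """
    if capacity <= 0:
        return False
    if 100.0 * used / capacity < threshold_pct:
        return False
    if growth_per_sec <= 0:
        return False  # not growing, so it is never projected to fill
    seconds_until_full = (capacity - used) / growth_per_sec
    return seconds_until_full <= lead_time_sec
```

Both conditions must hold: a nearly full file system that is not growing does not trigger the rule, and neither does a fast-growing file system that is still below the threshold.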

   memory.exhausted            Severe demand for real memory
The system is swapping modified pages out of main memory to the swap partitions, and has been doing this at the rate of at least threshold pages swapped out per second for at least pct of the last 10 samples, i.e., sustained page-out activity.
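
The "pct of the last 10 samples" test above can be sketched as follows. This is a minimal Python illustration with hypothetical names. It assumes that pct is a percentage and that the rule stays quiet until 10 samples have been collected; the text does not state either point.

```python
def memory_exhausted_rule_fires(swapout_rates, threshold, pct):
    """True when at least pct percent of the last 10 samples show a
    page-out rate of at least `threshold` pages per second.

    swapout_rates is a list of pages swapped out per second, oldest first.
    """
    recent = swapout_rates[-10:]
    if len(recent) < 10:
        return False  # assumption: wait for a full window of samples
    hits = sum(1 for rate in recent if rate >= threshold)
    return 100.0 * hits / len(recent) >= pct
```

Counting samples over a window, rather than testing only the latest sample, is what restricts the rule to sustained activity and makes it ignore a single burst.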

   memory.swap_low             Low free swap space
There is only threshold percent swap space remaining - the system may soon run out of virtual memory. Reduce the number and size of the running programs or add more swap(1) space before it completely runs out.

   network.buffers             Serious demand for network buffers
During the last sample interval the rate at which processes tried to acquire network buffers (mbufs) and either failed or were stalled waiting for a buffer to be freed is greater than threshold times per second.

   network.tcp_drop_connects   High ratio of TCP connections dropped
There is some TCP connection activity (at least min_close connections closed per minute) and the ratio of TCP dropped connections to all closed connections exceeds threshold percent during the last sample interval. High drop rates indicate either network congestion (check the packet retransmission rate) or an application like a Web browser that is prone to terminating TCP connections prematurely, perhaps due to sluggish response or user impatience.
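
The activity guard and ratio test above can be sketched as follows. This is a minimal Python illustration with hypothetical function and argument names; both rates are assumed to be measured per minute over the same sample interval.

```python
def tcp_drop_rule_fires(closed_per_min, dropped_per_min,
                        min_close, threshold_pct):
    """True when there is enough connection activity (at least min_close
    connections closed per minute) and the dropped connections exceed
    threshold_pct percent of all closed connections."""
    if closed_per_min <= 0 or closed_per_min < min_close:
        return False
    return 100.0 * dropped_per_min / closed_per_min > threshold_pct
```

The min_close guard prevents a handful of dropped connections on an almost idle system from producing a misleadingly high ratio.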

   network.tcp_retransmit      High number of TCP packet retransmissions
There is some network output activity (at least 100 TCP packets per second) and the average ratio of retransmitted TCP packets to output TCP packets exceeds threshold percent during the last sample interval. High retransmission rates are suggestive of network congestion, or long latency between the end-points of the TCP connections.

   per_cpu.context_switch      High per CPU context switch rate
The number of context switches per second for at least one CPU exceeded threshold over the past sample interval. This rule applies only to multi-processor systems; for single-processor systems, refer to the cpu.context_switch rule. For Origin 200 and Origin 2000 systems, use the command
$ pminfo -f hinv.map.cpu
to discover the abbreviated PCP names of the installed CPUs and their corresponding full names in the /hw file system.

   per_cpu.many_util           High number of saturated processors
The processor utilization for at least pct percent of the CPUs exceeded threshold percent during the last sample interval. This rule applies only to multi-processor systems having more than min_cpu_count processors. For single-processor systems, refer to the cpu.util rule; for multi-processor systems with fewer than min_cpu_count processors, refer to the per_cpu.some_util rule.
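
The per_cpu.many_util condition can be sketched as follows. This is a minimal Python illustration with a hypothetical function name; cpu_utils is assumed to hold one utilization percentage per CPU for the last sample interval.

```python
def many_util_rule_fires(cpu_utils, threshold, pct, min_cpu_count):
    """True when the system has more than min_cpu_count CPUs and at least
    pct percent of them exceeded `threshold` percent utilization."""
    ncpu = len(cpu_utils)
    if ncpu <= min_cpu_count:
        return False  # the rule does not apply to smaller systems
    saturated = sum(1 for util in cpu_utils if util > threshold)
    return 100.0 * saturated / ncpu >= pct
```

For example, with 3 of 8 CPUs above the threshold, 37.5 percent of the CPUs are saturated, so the rule fires for pct = 25 but not for pct = 50.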

   per_cpu.some_util           High per CPU processor utilization
The processor utilization for at least one CPU exceeded threshold percent during the last sample interval. This rule applies only to multi-processor systems with fewer than max_cpu_count processors. For single-processor systems, refer to the cpu.util rule; for multi-processor systems with more than max_cpu_count processors, refer to the per_cpu.many_util rule. For Origin 200 and Origin 2000 systems, use the command
$ pminfo -f hinv.map.cpu
to discover the abbreviated PCP names of the installed CPUs and their corresponding full names in the /hw file system.

   per_cpu.syscall             High per CPU system call rate
The number of system calls per second for at least one CPU exceeded threshold over the past sample interval. This rule applies only to multi-processor systems; for single-processor systems, refer to the cpu.syscall rule. For Origin 200 and Origin 2000 systems, use the command
$ pminfo -f hinv.map.cpu
to discover the abbreviated PCP names of the installed CPUs and their corresponding full names in the /hw file system.

   per_cpu.system              Some CPU busy executing in system mode
Over the last sample interval, at least one CPU was active for busy percent or more, and the ratio of system time to busy time exceeded threshold percent. This rule applies only to multi-processor systems; for single-processor systems, refer to the cpu.system rule. For Origin 200 and Origin 2000 systems, use the command
$ pminfo -f hinv.map.cpu
to discover the abbreviated PCP names of the installed CPUs and their corresponding full names in the /hw file system.

   per_disk.util               High per spindle disk utilization
For at least one spindle, disk utilization exceeded threshold percent during the last sample interval.

   per_netif.collisions        High collision rate in packet sends
More than threshold percent of the packets being sent across an interface are causing a collision, and packets are being sent across the interface at packet_rate packets per second. Ethernet interfaces expect a certain number of packet collisions, but a high ratio of collisions to packet sends is indicative of a saturated network.

   per_netif.errors            High network interface error rate
For at least one network interface, the error rate exceeded threshold errors per second during the last sample interval.

   per_netif.packets           High network interface packet transfers
For at least one network interface, the average rate of packet transfers (in and/or out) exceeded the threshold during the last sample interval. This rule is disabled by default because the per_netif.util rule is more generally useful: it takes into consideration each network interface's reported bandwidth. However, in some situations the reported bandwidth is zero, in which case an absolute threshold-based rule like this one makes more sense. For this reason, it should typically be applied to some network interfaces but not others; use the "interfaces" variable to filter this.

   per_netif.util              High network interface utilization
For at least one network interface, the average transfer rate (in and/or out) exceeded threshold percent of the peak bandwidth of the interface during the last sample interval.
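
The per_netif.util test for a single interface can be sketched as follows. This is a minimal Python illustration with hypothetical names; the transfer rate and the bandwidth are assumed to be in the same unit (for example, bytes per second).

```python
def netif_util_rule_fires(transfer_rate, bandwidth, threshold_pct):
    """True when the average transfer rate exceeds threshold_pct percent
    of the interface's reported peak bandwidth."""
    if bandwidth <= 0:
        # No usable bandwidth figure, so a percentage cannot be computed;
        # this is the case that per_netif.packets is meant to cover.
        return False
    return 100.0 * transfer_rate / bandwidth > threshold_pct
```

The guard on a zero bandwidth corresponds to the situation noted in the per_netif.packets description, where an absolute packet-rate threshold is the better choice.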

   rpc.bad_network             RPC network transmission failure
More than threshold percent of sent client remote procedure call (RPC) packets are timing out before the server responds and the number of timeouts is significantly more than the number of duplicate packets being received (indicating lost packets). The network file system (NFS) utilizes the RPC protocol for its client-server communication needs. This high failure rate when sending RPC packets may be due to faulty network hardware or inappropriately sized NFS packets (packets possibly too large).

   rpc.slow_response           RPC server response is slow
More than threshold percent of sent client remote procedure call (RPC) packets are timing out before the server responds and the number of timeouts is roughly equivalent to the number of duplicate packets being received. The network file system (NFS) utilizes the RPC protocol for its client-server communication needs. This high timeout rate when sending RPC packets may be because the NFS server is processing duplicate requests from the clients which were sent after the original requests timed out.
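
The rpc.bad_network and rpc.slow_response rules share the same timeout test and differ only in how the timeouts compare with the duplicate packets received. The following Python sketch illustrates that distinction. The names are hypothetical, and the factor argument in particular is an assumption: the text does not quantify "significantly more" or "roughly equivalent", so a single cutoff is used here as a simplification.

```python
def classify_rpc_timeouts(sent, timeouts, duplicates,
                          threshold_pct, factor=2.0):
    """Return the name of the rule that matches, or None.

    Assumption: timeouts count as "significantly more" than duplicates
    when they exceed `factor` times the duplicate count; anything else is
    treated as "roughly equivalent".
    """
    if sent <= 0 or 100.0 * timeouts / sent <= threshold_pct:
        return None  # timeout rate is not high enough for either rule
    if timeouts > factor * duplicates:
        return "rpc.bad_network"    # packets are probably being lost
    return "rpc.slow_response"      # server is answering, but late
```

Few duplicates alongside many timeouts suggests that replies never arrive, whereas a duplicate for nearly every timeout suggests that the server eventually answers both the original request and the retry.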

   espping.response            System Group Manager slow service response
A service being monitored by the SGI Embedded Support Partner Group Manager has taken more than threshold milliseconds to complete during the last sample interval. The hosts parameter specifies hosts running the espping PMDA, not hosts being monitored by this PMDA. The latter are encoded in the "instances" for each espping PMDA metric - run
$ pminfo -f espping.cmd
to list the instances and values for the espping.cmd metric.

   espping.status              System Group Manager service probe failure
A service being monitored by the SGI Embedded Support Partner Group Manager has either failed, or not responded within a timeout period (as defined by espping.control.timeout) during the last sample interval. The hosts parameter specifies hosts running the espping PMDA, not hosts being monitored by this PMDA. The latter are encoded in the "instances" for each espping PMDA metric - run
$ pminfo -f espping.cmd
to list the instances and values for the espping.cmd metric.

Archive Database

Use the Archive Database command to delete a previously archived database or to get instructions for archiving.