Embedded Support Partner ASCII console usage notes
The SGI Embedded Support Partner ASCII console provides access to the SGI Embedded Support Partner facilities for users of cursor-addressable, character-cell display devices (for example, vt100 terminals, vt100 emulators, or any other "curses-oriented" displays).
To operate the Embedded Support Partner user interface from such a display device, you must use the Lynx Web browser. The Lynx executable is expected to be installed in the /usr/local/bin directory. Refer to the Lynx documentation for information about installing this browser.
Because a graphics-based Web browser (Netscape) and an ASCII-based Web browser (Lynx) are used quite differently, users with no previous Lynx experience are strongly advised to read the Lynx documentation on general usage and on navigation within and between documents.
Because the user interface is dynamic, it is essential to ensure that the HTML pages displayed by Lynx are current and have not been loaded from the Lynx cache.
To ensure this, follow a few simple rules:
Tip: Press "k" to display the current key assignments.
The SYSTEM INFORMATION category provides information about the system on which the Single System Manager is running.
Use the commands in this category to display the following types of system information:
Use this command to display the hardware configuration of the system, which existed at a specific time on a specific date.
Hardware configuration information is available for the following systems:
If you are interested in hardware information for a specific date/time, enter the desired date/time in the appropriate field.
You must select a database that corresponds to the date that you specified.
The information is displayed in a hierarchical manner. If information is not available or not applicable, "N/A" is displayed.
The first column of the report table can include the following symbols: "[+]" or "[-]". Selecting the "[+]" symbol expands the table to display the subcomponents that make up the selected component. Selecting the "[-]" symbol collapses the subcomponent display.
The other columns of the table contain the following information:
NAME The name of the component
LOCATION The location of the component
PART_NUMBER The part number of the component
SERIAL_NUMBER The serial number of the component
REVISION The revision level of the component
Use this command to display the software configuration of the system and version information that existed at a specific time on a specific date.
If you are interested in software information for a specific date/time, enter the desired date/time in the appropriate field. Otherwise, the latest available information will be displayed.
You must select the database that corresponds to the date that you specified.
This report lists the software that was installed on the system at the time you specify. The installed software is listed 10 items per page. The ">" symbol lists the next 10 items, and ">>" goes to the last page. The "<" symbol lists the previous 10 items, and "<<" returns to the first page.
The report table provides the following information:
NAME The name of the software
VERSION The version number of the software
INSTALL_DATE The date on which the software was installed
DESCRIPTION A description of the software
Use this command to view any system changes that occurred within the range of dates that you specify.
If you do not specify a date, all system configuration changes are displayed.
You must select the database that corresponds to the dates that you specified.
System change information can be collected from only one database at a time.
The SGI Embedded Support Partner tracks the following types of system changes:
The software table describes all software changes that occurred during the period of time that you specified. The table provides the following information:
NAME The name of the software
VERSION The version number of the software
INSTALL_DATE The date on which the software was installed
DEINSTALL_DATE The date on which the software was deinstalled
DESCRIPTION A description of the software
The hardware table describes all hardware changes that occurred during the period of time that you specified. The table provides the following information:
NAME The name of the part
LOCATION The location of the part
PART_NUMBER The part number for the part
SERIAL_NUMBER The serial number of the part
REVISION The revision level of the part
INSTALL_TIME The date on which the part was installed
DEINSTALL_TIME The date on which the part was deinstalled
The system changes table describes all system changes (for example, hostname, IP address change, and so on) that occurred during the period of time that you specified.
Use this command to view the transaction history of a part.
You must enter the component serial number. (If necessary, use SYSTEM INFORMATION > Hardware to locate a serial number.)
You must choose a database to view the history of the component whose serial number you entered above.
The report table lists the name of the component, the module number in which the component was installed, the part number of the component, the serial number of the module, the revision number of the part, and the slot number in which the component was installed.
Use this command to view information about events that SGI Embedded Support Partner has registered.
Enter a range of dates for the events that you want to view. Then, choose the type of event information that you want to view. The following options are available:
All System Events
Specific System Event
System Events by Class
All System Events
The report table provides the following information about events that were registered within the selected range of dates:
Event Class The class in which the event belongs
(for example, Availability)
Event Description A brief description of the event
Event ID The unique identification number assigned
to this event. You can use this number to
find this event via SYSLOG
First Occurrence The date and time that the event first occurred
Last Occurrence The date and time that the event last occurred.
If Number of Occurrences is 1, the time value of
the First Occurrence and the time value of the
Last Occurrence will be identical
Number of Occurrences The number of times that the event occurred.
This number corresponds to the number of events
that must occur before registration begins.
By default, this number is 1.
Specific System Event
Use this report to track a specific event that is associated with an actual or suspected system problem. Choose an event class from the list that appears.
Use this page to specify the event that you want to view. Choose the event from the list of events in the class that you have already specified.
The report table provides the following information about the event registrations between the selected range of dates:
First Occurrence The date and time that the event first occurred
Last Occurrence The date and time that the event last occurred.
If Number of Occurrences is 1, the time value of
the First Occurrence and the time value of the
Last Occurrence will be identical.
Number of Events The number of times that the event occurred.
This number corresponds to the number of events
that must occur before registration begins.
By default, this number is 1.
System Events by Class
Use this report when you need information about events that are associated with a specific class. For example, use Memory class to track various memory events. Choose the appropriate class for the event that you want to view.
The report table provides the following information about events that were registered between the selected range of dates:
Event Description A brief description of the event
Event ID The unique identification number
assigned to this event
First Event Occurrence The date and time that the event first occurred
Last Event Occurrence The date and time that the event last occurred.
If Number of Occurrences is 1, the time value of
the First Occurrence and the time value of the
Last Occurrence will be identical
Number of Events The number of times that the event occurred.
This number corresponds to the number of events
that must occur before registration begins.
By default, this number is 1.
Use this command to display information about actions that have been performed by SGI Embedded Support Partner.
Specify the range of dates for which you want to report actions taken. If you do not enter a date, this option defaults to the current date.
You must choose one of the two available types of reports:
All Actions Taken
Actions Taken for a Specific Event
All Actions Taken
This option displays the actions that the SGI Embedded Support Partner performed within the range of dates that you specified. The report table provides the following information about actions that were taken for all events between the selected range of dates:
Event Class The class in which the event belongs
(for example, Availability)
Event Description A brief description of the event
Event ID The unique identification number
assigned to this event
Action Description A brief description of the action
Action Taken The action that SGI Embedded Support Partner
performed in response to the event
Time of Action The date and time that SGI Embedded Support Partner
performed the action
Actions Taken for a Specific Event
Use this option when you want to view actions taken for specific events. Choose an event class that contains the event that you want to select.
From the list of events, choose the event that you want to research.
The report table provides the following information about actions that were taken for the specified event between the selected range of dates:
Action Description A brief description of the action
Action Taken The action that the SGI Embedded Support Partner
performed in response to the event
Time of Action The date and time that SGI Embedded Support Partner
performed the action
This command displays the results of the diagnostics that you run on the system.
You must specify the range of dates for which you want to view diagnostics results.
The top portion of the diagnostic report contains the information that pertains to the system from which you requested the report.
The diagnostics results table provides the following information for all diagnostics that were run on the system during the period of time that you specified:
Diagnostic Name Contains the name of diagnostic.
In cases where multiple tests run as a group under
one program (for example, under SVP), the total
number of tests is indicated in parentheses next
to the name of the diagnostic:
SVP (86) means that 86 tests ran under
the SVP program.
Diagnostic Status Diagnostic status can be PASS, FAIL or COMPLETE.
PASS indicates that the diagnostic completed
successfully
FAIL indicates a failure occurred
COMPLETE indicates that multiple tests ran, and
one or more of them failed and others
completed successfully
Diagnostic Result Time The time when the diagnostic test completed.
When multiple tests run under one program, the
Diagnostic Result Time indicates the time when
the entire program completed.
This command displays system availability statistics. The upper portion of this page displays the total availability percentage and the mean time between interrupts (MTBI) in minutes.
You must specify the range of dates and the type of availability information that you want to view. Two types of availability information are currently available:
Overall Availability
Availability Events List
Overall Availability
The Overall Availability report covers the aggregation of events for the given system. Events are grouped as either "Unscheduled" or "Service Action" (controlled shutdown) events. Events are further classified by categories within these two groups. For each category, the Overall Availability report includes the count of events in that category, the total downtime (in minutes), the MTBI (mean time between interrupts, in minutes), and the availability as a percentage. MTBI and availability per category are computed for events within the category as applied to the entire time period of the report. Count, total downtime, MTBI, and availability are also displayed for the two groups, as well as the final total of all the events.
The average, least, and most uptimes and downtimes are also included in the report in addition to logging start time and the duration of system uptime since the last boot.
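As a rough illustration of how the count, downtime, MTBI, and availability figures relate to one another, the sketch below aggregates a list of events over a reporting period. The formulas used here (availability as the share of the period that was not downtime, and MTBI as uptime divided by the event count) are the conventional definitions and are an assumption; these notes do not document the exact computation that the report performs. The function name, the event tuples, and the sample numbers are hypothetical.

```python
from collections import defaultdict


def availability_summary(period_minutes, events):
    """Aggregate (group, downtime_minutes) events over a reporting period.

    Assumed formulas (not taken from the product documentation):
      availability = 100 * (period - downtime) / period
      MTBI         = (period - downtime) / event count
    """
    groups = defaultdict(lambda: {"count": 0, "downtime": 0})
    for group, downtime in events:
        groups[group]["count"] += 1
        groups[group]["downtime"] += downtime

    # Final total across all groups, computed before the key is added.
    groups["Total"] = {
        "count": sum(g["count"] for g in groups.values()),
        "downtime": sum(g["downtime"] for g in groups.values()),
    }

    for stats in groups.values():
        uptime = period_minutes - stats["downtime"]
        stats["availability"] = 100.0 * uptime / period_minutes
        stats["mtbi"] = uptime / stats["count"] if stats["count"] else None
    return dict(groups)


# Hypothetical sample: a 10000-minute period with three events.
summary = availability_summary(
    10000,
    [("Unscheduled", 60), ("Service Action", 30), ("Unscheduled", 10)],
)
for name, stats in summary.items():
    print(name, stats)
```

Each group is evaluated against the entire period, which mirrors the statement above that per-category figures are applied to the whole time period of the report.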
The Overall Availability table summarizes the overall availability of the system:
Use the Event Availability Information link at the bottom of the page to access information about the individual availability events that the system has registered.
Event Availability Information
In the events list display, the fields shown are the Start Time (when the system was previously booted), the Incident Time (when the event occurred), the uptime and downtime in minutes, and a very brief description of the event type or cause of the event. The Summary displays the event information in more detail, including a complete event type description.
The report provides a summary of an event that includes the following information:
If a system panic occurs, this report also includes a brief summary of why the system panicked.
Setup > Introduction
Embedded Support Partner is a configuration-driven system. From this
section, you can set up SGI Embedded Support Partner to suit your specific
needs. On the left is a menu of items organized in groups, each of which
belongs to a specific component of SGI Embedded Support Partner. A brief
description of these components is given below.
Context-sensitive help is also available for all applicable menu items
and can be viewed by selecting the 'Help' button in the top right-hand corner
of the menu item. You can always view the current settings by selecting
the 'View Current Setup' item for any of the components.
Be careful when changing any of the settings. If you are in doubt, read the help carefully before committing any changes. You can also refer to the SGI Embedded Support Partner User Guide for more information.
SETUP > Global > Server
This command configures the Web server that SGI Embedded Support Partner uses. Use this command to perform the following functions:
The upper portion of this page displays the following information:
Server Identification The name of the Web server software in use
Server Version The version level of the Web server
software and its installation date
Server Port The Web server connection port in use
The lower portion of this page displays the following selectable options:
Server Access Permissions Enables or restricts access
by external systems
Name & Password Change Enables you to change the current
username and password
Server Access Permissions
Use this page to specify which systems can access the SGI Embedded Support Partner Web server. Any change that you make to the server access list takes effect immediately.
You can specify the exact IP address or IP address mask using a wildcard. For example, 197.23.14.5, or 135.*.*.5, or *.*.*.*, and so on.
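The wildcard notation can be illustrated with a short sketch. It assumes that each "*" stands for exactly one whole octet of a dotted-quad address, which is consistent with the examples above but is not stated explicitly; the function names are hypothetical and do not correspond to any SGI Embedded Support Partner interface.

```python
def matches(address, pattern):
    """Return True if a dotted-quad address matches a pattern such as 135.*.*.5.

    Assumption: "*" matches any single octet; other octets must match exactly.
    """
    addr_parts = address.split(".")
    pat_parts = pattern.split(".")
    if len(addr_parts) != 4 or len(pat_parts) != 4:
        return False
    return all(p == "*" or p == a for a, p in zip(addr_parts, pat_parts))


def is_allowed(address, access_list):
    """Return True if the address matches any entry in the access list."""
    return any(matches(address, pattern) for pattern in access_list)


# Entries taken from the examples in the text.
access_list = ["197.23.14.5", "135.*.*.5"]
print(is_allowed("135.10.20.5", access_list))
print(is_allowed("135.10.20.6", access_list))
```

An entry of *.*.*.* matches every address, so it effectively opens the server to all systems.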
User Name and Password Change
Use this page to change a current username or password that enables access to SGI Embedded Support Partner. Any change that you make to a username or password takes effect immediately.
The username and password must each contain between 1 and 128 characters. Characters like "*", "&", and ":" are not allowed in the username and password strings.
The default username (administrator) and the default password (partner) must be changed immediately after installation.
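The length and character rules for usernames and passwords can be checked before a value is submitted. The sketch below is a minimal illustration: it rejects only the three characters named above, although the wording "characters like" suggests the real set of disallowed characters may be larger. The function name is hypothetical.

```python
# Only the characters named in the text; the actual forbidden set may be larger.
FORBIDDEN_CHARACTERS = set("*&:")


def is_valid_credential(value):
    """Check a username or password against the documented rules:
    between 1 and 128 characters, none of them forbidden."""
    if not 1 <= len(value) <= 128:
        return False
    return not (FORBIDDEN_CHARACTERS & set(value))


print(is_valid_credential("operator1"))
print(is_valid_credential("bad:name"))
```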
An event is a happening or an occurrence that takes place on the system that SGI Embedded Support Partner is monitoring. A few examples of events follow: parity errors, disk full, nonmaskable interrupts (NMI), and even activities of the SGI Embedded Support Partner itself.
Use this page if you want to reset the following parameters for all events on the system.
Note: Refer to the SETUP > Events and the SETUP > Actions menus for additional information about events and actions.
Note: The Global Configuration setting overrides individual event settings.
Because the number of events can be extensive, events are divided into sets called classes. This scheme simplifies the management of events, enables more efficient use of displays, and facilitates navigation within the program.
The following options are available:
View Event
Use this option to determine the current setting of an individual event. This option allows you to view:
View Event List
Use this option when you want to obtain a list of all events compatible with the SGI Embedded Support Partner. The report allows you to view:
View Classes
Use this option when you want to view all classes available on the system. The report allows you to view:
Use this command to update (change the settings for) an existing event. Only one event at a time can be updated using the ASCII console.
SETUP > Events > Update > Change Settings
1. Set the checkmark to enable registration of the chosen event with SGI Embedded Support Partner. Remove the checkmark to disable registration of the chosen event.
2. Enter the number of events that must occur before registration begins.
3. Select the Accept button to apply your changes.
4. Select the Change Action Settings link to change the action(s) that will be taken when the chosen event occurs.
5. Select the Return to Update > Select Event page link to select another event.
SETUP > Events > Update Actions
An event/action assignment defines the action that the SGI Embedded Support Partner performs when it registers a specific event. An event/action is a cause-and-effect relationship between an event and an ensuing action. Use this command to modify an event/action assignment; that is, to replace, add, or delete event/action assignments.
To update an event/action relationship:
1. Select the event for which you want to update the action assignment.
2. Select the Change Action Settings link on the SETUP > Events > Update > Change Settings page. The list of actions that are currently available is displayed.
3. Select the actions that you want to assign to the chosen event.
4. Select the Accept button to assign the selected actions.
5. Select the Return to Update Event page link to return to the SETUP > Events > Update > Change Settings page.
Use this command to add new events for the SGI Embedded Support Partner to monitor.
To add a new event:
1. Use the listbox to specify the existing class
to which you want to add the new event
OR
Set the checkmark if you want to create a new class for this event,
and enter the new class name in the next input field.
Note: The checkmark must be removed in order to add
the new event to an existing class.
2. Enter a name for the new event.
3. Enter a description of the event, which is shown in the interface.
4. Set the checkmark to enable the registration of this event with
SGI Embedded Support Partner.
5. Enter the number of events that must occur before registration begins.
6. Press the Accept button to add the new event
OR
Press the Clear button to clear the fields and start over.
SETUP > Events > Delete Custom Events
Use this command to delete custom events from the SGI Embedded Support Partner. All records and information associated with these classes and events are also deleted. Empty classes are deleted automatically.
To select the events to delete:
Press the Show all custom events button
to display the list of all custom events
OR
Choose the event class and
press the Show custom events for selected class button
to display the list of all custom events for the selected class.
Set the checkmarks for the events that you want to delete.
Press the Delete Selected Events button.
Use this command to view the current configuration of actions. The following options are available:
View Action Setup Displays the configuration information
for a specific action
View Available Actions List Displays a table of all actions
that are currently available
View Action Setup
You must choose an action whose information you want to view.
This option allows you to view the following action information:
View Available Actions List
This report displays all actions that are currently available. The table includes the following information:
Use this command to update an existing action.
Select the action that you want to update. You can modify all of the action parameters except the action description:
Actual action command string
Specifies the command that the action executes.
A username to execute the action as (Default = nobody)
Specifies the user account that the SGI Embedded Support Partner uses to execute the command.
Action timeout
Specifies the time period for which the action can run without being killed. The value that you specify must be a multiple of 5. (Default = 600 seconds)
The number of times that the event must be registered before an action will be taken
Specifies how many times the event must be registered before the SGI Embedded Support Partner performs this action.
The number of retry times
Specifies the number of times that the SGI Embedded Support Partner attempts to execute the action before it stops. The value cannot exceed 23; however, setting it greater than 4 is not recommended.
For example:
action to run: diagnostic
username to execute the action as: nobody
action timeout: 3600
number of times that the event must be registered before an action will be taken: 5
number of retry times: 2
This diagnostic will run after the event is registered in the SGI Embedded Support Partner database 5 times. It will be executed with nobody privileges. If the diagnostic is still running after an hour (3600 seconds), it will be killed and restarted a second time (retry times = 2).
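The numeric limits on action parameters can be checked with a few lines of code. The sketch below is only an illustration of the rules stated above (timeout a multiple of 5, retry times no greater than 23, values above 4 discouraged); the function name and the message wording are hypothetical, and the user interface may apply additional checks that are not described here.

```python
def check_action_settings(timeout_seconds, retry_times):
    """Return (errors, warnings) for the documented limits on an action."""
    errors = []
    warnings = []
    if timeout_seconds % 5 != 0:
        errors.append("action timeout must be a multiple of 5")
    if retry_times > 23:
        errors.append("number of retry times cannot exceed 23")
    elif retry_times > 4:
        warnings.append("more than 4 retry times is not recommended")
    return errors, warnings


# Values from the example above: timeout 3600 seconds, retry times 2.
print(check_action_settings(3600, 2))
```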
Use this command to add a new action. The following options are available:
Action description
Provides a description of the action.
Example: page to John Dow
Action command string
Specifies the exact action command to execute.
Example: /usr/bin/espnotify -p 1234567
Username to execute the action as (Default = nobody)
Specifies the user account that the SGI Embedded Support Partner uses to execute the command.
Action timeout
Specifies the time period for which the action can run without being killed. The value that you specify must be a multiple of 5. (Default = 600 seconds)
The number of times an event must be registered before an action will be taken
Specifies how many times the event must be registered before the SGI Embedded Support Partner performs this action.
The number of retry times
Specifies the number of times that the SGI Embedded Support Partner attempts to execute the action before it stops. The value cannot exceed 23; however, setting it greater than 4 is not recommended.
For example:
action to run: diagnostic
username to execute the action as: nobody
action timeout: 3600
number of times that the event must be registered before an action will be taken: 5
number of retry times: 2
This diagnostic will run after the event is registered in the SGI Embedded Support Partner database 5 times. It will be executed with nobody privileges. If the diagnostic is still running after an hour (3600 seconds), it will be killed and restarted a second time (retry times = 2).
Examples of notification options:
For more information regarding notification options, refer to the espnotify man page.
The following list includes the accepted user format strings and any action-specific options:
For example: /usr/bin/espnotify -D system_name.sgi.com:0.0 -c %D
This displays a window on the machine system_name.sgi.com. The window contains data that is significant to the event.
Use this command to delete an action. Choose an action that you want to delete.
Note: The action will be deleted from the SGI Embedded Support Partner database. If this action is assigned to any events, the list of all affected events is displayed. You can cancel or proceed with the deletion. Press the Yes button to delete the action and remove it from all events to which it is assigned. To cancel the operation, return to the previous page.
Use the espnotify action to deliver a text or numeric message to a pager by specifying the appropriate command line options. For more information on espnotify, use the man espnotify command.
Paging must be configured before it can work properly. The SGI Embedded Support Partner provides a user interface for setting the required configuration parameters. All of the parameters are written to the /etc/qpage.cf file.
Paging requires that a modem be connected to the system to dial the paging service provider to deliver a page. The Modem/Admin section enables modem configuration. The Service section enables configuration of the parameters of the Paging Service Provider(s). Because the service provider normally identifies each individual pager by means of a pager ID (which does not have to be the pager Touch-tone number), a pager ID must be provided in order to deliver the page. The Pager section enables you to configure different pagers that are associated with the Service.
Use this command to display the current values of the paging parameters and the following types of information:
You can configure the following Modem setup parameters:
Modem name
Specifies a unique name that the SGI Embedded
Support Partner uses to identify a modem. Entering an existing modem name will update the modem name.
No spaces are allowed.
Modem device
Specifies the device to which the modem is connected (for example, /dev/ttya)
Modem initialization command
Specifies the command that the SGI Embedded Support Partner should use to initialize the modem
before dialing the Service Provider. These initialization commands are modem specific and are available
in your modem manual. For example, many paging services require that error correction be turned off on your modem.
For some modems, this can be done by including &A0&K0&M0 in the modem initialization command
You can configure the following Administration Setup parameters:
Administrator's e-mail address
Specifies the e-mail address of the person to contact if paging fails to deliver a page.
The time interval for retrying
Specifies the amount of time that espnotify should wait between retries.
Use this command to set up information about a paging service.
You can configure the following parameters:
Service name
Specifies the unique name that the SGI Embedded Support Partner uses to identify the paging service provider.
Entering an existing service name will update the service name. No spaces are allowed.
Device
Specifies the device (modem name) that the SGI Embedded Support Partner should use to dial the service provider.
Use SETUP > Paging > Modem/Admin to set up any modems.
Maximum number of retries
Specifies the maximum number of times the SGI Embedded Support Partner should attempt to access this service
before it quits trying.
Maximum length of the message
Specifies the maximum number of characters that can be sent using this service. This depends on your service provider.
Phone number of the paging service
Specifies the IXO/TAP telephone number of the Service Provider. Do not confuse your pager's Touch-tone telephone
number with the service provider's IXO/TAP telephone number. They are never the same.
The telephone number should contain at least 7 digits and should not include any spaces, "-", or other symbols.
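The format rule for the service telephone number amounts to "digits only, at least 7 of them". A minimal sketch of that check follows; the function name and the sample numbers are hypothetical, and the check says nothing about whether the number is actually the provider's IXO/TAP number.

```python
def is_valid_service_number(number):
    """Return True if the string has at least 7 characters, all ASCII digits."""
    return len(number) >= 7 and number.isascii() and number.isdigit()


print(is_valid_service_number("18005550100"))
print(is_valid_service_number("555-0100"))
```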
Use this command to set up a specific pager.
You can configure the following parameters:
Pager Name
Specifies a unique name to identify this pager.
Pager ID
Specifies the ID that your paging service provider uses to identify the pager. The ID is not necessarily the Touch-tone phone number that you dial to access the pager. Contact your service provider to get this information.
Service Name
Specifies the paging service (service name) to which espnotify should deliver the page for this pager. Use SETUP > Paging > Service to set up any paging services that you want to use.
Availability Monitoring is a set of tools that collectively monitor and report the availability of the system(s) and diagnose system crashes. Availability monitoring tools gather information from diagnostic programs such as ICRASH, FRU Analyzer, and SYSLOG and identify the cause of system shutdowns. The system configuration information comes from configmon, hinv, and versions. Availability monitoring tools can report data to various locations based on the Availability MailList setting.
Use this command to view the current values of the availability monitor parameters. It displays the following information:
Use this command to set up the availability monitor component of the SGI Embedded Support Partner.
You can configure the following parameters:
Automatic e-mail distribution (Enable or Disable)
Specifies whether availability monitor should
automatically distribute reports by e-mail.
Display of shutdown reason (Enable or Disable)
Specifies whether availability monitor should
display the reason for a shutdown
Include HINV information into e-mail (Yes or No)
Specifies whether availability monitor should
include HINV information in the diagnostic e-mail
messages that it generates.
Capturing of important system messages (Enable or Disable)
Specifies whether availability monitor should
capture important system messages.
Start uptime daemon (Yes or No)
Specifies whether availability monitor should
start the uptime daemon
Number of days between status updates (0 - 300, Default = 60)
Availability monitor, using eventmond, periodically sends a status
report if the system is up for an extended period of time. This value
specifies the number of days after which a status report should be sent.
Interval in seconds between uptime check (User specified)
Specifies the number of seconds that event monitor
should wait before it performs an uptime check on the system.
(default = 300 seconds)
Use this command to set up the e-mail lists for availability information reports.
You can set up e-mail lists for the following reports:
The availability report contains computed system availability metrics.
The diagnostic report includes all of the availability report data and diagnostic data for troubleshooting.
All performance rules can be enabled or disabled via the user interface. Use this command to display the status of the performance rules.
The report table displays the following information:
A set of rules is available for performance monitoring.
The table below provides a short description for each rule:
cpu.context_switch High aggregate context switch rateAverage number of context switches per CPU per second exceeded threshold over the past sample interval.
cpu.excess_fpe Possible high floating point exception rateThis predicate attempts to detect processes generating very large numbers of floating point exceptions (FPEs). Characteristic of this situation is heavy system time coupled with low system call rates (exceptions are delivered through the kernel to the process, taking some system time, but no system call is serviced on the application's behalf).
cpu.load_average High 1-minute load averageThe current 1-minute load average is higher than the larger of min_load and ( per_cpu_load times the number of CPUs ). The load average measures the number of processes that are running, runnable or soon to be runnable (i.e. in short term sleep).
cpu.low_util Low average processor utilizationThe average processor utilization over all CPUs was below threshold percent during the last sample interval. This rule is effectively the opposite of cpu.util and is disabled by default - it is only useful in specialized environments where, for example, processing is batch oriented and low processor utilization is indicative of poor use of system resources. In such a situation the cpu.low_util rule should be enabled, and cpu.util disabled.
cpu.syscall: High aggregate system call rate. Average number of system calls per CPU per second exceeded threshold over the past sample interval.
cpu.system: Busy executing in system mode. Over the last sample interval, the average utilization per CPU was busy percent or more, and the ratio of system time to busy time exceeded threshold percent.
cpu.util: High average processor utilization. The average processor utilization over all CPUs exceeded threshold percent during the last sample interval.
craylink.node_cb_errs: CrayLink checkbit errors on Origin node. For some Origin 2000 node, at least one checkbit error was observed on the node (CrayLink) interface and/or the I/O interface in the last sample interval. Use the command
craylink.router_cb_errs: CrayLink checkbit errors on Origin router. For some CrayLink router port, at least one checkbit error was observed in the last sample interval. Use the command
filesys.buffer_cache: Low buffer cache read hit ratio. There is some filesystem read activity (at least min_lread Kbytes per second of logical reads), and the read hit ratio in the buffer cache is below threshold percent. Note: It is possible for the read hit ratio to be negative (more physical reads than logical reads); this can be a result of:
filesys.dnlc_miss: High directory name cache miss rate. With at least min_lookup directory name cache (DNLC) lookups per second being performed, threshold percent of lookups result in cache misses.
filesys.filling: File system is filling up. The file system is at least threshold percent full, and the used space is growing at a rate that would see the file system full within lead_time.
memory.exhausted: Severe demand for real memory. The system is swapping modified pages out of main memory to the swap partitions, and has been doing this at the rate of at least threshold pages swapped out per second for at least pct of the last 10 samples, i.e., sustained page-out activity.
memory.swap_low: Low free swap space. There is only threshold percent swap space remaining; the system may soon run out of virtual memory. Reduce the number and size of the running programs or add more swap(1) space before it completely runs out.
network.buffers: Serious demand for network buffers. During the last sample interval, the rate at which processes tried to acquire network buffers (mbufs) and either failed or were stalled waiting for a buffer to be freed was greater than threshold times per second.
network.tcp_drop_connects: High ratio of TCP connections dropped. There is some TCP connection activity (at least min_close connections closed per minute) and the ratio of TCP dropped connections to all closed connections exceeds threshold percent during the last sample interval. High drop rates indicate either network congestion (check the packet retransmission rate) or an application like a Web browser that is prone to terminating TCP connections prematurely, perhaps due to sluggish response or user impatience.
network.tcp_retransmit: High number of TCP packet retransmissions. There is some network output activity (at least 100 TCP packets per second) and the average ratio of retransmitted TCP packets to output TCP packets exceeds threshold percent during the last sample interval. High retransmission rates are suggestive of network congestion or long latency between the end-points of the TCP connections.
per_cpu.context_switch: High per CPU context switch rate. The number of context switches per second for at least one CPU exceeded threshold over the past sample interval. This rule only applies to multi-processor systems; for single-processor systems, refer to the cpu.context_switch rule. For Origin 200 and Origin 2000 systems, use the command
per_cpu.many_util: High number of saturated processors. The processor utilization for at least pct percent of the CPUs exceeded threshold percent during the last sample interval. This rule only applies to multi-processor systems having more than min_cpu_count processors; for single-processor systems, refer to the cpu.util rule, and for multi-processor systems with fewer than min_cpu_count processors, refer to the per_cpu.some_util rule.
per_cpu.some_util: High per CPU processor utilization. The processor utilization for at least one CPU exceeded threshold percent during the last sample interval. This rule only applies to multi-processor systems with fewer than max_cpu_count processors; for single-processor systems, refer to the cpu.util rule, and for multi-processor systems with more than max_cpu_count processors, refer to the per_cpu.many_util rule. For Origin 200 and Origin 2000 systems, use the command
per_cpu.syscall: High per CPU system call rate. The number of system calls per second for at least one CPU exceeded threshold over the past sample interval. This rule only applies to multi-processor systems; for single-processor systems, refer to the cpu.syscall rule. For Origin 200 and Origin 2000 systems, use the command
per_cpu.system: Some CPU busy executing in system mode. Over the last sample interval, at least one CPU was active for busy percent or more, and the ratio of system time to busy time exceeded threshold percent. This rule only applies to multi-processor systems; for single-processor systems, refer to the cpu.system rule. For Origin 200 and Origin 2000 systems, use the command
per_disk.util: High per spindle disk utilization. For at least one spindle, disk utilization exceeded threshold percent during the last sample interval.
per_netif.collisions: High collision rate in packet sends. More than threshold percent of the packets being sent across an interface are causing a collision, and packets are being sent across the interface at packet_rate packets per second. Ethernet interfaces expect a certain number of packet collisions, but a high ratio of collisions to packet sends is indicative of a saturated network.
per_netif.errors: High network interface error rate. For at least one network interface, the error rate exceeded threshold errors per second during the last sample interval.
per_netif.packets: High network interface packet transfers. For at least one network interface, the average rate of packet transfers (in and/or out) exceeded the threshold during the last sample interval. This rule is disabled by default because the per_netif.util rule is more generally useful, as it takes into consideration each network interface's reported bandwidth. However, there are some situations in which this reported bandwidth is zero, in which case an absolute threshold-based rule like this one makes more sense. For this reason it should typically be applied to some network interfaces but not others; use the "interfaces" variable to filter this.
per_netif.util: High network interface utilization. For at least one network interface, the average transfer rate (in and/or out) exceeded threshold percent of the peak bandwidth of the interface during the last sample interval.
rpc.bad_network: RPC network transmission failure. More than threshold percent of sent client remote procedure call (RPC) packets are timing out before the server responds, and the number of timeouts is significantly more than the number of duplicate packets being received (indicating lost packets). The networked file system (NFS) utilizes the RPC protocol for its client-server communication needs. This high failure rate when sending RPC packets may be due to faulty network hardware or inappropriately sized NFS packets (packets possibly too large).
rpc.slow_response: RPC server response is slow. More than threshold percent of sent client remote procedure call (RPC) packets are timing out before the server responds, and the number of timeouts is roughly equivalent to the number of duplicate packets being received. The network file system (NFS) utilizes the RPC protocol for its client-server communication needs. This high timeout rate when sending RPC packets may be because the NFS server is processing duplicate requests from the clients which were sent after the original requests timed out.
espping.response: System Group Manager slow service response. A service being monitored by the SGI Embedded Support Partner Group Manager has taken more than threshold milliseconds to complete during the last sample interval. The hosts parameter specifies hosts running the espping PMDA, not hosts being monitored by this PMDA. The latter are encoded in the "instances" for each espping PMDA metric - run
espping.status: System Group Manager service probe failure. A service being monitored by the SGI Embedded Support Partner Group Manager has either failed or not responded within a timeout period (as defined by espping.control.timeout) during the last sample interval. The hosts parameter specifies hosts running the espping PMDA, not hosts being monitored by this PMDA. The latter are encoded in the "instances" for each espping PMDA metric - run
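Several of the rules in the table combine a threshold with a second condition. The Python sketch below illustrates how three of them (cpu.load_average, filesys.filling, and memory.exhausted) could be evaluated from sampled values. It is only an illustration of the logic described in the table, not the Embedded Support Partner implementation. The function names and the units chosen here (hours for lead_time, percent per hour for the growth rate) are assumptions made for the example; the parameter names min_load, per_cpu_load, threshold, lead_time, and pct come from the rule descriptions.

```python
# Illustrative sketch only: function names and units are assumptions,
# not part of the Embedded Support Partner software.

def load_average_high(load_1min, ncpu, min_load, per_cpu_load):
    """cpu.load_average: the 1-minute load average is higher than the
    larger of min_load and (per_cpu_load times the number of CPUs)."""
    return load_1min > max(min_load, per_cpu_load * ncpu)


def filesystem_filling(used_pct, growth_pct_per_hour, threshold, lead_time_hours):
    """filesys.filling: at least threshold percent full, and growing at a
    rate that would fill the file system within lead_time (assumed hours)."""
    if used_pct < threshold or growth_pct_per_hour <= 0:
        return False
    hours_until_full = (100.0 - used_pct) / growth_pct_per_hour
    return hours_until_full <= lead_time_hours


def memory_exhausted(swapout_rates, threshold, pct):
    """memory.exhausted: the page-out rate was at least threshold pages per
    second in at least pct percent of the last 10 samples."""
    recent = swapout_rates[-10:]
    if not recent:
        return False
    hits = sum(1 for rate in recent if rate >= threshold)
    return 100.0 * hits / len(recent) >= pct
```

For example, on a 4-CPU system with min_load set to 2.0 and per_cpu_load set to 2.0, the effective limit is 8.0, so load_average_high(9.0, 4, 2.0, 2.0) is true and load_average_high(7.0, 4, 2.0, 2.0) is false.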
Archive Database
Use the Archive Database command to delete a previously archived database or to get instructions for archiving.