Get The Most Out Of The T:LAN/RIO Alarm Reporting Capabilities (Part 1)
This is Part 1 of a series that takes a more comprehensive look at the SNMP alarm reporting capabilities of the Optima T:LAN and RIO products. Here is some background information before we get into the details.
Protocols & Data Exchanges
When talking about alarm events, it is important to distinguish between T:LAN’s use of SNMPv1 traps and RIO’s use of SNMPv2 InformRequests.
Both are typically referred to as traps, yet they cover completely different event types. They also act differently in terms of the underlying SNMP protocol, packet exchanges, expected replies (or lack thereof), repeat counts and repeat rates at which they are sent.
The following is an excerpt from the T:LAN User Guide, Chapter 4: Theory Of Operation – SNMP Traps. It covers the use of the SNMPv1 traps. These are used exclusively to communicate events recorded on the T:LAN.
Refer to the T:LAN User Guide, Chapter 5: Configuration – SNMP Menu for details on how to mask each of the available trap categories for any of the four possible NMS destinations. This allows precise steering and control over the types of SNMP traps being generated by each T:LAN unit and sent to each NMS destination.
The RIO alarms use SNMPv2 InformRequest PDUs to relay alarm information.
This is superior to SNMPv1 fire-and-forget traps, which leave the sender uncertain whether the recipient actually received the notification.
With SNMPv2, the InformRequest sent by the alarm sender (T:LAN+RIO) is ACKed by the recipient (NMS). The recipient does this by sending back a response that echoes the contents of the processed InformRequest. This assures the alarm originator that the recipient has indeed received and processed the event being reported.
Even though this is only a small change in the operation of SNMP ‘traps’, it eliminates the need to repeat the same event notification just to ensure proper reception.
It also saves valuable bandwidth, lowers the required processing overhead, and significantly reduces the possibility of causing ‘event storms’ when many events need to be reported at once.
How Many Event Entries Does The T:LAN Keep In Its Event Log?
Up to 640 events are kept in the T:LAN Event Log.
How Many SNMPv1 Traps (Also Called Generic Traps) Can The T:LAN Handle Concurrently?
The T:LAN can handle up to 16 concurrent SNMPv1 (generic) traps.
How Many Times Will Each Generic Trap Be Repeated?
Each generic trap will be repeated 3 times.
What Is The Repeat Interval Between Generic Traps?
The interval is 5 seconds.
Are These Values User Configurable?
No. Unlike the parameters of the SNMPv2 InformRequests, those of the rarely used generic traps are not user configurable.
How Else Can A User Control The Generation Of SNMPv1 Traps In The T:LAN?
See the T:LAN User Guide, Chapter 5: Configuration – SNMP Menu. Each trap category can be masked individually for each NMS destination to stop unnecessary or nuisance traps.
What Is AAS (Adaptive Alarm Suppression)?
Here is how the RIO User Guide explains Adaptive Alarm Suppression:
Can Contact Bounce Be Reduced?
Of course. You can find all the details in the RIO User Guide, Chapter 1: Improved Resiliency of Contact Inputs.
What Can Be Done To Slow The Rate At Which Events Are Generated?
Here are several tips (all of which can be combined) to reduce the number of alarm events being generated. Because these measures are applied right at the source, where they make the most sense, they directly reduce the alarm traffic on your overall network.
ADD A QUALIFICATION TIMER.
Probably the most effective and important step! Do this for each offending discrete contact input. Only if the input stays in alarm for longer than the qualification period will the T:LAN+RIO issue an alarm.
Correspondingly, a clear will only be issued once the input has returned to the rest state for longer than the qualification period. This eliminates the ‘chatter’ usually seen from noisy alarm inputs.
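The qualification behavior described above can be illustrated with a minimal Python sketch. It is not the T:LAN+RIO firmware logic; the class name QualifiedInput, its sample() method, and the 5-second period are assumptions chosen purely for illustration.

```python
from typing import Optional


class QualifiedInput:
    """Hypothetical model of a discrete input with a qualification timer."""

    def __init__(self, qualification_s: float) -> None:
        self.qualification_s = qualification_s
        self.reported_alarm = False   # state last reported to the NMS
        self._raw_alarm = False       # most recent raw contact state
        self._raw_since = 0.0         # time the raw state last changed

    def sample(self, now: float, raw_alarm: bool) -> Optional[str]:
        """Feed one raw reading; return "RAISED", "CLEARED", or None."""
        if raw_alarm != self._raw_alarm:
            # Every raw transition restarts the qualification timer.
            self._raw_alarm = raw_alarm
            self._raw_since = now
        held_for = now - self._raw_since
        if self._raw_alarm != self.reported_alarm and held_for > self.qualification_s:
            self.reported_alarm = self._raw_alarm
            return "RAISED" if self._raw_alarm else "CLEARED"
        return None


# A chattering contact: only states held longer than 5 s are reported.
contact = QualifiedInput(qualification_s=5.0)
readings = [(0, True), (1, False), (2, True), (3, True), (8, True),
            (9, False), (10, True), (20, False), (26, False)]
events = [(t, e) for t, raw in readings if (e := contact.sample(t, raw))]
```

In this run, the short flips at 0 s, 1 s, 9 s and 10 s never produce a notification, because each state change restarts the timer before the qualification period has elapsed.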
ADD AAS (Adaptive Alarm Suppression).
Enabling the AAS feature on an alarm input will cause future alarms to be automatically squelched (for a defined period of time), should the input transition more than the allowed number of times within a set time window.
This is a great way of reining in misbehaving (or chatty) inputs. And it does NOT require manual intervention. The AAS algorithm automatically takes a suppressed input out of AAS again after a set period of time.
If the input still chatters away at that time, then it will once again be put under AAS control.
Should you observe this kind of pattern (several periods of AAS control following each other), then you have a clear indication that this input really needs some TLC or scheduled maintenance to bring it back in line.
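As a rough illustration of the AAS idea, here is a small Python sketch of a sliding-window transition counter with a timed suppression period. The actual RIO algorithm is described in the RIO User Guide; the class name AdaptiveSuppressor, its parameters, and the choice to squelch the very transition that exceeds the limit are assumptions made for this example.

```python
from collections import deque
from typing import Deque, Optional


class AdaptiveSuppressor:
    """Hypothetical sketch of adaptive alarm suppression for one input."""

    def __init__(self, max_transitions: int, window_s: float, suppress_s: float) -> None:
        self.max_transitions = max_transitions  # allowed transitions per window
        self.window_s = window_s                # length of the counting window
        self.suppress_s = suppress_s            # how long the input stays squelched
        self._times: Deque[float] = deque()
        self._suppressed_until: Optional[float] = None

    def on_transition(self, now: float) -> bool:
        """Return True if this transition should be reported, False if squelched."""
        if self._suppressed_until is not None:
            if now < self._suppressed_until:
                return False
            # Suppression period is over: release the input automatically.
            self._suppressed_until = None
            self._times.clear()
        self._times.append(now)
        # Drop transitions that have fallen out of the counting window.
        while now - self._times[0] > self.window_s:
            self._times.popleft()
        if len(self._times) > self.max_transitions:
            # Too chatty: squelch the input for the suppression period.
            self._suppressed_until = now + self.suppress_s
            self._times.clear()
            return False
        return True


aas = AdaptiveSuppressor(max_transitions=3, window_s=10.0, suppress_s=60.0)
reported = [aas.on_transition(t) for t in (0, 1, 2, 3, 4, 30, 64, 65)]
```

With these example values, the fourth transition inside the 10-second window triggers suppression, everything up to 63 s is squelched, and reporting resumes on its own afterwards. If the input were still chattering at that point, it would simply be suppressed again.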
MAKE SURE THE NMS RESPONDS PROPERLY.
Ensuring that the NMS answers each SNMPv2 InformRequest tells the T:LAN+RIO that the NMS has received the alarm notification and that it no longer needs to be repeated. If that mechanism is not in place, the T:LAN+RIO will re-transmit the same SNMP notification based on the user-selected repeat interval and repeat count.
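The effect of a responding NMS can be seen in a short Python sketch of the repeat-until-ACKed loop. This is a simulation only, with no real SNMP traffic; deliver_inform and make_fake_nms are hypothetical names, and the sketch assumes the repeat count is the total number of transmissions.

```python
from typing import Callable, Optional, Tuple


def deliver_inform(send_and_wait: Callable[[float], bool],
                   repeat_count: int,
                   interval_s: float) -> Tuple[bool, int]:
    """Send a notification until it is ACKed or the repeat count is used up.

    send_and_wait(timeout_s) is a hypothetical callable that transmits one
    InformRequest and returns True if a response arrived within timeout_s.
    Returns (acked, transmissions_used).
    """
    for attempt in range(1, repeat_count + 1):
        if send_and_wait(interval_s):
            return True, attempt          # ACKed: stop repeating immediately
    return False, repeat_count            # gave up after the last repeat


def make_fake_nms(ack_on_attempt: Optional[int]) -> Callable[[float], bool]:
    """Simulated NMS that answers from the given attempt on (None = never)."""
    seen = {"count": 0}

    def send_and_wait(timeout_s: float) -> bool:
        seen["count"] += 1
        return ack_on_attempt is not None and seen["count"] >= ack_on_attempt

    return send_and_wait


healthy = deliver_inform(make_fake_nms(1), repeat_count=5, interval_s=30.0)
silent = deliver_inform(make_fake_nms(None), repeat_count=5, interval_s=30.0)
```

A responsive NMS costs a single transmission per event, while a silent one costs the full repeat count for every event, which is exactly the extra traffic this tip is meant to avoid.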
CHANGE THE TRAP FORWARDING MODE.
The T:LAN+RIO supports two distinct RIO Trap Forwarding Modes:
A) CONCURRENT EVENT REPORTING
Once an event is recorded, the trap forwarder begins reporting the state transition. If a new event for the same IO is recorded before the trap forwarder has finished reporting the previous event, then two or more concurrent trap notifications for the same IO will be sent out. Choose this mode if preserving the event sequence is the priority. This is the default setting!
B) ONLY MOST RECENT EVENT REPORTING
Newly recorded events abort any prior event reporting still in progress for the same IO. The trap forwarder immediately begins reporting the most recent event for the corresponding IO. Choose this mode if the most up-to-date state reporting is the priority.
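The difference between the two modes can be sketched in a few lines of Python. The names ForwardingMode and record_event and the dictionary of in-progress reports are hypothetical; they only model which notifications remain in flight per IO and do not reflect how the trap forwarder is actually implemented.

```python
from enum import Enum
from typing import Dict, List


class ForwardingMode(Enum):
    CONCURRENT = "concurrent"      # mode A: keep reporting every event
    MOST_RECENT = "most_recent"    # mode B: newest event replaces older ones


def record_event(in_progress: Dict[str, List[str]],
                 mode: ForwardingMode,
                 io_id: str,
                 event: str) -> None:
    """Add a newly recorded event to the reports in progress for one IO."""
    reports = in_progress.setdefault(io_id, [])
    if mode is ForwardingMode.MOST_RECENT:
        # Abort any report for the same IO that has not finished yet.
        reports.clear()
    reports.append(event)


concurrent: Dict[str, List[str]] = {}
most_recent: Dict[str, List[str]] = {}
for io_id, event in [("input-3", "RAISED"), ("input-7", "RAISED"), ("input-3", "CLEARED")]:
    record_event(concurrent, ForwardingMode.CONCURRENT, io_id, event)
    record_event(most_recent, ForwardingMode.MOST_RECENT, io_id, event)
```

In the concurrent case both the RAISED and CLEARED reports for input-3 stay in flight, preserving the sequence. In the most-recent case only CLEARED remains for input-3, while input-7 is unaffected because the replacement applies per IO.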
CHANGE THE REPEAT COUNT AND REPEAT INTERVAL.
By default, the following settings are active in a T:LAN+RIO:
MINOR Alarms (Any Level 1-9):
Repeat MINOR ALARM RAISED NOTIFICATIONS for a maximum of 5 times, every 30s unless ACKed sooner by NMS destination.
Repeat MINOR ALARM CLEARED NOTIFICATIONS for a maximum of 5 times, every 30s unless ACKed sooner by NMS destination.
Result: T:LAN+RIO will only attempt to deliver MINOR events for a max period of 5 x 30s = 150s = 2 minutes and 30 seconds.
MAJOR Alarms (Any Level 1-9):
Repeat MAJOR ALARM RAISED NOTIFICATIONS for a maximum of 10 times, every 20s unless ACKed sooner by NMS destination.
Repeat MAJOR ALARM CLEARED NOTIFICATIONS for a maximum of 10 times, every 20s unless ACKed sooner by NMS destination.
Result: T:LAN+RIO will only attempt to deliver MAJOR events for a max period of 10 x 20s = 200s = 3 minutes and 20 seconds.
CRITICAL Alarms (Any Level 1-9):
Repeat CRITICAL ALARM RAISED NOTIFICATIONS for a maximum of 25 times, every 10s unless ACKed sooner by NMS destination.
Repeat CRITICAL ALARM CLEARED NOTIFICATIONS for a maximum of 25 times, every 10s unless ACKed sooner by NMS destination.
Result: T:LAN+RIO will only attempt to deliver CRITICAL events for a max period of 25 x 10s = 250s = 4 minutes and 10 seconds.
OTHER Alarms (Covers INFO/WARNING/FAIL Events):
Repeat INFO NOTIFICATIONS for a maximum of 2 times, every 50s unless ACKed sooner by NMS destination.
Result: T:LAN+RIO will only attempt to deliver INFO events for a max period of 2 x 50s = 100s = 1 minute and 40 seconds.
Repeat WARNING NOTIFICATIONS for a maximum of 3 times, every 40s unless ACKed sooner by NMS destination.
Result: T:LAN+RIO will only attempt to deliver WARNING events for a max period of 3 x 40s = 120s = 2 minutes.
Repeat FAIL NOTIFICATIONS for a maximum of 25 times, every 10s unless ACKed sooner by NMS destination. These are the highest level notifications, usually reserved for hardware fail events.
Result: T:LAN+RIO will only attempt to deliver FAIL events for a max period of 25 x 10s = 250s = 4 minutes and 10 seconds.
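The default values listed above can be collected into one table to recompute each maximum delivery window. The following Python sketch uses only the counts and intervals quoted in this article; the names DEFAULT_REPEATS and max_delivery_window are made up for the example.

```python
from typing import Dict, Tuple

# category -> (maximum repeat count, repeat interval in seconds),
# taken from the default settings listed in this article.
DEFAULT_REPEATS: Dict[str, Tuple[int, int]] = {
    "MINOR": (5, 30),
    "MAJOR": (10, 20),
    "CRITICAL": (25, 10),
    "INFO": (2, 50),
    "WARNING": (3, 40),
    "FAIL": (25, 10),
}


def max_delivery_window(category: str) -> Tuple[int, int, int]:
    """Return (total seconds, whole minutes, remaining seconds) for a category."""
    count, interval_s = DEFAULT_REPEATS[category]
    total_s = count * interval_s
    minutes, seconds = divmod(total_s, 60)
    return total_s, minutes, seconds


windows = {name: max_delivery_window(name) for name in DEFAULT_REPEATS}
```

Lowering either the count or the interval for a category shortens the period during which an unacknowledged event keeps generating traffic.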
The RIO, however, supports a much wider range of repeat intervals. These can be specified as follows:
- 1 to 60 seconds
- 1 to 60 minutes
- 1 to 24 hours
- 1 to 60 days
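To compare intervals given in different units, it can help to normalize them to seconds. The Python sketch below checks a value against the ranges listed above and converts it; the function name interval_to_seconds and the unit spellings are assumptions and do not correspond to any actual T:LAN+RIO configuration syntax.

```python
from typing import Dict, Tuple

# unit -> (minimum value, maximum value, seconds per unit),
# using the ranges listed in this article.
INTERVAL_RANGES: Dict[str, Tuple[int, int, int]] = {
    "seconds": (1, 60, 1),
    "minutes": (1, 60, 60),
    "hours": (1, 24, 3600),
    "days": (1, 60, 86400),
}


def interval_to_seconds(value: int, unit: str) -> int:
    """Validate a repeat interval against its range and convert it to seconds."""
    if unit not in INTERVAL_RANGES:
        raise ValueError(f"unknown unit: {unit!r}")
    low, high, factor = INTERVAL_RANGES[unit]
    if not low <= value <= high:
        raise ValueError(f"{value} {unit} is outside the range {low} to {high}")
    return value * factor


shortest = interval_to_seconds(1, "seconds")
longest = interval_to_seconds(60, "days")
```

The shortest and longest intervals in this sketch span from 1 second to 60 days (5,184,000 seconds).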