Executive summary:
A basic necessity for any mobile telecommunications service provider is to know in real time whether any part of its network, especially at the BTS level, has been physically compromised and is no longer adequately serving its local area. This is especially true for service areas that are known to endure catastrophic natural events such as severe weather systems, earthquakes and forest fires.
The Outage Manager was designed to provide mobile operators with this functionality and, in most cases, to give advance warning when cell sites begin to experience problems due to external failures.
The interface to the Outage Manager is built to present a topological visualization of the operator’s network. A high degree of user interaction lets users ‘surf’ through the network and easily navigate to any geographical area. All cell sites are precisely located in the GUI via county associations and coordinates, and users can zoom in and out to change the overall perspective and level of detail. Visual cues such as color changes make it easy to identify faults or declining performance trends within geographical areas.
The value for the operator:
Knowing what is happening in real time to cell sites that are exposed to the elements and vulnerable to catastrophic events is critical to operators as a disaster recovery mechanism. An early warning system of this nature will greatly strengthen an operator’s ability to ensure service to subscribers and maintain high network uptime even during cataclysmic events. The highly dynamic and intuitively visual GUI is valuable to users at all levels, from upper management to switch techs to the NOC.
How it works:
At regular intervals, the system logs into all BSS systems (BSCs and RNCs) and attempts to determine their operational status. The initial test simply determines whether the network element is reachable and responding. A login attempt can fail for a number of reasons, ranging from network issues and login/password issues to the unlikely scenario of a catastrophic failure of the network element. If any system is not reachable or not responding, the appropriate cause code is logged in a heartbeat table for further inspection. For responding elements, the list of active alarms is retrieved for evaluation. The active alarms (if any) are compared against a known “Fingerprint List”, and recognized alarms are sorted into four main categories:
- Site Up and operating on auxiliary battery power.
- Site Down due to loss of commercial power.
- Site Down due to loss of transport facility.
- Site Down due to unknown cause.
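The polling and classification steps above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the Outage Manager itself: the alarm text fragments in the fingerprint list, the cause codes (OK, NETWORK, LOGIN), and the `fetch_alarms` callable that stands in for the real login-and-retrieve step on a BSC or RNC.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional, Tuple


class SiteStatus(Enum):
    ON_BATTERY = "Site Up and operating on auxiliary battery power"
    DOWN_POWER = "Site Down due to loss of commercial power"
    DOWN_TRANSPORT = "Site Down due to loss of transport facility"
    DOWN_UNKNOWN = "Site Down due to unknown cause"


# Hypothetical fingerprint list: alarm-text fragment -> category.
# Real alarm texts are vendor specific and are not given in this document.
FINGERPRINTS = {
    "BATTERY BACKUP ACTIVE": SiteStatus.ON_BATTERY,
    "COMMERCIAL POWER LOST": SiteStatus.DOWN_POWER,
    "TRANSPORT LINK DOWN": SiteStatus.DOWN_TRANSPORT,
    "SITE OUT OF SERVICE": SiteStatus.DOWN_UNKNOWN,
}


@dataclass
class Heartbeat:
    """One row of the heartbeat table (field names are illustrative)."""
    element: str
    reachable: bool
    cause_code: str  # hypothetical codes: OK, NETWORK, LOGIN


def classify_alarm(alarm_text: str) -> Optional[SiteStatus]:
    """Return the category of a recognized alarm, or None if unrecognized."""
    text = alarm_text.upper()
    for fragment, status in FINGERPRINTS.items():
        if fragment in text:
            return status
    return None


def poll_element(
    element: str, fetch_alarms: Callable[[str], List[str]]
) -> Tuple[Heartbeat, List[SiteStatus]]:
    """Poll one network element and classify its active alarms.

    fetch_alarms is a stand-in for the real login and alarm retrieval; it is
    assumed to raise PermissionError on bad credentials and ConnectionError
    or TimeoutError when the element cannot be reached.
    """
    try:
        alarms = fetch_alarms(element)
    except PermissionError:
        return Heartbeat(element, False, "LOGIN"), []
    except (ConnectionError, TimeoutError):
        return Heartbeat(element, False, "NETWORK"), []
    # Keep only alarms that match the fingerprint list.
    recognized = [s for s in map(classify_alarm, alarms) if s is not None]
    return Heartbeat(element, True, "OK"), recognized
```

In a deployment along these lines, a scheduler would call `poll_element` for every BSC and RNC at each interval, write the returned `Heartbeat` to the heartbeat table, and pass the recognized categories on to the GUI for display.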
The GUI of the Outage Manager is a dynamic, fluid and highly interactive interface. Users are initially presented with a high-level view of the network at the national level. Mouse interactions make it easy to focus and zoom in on any specific geographical area. As the user zooms in on a targeted area, the view becomes progressively more detailed: individual network elements become visible and their operational status is shown in greater detail.
Return on Investment:
Large-scale disasters can lead to huge outages on the operator’s network and result in massive financial losses. Having a reliable mechanism in place to accurately detect and report such outages in real time, as they are happening, will greatly enhance the operator’s ability to react quickly and manage the disaster in an efficient and effective manner. Such a real-time application helps the operator recover from the outage in the shortest possible time and limit its overall impact.
Conclusion:
Knowing when a node on the network is not functioning correctly is highly valuable to operators as part of their regular network maintenance. The value of such knowledge rises sharply during disastrous events, when it allows the operator to recover rapidly and thus limit any financial impact on the corporation.