Strengthening network resilience

Stuff happens

Customers of M1, the smallest player in the Singapore mobile market, have endured three major service disruptions in thirteen months. The worst, in January 2013, brought M1’s network down for almost three days, resulting in a hefty S$1.5m fine from the Infocomm Development Authority of Singapore (IDA). In October 2013 a fire at one of Singtel’s exchange buildings brought down mobile, broadband and other IP-based services such as digital telephony. When Singtel’s exchange went down, StarHub and M1 customers were not spared as these operators lease lines from Singtel, lines that were also damaged in the fire. Singtel owns the only copper network and, through OpenNet, the only fibre network in Singapore. With internet connectivity down, many industries including banking and financial services, payment systems, public transport and health services were crippled. Singtel was fined a record S$6m by the IDA for the service outage.

Dominoes

Such network outages are not unique to Singapore. The European Union Agency for Network and Information Security (ENISA) reported 79 significant incidents of network outages across the EU in 2012. The majority of these incidents affected mobile telephony and internet services. Hardware failure and software bugs were the most common causes of these incidents.[1]

Such outages are likely to cause more damage and disruption as reliance on network infrastructure increases and demand for bandwidth grows.  The Internet of Things – devices communicating with one another or controlled by the owner via an internet connection – will cease to work when the underlying networks fail.

In the near term, service outages may simply be a feature of network strain as networks are upgraded to cope with increasing demand for bandwidth – teething problems that are common in many areas.  In the longer term, however, it becomes vital that networks are more resilient.  This is reflected in EC Commissioner Kroes’ remarks:  “[T]elecom touches everything and users are developing massive expectations of it.  Markets must function, devices must function, networks must function and investment needs to happen”.

The main aim of such investment should be to lower the probability of a network outage and to reduce the harm caused by an outage should one occur. The underlying issue is that, left to their own devices, network operators may not invest enough in providing resilience. This is because of the external effects and the public good characteristics of resilience. The failure of one operator’s network affects not just its own customers, but also those of other networks, and overall resilience depends on the efforts of all network operators.

Regulating for resilience

This means that achieving an appropriate level of resilience is likely to require imposing some regulatory requirements, backed by regular audits to ensure that these requirements are met. In order to provide incentives for investment, a fine may be imposed on operators who have failed to invest in strengthening their networks. Provided that the fine reflects the probability of network failure and the likely amount of harm caused to society by such failure, operators should have the incentive to make appropriate investments in their networks.
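The link between the size of the fine and the incentive to invest can be illustrated with a small sketch. All figures are hypothetical and chosen purely for illustration; none is drawn from IDA data. The sketch assumes a risk-neutral operator that invests only when the investment costs less than the reduction in its expected fine.

```python
def expected_fine(outage_probability: float, fine: float) -> float:
    """Expected annual fine faced by a risk-neutral operator."""
    return outage_probability * fine


def operator_invests(cost: float, p_before: float, p_after: float, fine: float) -> bool:
    """True if the investment costs less than the expected fine it avoids."""
    saving = expected_fine(p_before, fine) - expected_fine(p_after, fine)
    return cost < saving


# Hypothetical figures (S$ millions): an upgrade costing 2 cuts the annual
# outage probability from 10% to 2%; the harm to society per outage is 50.
cost, p_before, p_after, social_harm = 2.0, 0.10, 0.02, 50.0

# The upgrade is socially worthwhile: it avoids 0.08 * 50 = 4 of expected harm.
socially_worthwhile = cost < (p_before - p_after) * social_harm

# A fine equal to the social harm aligns the operator's choice with society's.
invests_with_full_fine = operator_invests(cost, p_before, p_after, fine=social_harm)

# A fine of 5 avoids only 0.08 * 5 = 0.4 in expected fines, less than the cost of 2.
invests_with_small_fine = operator_invests(cost, p_before, p_after, fine=5.0)

print(socially_worthwhile, invests_with_full_fine, invests_with_small_fine)
```

Under these assumed numbers the upgrade is worth making from society's point of view, but the operator makes it only when the fine is large enough to reflect the harm an outage causes.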

The IDA conducted a review of the resilience of mobile networks in early 2013 and concluded that the mobile networks in Singapore were resilient. Nonetheless, as M1’s woes show, service disruptions still occurred. And despite M1 having been fined for the January 2013 outage, a subsequent outage occurred less than a month later. Such incidents have prompted the IDA to review its Telecom Resilience Code. The IDA will also implement a new audit framework for mobile operators in 2014. It would seem appropriate also to review the proportionality of the fines imposed for outages in order to increase their effectiveness in improving network resilience.

In any case, the harm caused by a network outage can only be reduced if some backup capacity is available to take over the load of the primary system when it goes down.

One for all and all for none

Most countries have multiple mobile networks, and the most efficient way of providing redundancy would seem to be to allow each network operator to tap into other mobile networks when its own network goes down. Subscribers on the failed network may be allowed to “roam” on other networks. All the mobile operators in a country will, in effect, become jointly responsible for network resilience in the country. This amounts to infrastructure sharing in the case of a network outage, leveraging all available network assets to ensure continuity of communication services for all. The IDA is currently conducting studies on allowing roaming on other mobile networks in the case of a mobile network outage.

This requires, of course, some duplication of network infrastructure and limiting network sharing to instances of network outages. Allowing network sharing in the course of normal operation in order to reduce costs would be counter-productive.

One needs to be careful, however, not to be too permissive even in the case of outages. Otherwise, this may create disincentives for operators to invest in their networks. Operators with inferior networks have no incentive to upgrade them as they can simply free ride on rival networks for resilience. The burden of ensuring network resilience falls to operators with superior networks, eventually eroding the incentives of all operators to invest in their networks, as there is no commercial benefit to providing a backstop for others.

This means that setting the right terms and conditions upon which roaming is available is crucial.  If the operator whose network has failed faces stiff roaming costs, this reduces the temptation simply to rely on others for resilience.  Similarly, if the “host operator” can make substantial gains from offering such roaming services, it would make commercial sense to have a more resilient network.  Provided that  network failure is sufficiently costly to an operator, the benefits from network sharing arrangements for protecting customers from the harm caused by network outages should outweigh any disincentives to invest.

To this end, IDA should exercise caution if and when regulating roaming rates in such network sharing arrangements. Rates should not be so low as to undermine operators’ investment incentives.

Further, policy measures that link network outages to an operator’s bottom line should also provide incentives for improving network resilience. For instance, an operator can be required to offer its customers rebates on their monthly bill proportionate to the extent of the disruption. Post-paid subscribers could be allowed to end their contracts should instances of such outages exceed a certain frequency over a particular period. Finally, the regulator could publish results from its network audits in an accessible manner to enable consumers to take this information into consideration when choosing a mobile network operator.

Technical challenges

There may also be technical limitations that stand in the way of network sharing as a backstop in case of outages. An operator would require ample spare capacity to take on a sudden increase in load when a rival network goes down. IDA has noted that “substantial investment by the mobile operators to significantly expand their network capacities” is necessary in order for national roaming to work. The operators who have discussed the issue with IDA reported that “national roaming may overload other telco’s network and make matters worse.” In addition, a sudden surge in roaming traffic could significantly lower the Quality of Service (QoS) for the host operator’s own customers, ultimately doing more harm than good.

Singtel, StarHub and M1 have approximately 46%, 28%, and 26% subscriber share respectively.  It is difficult to see the two smaller players being able to accommodate traffic from Singtel’s subscribers on their current networks should Singtel face an outage.  If the required investments are substantial, roaming rates could be in excess of subscribers’ willingness to pay, rendering network sharing ineffective as a safety net.
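A rough back-of-the-envelope calculation shows the scale of the problem. The sketch below assumes, purely for illustration, that traffic is proportional to subscriber share and that a failed operator's subscribers are spread across the surviving networks in proportion to their size; neither assumption comes from IDA or the operators.

```python
# Approximate subscriber shares quoted above (percent).
SHARES = {"Singtel": 46, "StarHub": 28, "M1": 26}


def load_multiplier(failed: str, shares: dict[str, int] = SHARES) -> float:
    """Factor by which each surviving network's load grows when `failed` goes down.

    Assumes load is proportional to subscriber share and that roaming
    subscribers are split across survivors in proportion to their size.
    """
    total = sum(shares.values())
    surviving = total - shares[failed]
    return total / surviving


for operator in SHARES:
    extra = (load_multiplier(operator) - 1) * 100
    print(f"{operator} outage: surviving networks carry about {extra:.0f}% more load")
```

Under these assumptions a Singtel outage would require StarHub and M1 each to carry roughly 85% more traffic than usual, compared with roughly 35% to 39% more for the survivors if one of the two smaller operators failed.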

IDA could mandate network sharing only in limited circumstances of severe and widespread disruptions to services. Basic voice and SMS services can be prioritised whilst data traffic is offloaded to WiFi networks (presuming broadband access via WiFi networks is still running). QoS requirements can be forgone in such cases of emergency. For instance, in the Netherlands, national roaming for voice and SMS services is allowed if an outage affects more than 500,000 people and network recovery is expected to take longer than 3 days. In the US, Sweden and the Caribbean, national roaming for resilience purposes is allowed in cases of emergency.

HetNet to the rescue?

In the wake of M1’s service outage in January 2013, Singapore’s Minister for Communications and Information (the Minister) also suggested requiring the mobile operators to have geo-redundant networks by maintaining a separate set of core network equipment.

Whilst it is important to ensure network resilience, one also needs to have an eye on costs.  The Minister’s proposal to build an entire geo-redundant network is likely to be costly and the costs – which can be expected to be passed on to end customers – may outweigh the potential benefits from having a redundant network in the case of an outage.

In addition, this would not deal with the outages that arise as a result of software failure. Given that hardware and software failures are the two major causes of network outages, having some redundancy in software systems would seem necessary to strengthen network resilience. Therefore, rather than focusing investments solely on a separate set of core network equipment, more selective investments in critical portions of the network, such as replacing old or outdated equipment or the components most likely to fail, may be just as effective without necessarily incurring inefficient costs.

Roaming on other mobile networks is of course only the first step.  In the case of an outage of a mobile network, rival mobile networks, WiFi connectivity and fixed voice services may be used as alternatives.

Most recently, proposals were made for a nationwide “Heterogeneous Network” (HetNet), an overarching network comprising WiFi, 3G and 4G layers.  The HetNet would allow users to utilise the network offering the best connectivity and would potentially also allow users to roam on rival networks in the case of a network outage.

IDA is currently consulting on the likely technical and policy requirements for a HetNet. The announcements regarding the HetNet proposals to date suggest that the HetNet is likely to involve some automatic WiFi offloading alongside 4G and 3G coverage. The HetNet may also utilise both Frequency and Time Division Duplexing 4G networks, supplementing traditional macro cell coverage with capacity-enhancing small cell coverage in targeted areas. It could include multiple mobile networks with users being able to utilise the network offering the best quality of service. It is unclear whether the HetNet would be some sort of wholesale-only network, similar to the national fibre network funded by the government, or a commercial arrangement between the mobile operators with users being able to roam between these operators’ networks and WiFi networks, potentially including carrier-grade WiFi systems should these become available in the future.

Notwithstanding these details, a diversification of networks on which traffic may be carried would improve resilience.  In the case of a mobile network outage, access to an active broadband connection via WiFi will mean that users still enjoy IP-based services such as the use of OTT chat applications or IP-based voice services.  More generally, WiFi connectivity may reduce the load of roaming mobile data traffic, increasing the technical feasibility of network sharing in the case of an outage.  

The diversification of network systems used by the HetNet should in principle minimise the investment required for network resilience by reducing the number of single points of failure rather than trying to duplicate network infrastructure across the board. This will need some co-ordination amongst operators, and some government investment to fund at least part of the development of the HetNet may still be necessary. Given that the Singapore government has a track record of making these sorts of investments, it may only be a matter of time before a HetNet relieves customers of their service disruption woes.

 

Footnotes

  1. ENISA, November 2013, National Roaming for Resilience, national roaming for mitigating mobile network outages.
