Resilience is one of the major challenges facing enterprises and organizations today. Business continuity is of major importance because it deals with recovering an organization’s critical services or products to a predefined level after a disruption. It also ensures that critical business functions continue to operate despite serious incidents.
Business continuity and telecommunications share a common characteristic: both operate in real time. Incidents and disasters occur in dynamic environments, without prior warning, and can evolve quickly. In such cases telecommunications can be disrupted for several reasons. VoIP is especially vulnerable, as it can fail at many different stages of its deployment, mostly because of poor implementation.
For this article we define VoIP as the methodology that uses a group of technologies to deliver voice and media communications over an IP network. For the reader’s convenience, we present this case study through examples and scenarios.
Small and medium businesses mostly treat VoIP as the “telephony solution” that reduces the cost of calls. VoIP offers free calls between branches and cheaper international calls. Remote offices and individuals often increase their productivity by using remote IP phones connected to a server located at the local headquarters. In most cases these setups suffer from a basic flaw, which is not obvious until a failure happens.
Let’s investigate the major causes of disruption, starting from the remote environment and concluding with the core of such a network.
Remote workers and disruptions
Technical support engineers often work from home or other remote locations. When a power outage occurs, both their IP phone and their Internet router switch off. The engineer may lose an important call, which can breach the current SLA. It is therefore advisable always to have a plan B for contacting the engineer. A mobile number helps here, as it buys the engineer time to communicate with the company and the customer.
Even if the company’s PBX has been configured to call the engineer’s mobile when his/her VoIP device fails, one more factor prevents the operation from continuing: an engineer reached on the mobile still has no Internet access to continue working remotely. No console, web, CRM or ERP. A decent alternative that can guarantee continuity of telecoms is a UPS at the SOHO that can power the Internet router, the laptop/PC and the IP phone for at least 45 minutes. With it, an engineer facing a local power failure can still save the work or continue working from his/her SOHO environment.
A UPS only solves power outages. In the event of a PSTN/ISDN failure the engineer cannot work, as there is no connectivity to the Internet. The mobile phone can again be used, this time as a hotspot over 3G/4G, but this may have other implications. For example, if the remote server keeps a whitelist of IP addresses allowed for remote access and the mobile has been allocated a different (usually dynamic) IP, the engineer may be unable to reach the remote infrastructure. Likewise, if a VPN server at the far end is waiting for the engineer’s laptop to connect and the mobile provider does not allow VPN traffic, the laptop cannot connect and the work cannot be done. PSTN and ISDN failures are rare, though, compared with power outages.
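The target of at least 45 minutes of UPS power can be turned into a rough sizing check. The sketch below is only a first-order estimate: the device wattages, the battery capacity and the inverter efficiency are illustrative assumptions, not measurements, and real runtime also depends on battery age and discharge rate, so the vendor's runtime chart should take precedence.

```python
def estimated_runtime_minutes(battery_wh: float, load_watts: float,
                              inverter_efficiency: float = 0.85) -> float:
    """First-order UPS runtime estimate: usable energy divided by load."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    usable_wh = battery_wh * inverter_efficiency
    return usable_wh / load_watts * 60


# Hypothetical SOHO load in watts (assumed values, measure your own devices).
soho_load = {"router": 12, "laptop": 60, "ip_phone": 6}
total_watts = sum(soho_load.values())

# Hypothetical UPS with 100 Wh of battery capacity.
runtime = estimated_runtime_minutes(battery_wh=100, load_watts=total_watts)
meets_target = runtime >= 45  # the 45-minute target from the text
```

With these assumed figures the load is 78 W and the estimate comes out at roughly 65 minutes, which would clear the 45-minute target with some margin.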
In any case, the company has to protect its support service by training the support engineers, before it happens, in what to do if they lose contact with the company or customers. As we will see later, the premises holding the servers can also suffer a power outage, in which case communication with the remote engineer is lost.
VoIP Infrastructure and power considerations
In the previous section we saw how the remote end of the VoIP infrastructure can fail, and we proposed a UPS and a mobile phone to maintain communication between everyone involved. The next question is what happens within the premises of a medium-sized company (let’s assume 50 employees work on the premises). Again VoIP can disrupt business in some cases, but we should not blame it, as the same could happen if the company used a legacy TDM PBX with PRI connections, which also needs power to operate. When a telecom failure occurs, we first have to find the cause of the disruption. With VoIP, all PBXs/servers, switches and IP appliances use power. A UPS should be installed to support not only the server infrastructure but also the IP desktop phones. This maintains communications and protects the PBX from damage (a database corrupted by a sudden shut-off, damaged hardware, etc.). The same obviously applies to VPN servers, session border controllers and any other equipment related to the VoIP infrastructure. Using Power over Ethernet (PoE) switches is advisable, as they power IP phones through the Ethernet cable (no power adapters needed). This means you can keep all the UPS units in one place, next to the switches, and power the phones through the PoE switches.
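When the phones are powered through PoE switches, the switch's total PoE budget (and the UPS behind it) has to cover every phone. A minimal sketch of that check follows. The per-class figures are the maximum power a port must supply under IEEE 802.3af/at; the mix of phones and the switch budgets are hypothetical.

```python
from typing import Dict

# Maximum power supplied per port by PoE class (watts), per IEEE 802.3af/at.
PSE_WATTS_BY_CLASS = {0: 15.4, 1: 4.0, 2: 7.0, 3: 15.4, 4: 30.0}


def poe_demand_watts(phones_by_class: Dict[int, int]) -> float:
    """Worst-case PoE demand for phones grouped as {poe_class: count}."""
    return sum(PSE_WATTS_BY_CLASS[cls] * count
               for cls, count in phones_by_class.items())


def fits_budget(phones_by_class: Dict[int, int],
                switch_budget_watts: float) -> bool:
    """True if the worst-case demand stays within the switch's PoE budget."""
    return poe_demand_watts(phones_by_class) <= switch_budget_watts


# Hypothetical 50-seat office: 45 basic phones (class 2), 5 larger ones (class 3).
office = {2: 45, 3: 5}
demand = poe_demand_watts(office)
```

Here the worst-case demand is about 392 W, so a switch with an assumed 370 W budget would not be enough, while one with 400 W (or two smaller switches) would be. The same total is a useful input when sizing the UPS that feeds the switches.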
VoIP and employees
VoIP can offer great benefits, but it can also become an issue for employees. Every time the telecom infrastructure suffers a power outage, service recovery may take 3 to 5 minutes from the moment power is restored. This is the average time VoIP hardware appliances need to reboot and become operational again. For emergencies or disasters, always have one or more legacy phones installed in common areas that can be used to call the emergency services. It is wise to use a phone connected directly to the public telephone network (PSTN), bypassing any VoIP infrastructure. Modern SIP-to-TDM gateways provide this functionality and automatically bypass the VoIP connectivity during an outage. Assign to specific people the role of calling the emergency services, either from the legacy phone or from their mobile once they are outside the premises in a safe place. If you rely on a SIP trunking provider for your outbound calls, make sure it allows calls to emergency services. (Most appliances on the market, though, provide connectivity to a legacy phone.)
VoIP and Call Centers (critical)
Most of the issues mentioned in the previous sections also apply to call centers. A call center may accept large volumes of inbound calls (for reservations, ticket booking, technical support and so on), which makes it a critical service where disruption means loss. For companies making outgoing calls (for advertising or promotional purposes), the service is critical as well and disruption is not an option. Make sure the temperature, humidity and other conditions in your server rooms are kept at a predefined, stable level. Use a UPS to support the call center, and make sure building infrastructure (such as the plumbing) is checked, as a flood will destroy your equipment. Call centers most often accept calls through a mixture of VoIP and TDM technologies such as PRIs. It would be wise to also own a VoIP number that supports enough concurrent calls for your traffic and can back up the current infrastructure in an emergency. To cover a VoIP server breakdown, consider a secondary standby physical server, a backup of all your VoIP equipment stored in a safe place, or a hosted PBX ready to take over if the physical servers are totally destroyed. A mixture of technologies and the use of the cloud are nowadays the key to a successful secondary plan for emergencies. A backup plan is clearly an extra cost, but before rejecting it, consider how small that cost is compared with the loss of your most critical service, your telecom network.
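How many concurrent calls a backup VoIP number should support "depending on your traffic" is not specified above. One common way to estimate it, used here purely as an illustration and not prescribed by this article, is the Erlang B formula, which assumes that calls finding all channels busy are dropped rather than queued. The traffic figures and the 1% blocking target below are assumptions.

```python
def erlang_b(traffic_erlangs: float, channels: int) -> float:
    """Probability that a call is blocked, via the standard Erlang B recursion."""
    blocking = 1.0
    for n in range(1, channels + 1):
        blocking = traffic_erlangs * blocking / (n + traffic_erlangs * blocking)
    return blocking


def channels_needed(traffic_erlangs: float, target_blocking: float = 0.01) -> int:
    """Smallest number of channels that keeps blocking at or below the target."""
    if not 0 < target_blocking < 1:
        raise ValueError("target_blocking must be between 0 and 1")
    channels = 1
    while erlang_b(traffic_erlangs, channels) > target_blocking:
        channels += 1
    return channels


# Hypothetical busy hour: 200 calls averaging 180 seconds each.
calls_per_hour = 200
average_call_seconds = 180
offered_traffic = calls_per_hour * average_call_seconds / 3600  # in erlangs

backup_channels = channels_needed(offered_traffic)
```

For this assumed busy hour the offered traffic is 10 erlangs, and the sketch returns 18 concurrent channels for a 1% blocking target. A call center that queues callers would normally be sized with a queueing model instead.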
PBX related options and incidents
In most cases the PBX consists of software components. Since the PBX is the heart of the system, ask your PBX provider whether it can run as a virtual machine; if you can have a secondary VM as a test environment (and backup), do so. Keep in mind that telecom PCI cards do not always work with virtual machines; in that case prefer a gateway. Most modern gateways offer redundancy and failover capabilities. Do not choose extreme infrastructure that is difficult to maintain and hard to operate. Choose one simple solution that fits your needs (for example, a company of 15 employees keeps a legacy phone at the front desk that will work in a crisis). While an incident is happening (e.g. an earthquake), you will obviously have no time to change the PBX configuration. So go for automatic, permanent, stable solutions that cover the scenario. Preconfigure automatic alternative route selection: when the primary route fails, the secondary is used, so an outbound call can always be completed. If the call goes out over an emergency trunk, inform the user with a voice message.
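The preconfigured alternative route selection described above boils down to trying routes in a fixed priority order. The sketch below shows only that logic; it is not the configuration syntax of any particular PBX, and the route names, the health probes and the announcement text are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Route:
    name: str
    is_available: Callable[[], bool]  # hypothetical health probe for the trunk
    is_emergency: bool = False


def select_route(routes: List[Route]) -> Optional[Route]:
    """Return the first available route in priority order, or None."""
    for route in routes:
        if route.is_available():
            return route
    return None


def announcement_for(route: Optional[Route]) -> str:
    """Voice message to play before the call is placed (empty if none)."""
    if route is None:
        return "No outbound route is available."
    if route.is_emergency:
        return "Your call is being placed over the emergency trunk."
    return ""


# Hypothetical priority list: SIP trunk first, PSTN gateway as emergency fallback.
routes = [
    Route("sip-trunk", is_available=lambda: False),  # simulate a primary failure
    Route("pstn-gateway", is_available=lambda: True, is_emergency=True),
]
chosen = select_route(routes)
message = announcement_for(chosen)
```

Because the order is fixed in advance, nobody has to touch the configuration during an incident: the fallback is taken automatically and the caller hears that the emergency trunk is in use.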
VoIP – Go large
In large enterprises or organizations the scenario often looks complicated, but it is not. The budget allows investment in redundancy using telco-grade equipment, which is an advantage, as this hardware is even less likely to fail. The rule remains the same, though: the equipment is not the major factor in success. The successful scenario is the one that has been well analyzed, planned, tested, verified and so on, according to the policies of the organization.
VoIP and IP – Disruptions of the medium
In all cases VoIP is susceptible to other factors that may affect the operation of a telecom network. The most important step is to ensure that the IP network does not introduce issues. To minimize the risk of disruption:
- Ensure that your provider has an alternative solution in case its primary switch fails; if not, you will not be able to terminate calls. Ask whether it offers redundant or clustered infrastructure and what SLA it offers.
- Secure your telecom network with a Session Border Controller (SBC). This protects it not only from hackers but also from attackers trying to make your telecom service unavailable (DDoS attacks).
- With trustworthy providers, prefer peer-to-peer connectivity to SIP trunking.
- The G.711 codec is the telecom standard. In most cases you should also consider a low-bandwidth codec such as G.729 or Opus to save bandwidth. You will need this when your primary Internet connection is out of order and you are running on a secondary one (which often provides less bandwidth).
- Ensure you have configured a stable and efficient IVR, which will route the traffic as it should during an incident.
- Consider keeping a spare TDM card and a hot-swap hard drive (for the RAID array) for your server. If a production server fails, these let you minimize the loss of traffic (and money) caused by the hardware failure.
- Use redundancy whenever it is possible and fits your budget. Hardware RAID, redundant power supplies, clusters and cloud are some of the best-known options for securing your service.
- Finally, always keep the VoIP network isolated from the data network. I recommend forbidding Wi-Fi connectivity to your VoIP network: I have seen cases of a forgotten Wi-Fi access point (with open access) from which I could reach all the servers on the premises. If you intend to combine Wi-Fi and VoIP, strengthen your security as far as possible, following current best practice.
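The codec advice in the list above can be made concrete with a little arithmetic. The sketch below estimates per-call bandwidth at the IP layer for G.711 (64 kbit/s) and G.729 (8 kbit/s), assuming 20 ms packetization and 40 bytes of IPv4, UDP and RTP headers per packet; layer-2 overhead is left out. The backup link speed and the share of it reserved for voice are hypothetical.

```python
IP_UDP_RTP_HEADER_BYTES = 40  # 20 (IPv4) + 8 (UDP) + 12 (RTP)


def call_bandwidth_kbps(codec_bitrate_kbps: float, packet_ms: int = 20) -> float:
    """One-direction IP bandwidth of a single call, excluding layer-2 overhead."""
    payload_bytes = codec_bitrate_kbps * 1000 / 8 * packet_ms / 1000
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8 * packets_per_second / 1000


def max_concurrent_calls(link_kbps: float, codec_bitrate_kbps: float,
                         voice_share: float = 0.75) -> int:
    """Calls that fit in the share of the link reserved for voice (assumed 75%)."""
    per_call = call_bandwidth_kbps(codec_bitrate_kbps)
    return int(link_kbps * voice_share // per_call)


g711_kbps = call_bandwidth_kbps(64)  # G.711 payload rate: 64 kbit/s
g729_kbps = call_bandwidth_kbps(8)   # G.729 payload rate: 8 kbit/s

# Hypothetical backup link with 1000 kbit/s of upstream capacity.
calls_g711 = max_concurrent_calls(1000, 64)
calls_g729 = max_concurrent_calls(1000, 8)
```

Under these assumptions a G.711 call needs about 80 kbit/s and a G.729 call about 24 kbit/s, so the assumed backup link carries 9 G.711 calls but 31 G.729 calls. Opus is left out because its bitrate is configurable.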
Unified Communications and VoIP Integration
The complete telecom system of a company consists of different components in order to offer presence and multimedia capabilities, mobility, enterprise resource planning (ERP) / customer relationship management (CRM), computer telephony integration (CTI), audio recording of calls (tapping), call detail records (CDR) analysis and other custom applications. Connecting VoIP to all of them is called integration.
- Since this is quite a complex and sensitive ecosystem, monitoring with monitoring tools (NMS) is essential. An SBC provides such capabilities for call and traffic monitoring. To prevent disruption of your system you should also monitor the health of all components, software and hardware. A corrupted database, for example, can disrupt your telecoms. Also monitor your bandwidth and make sure the quality of service (QoS) is acceptable.
- Never apply patches directly to production servers. Patch a replicated test server (a virtual machine) first and make sure everything is fine before patching production. When you are done, always restart the server fully to confirm that it comes back after a complete shutdown. Do not upgrade just before bank holidays or other public holidays, and make sure you can get technical support before the next working day.
- Before rolling out a firmware upgrade, test it on a single device. There have been many cases in which phones were destroyed or malfunctioned because of a new upgrade. Leave the operators’ phones until last.
- If you are not sure, do not change PBX options in complex scenarios directly on the production server. Sooner or later you will get undesired results. Always inform users beforehand, and never interrupt calls: one of them may be worth millions to the company.
- Keep a record of the versions of your equipment. Your escape plan is to roll back if something goes wrong.
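The QoS monitoring recommended in the list above usually comes down to comparing measured values against thresholds. The sketch below shows such a check. The 150 ms one-way delay limit follows the ITU-T G.114 guideline; the jitter and packet-loss limits are commonly quoted rules of thumb and are assumptions here, as are the sample measurements. How the measurements are collected (SBC, NMS, probes) is outside the sketch.

```python
from dataclasses import dataclass
from typing import List

# Thresholds (assumptions; tune them to your own SLA).
MAX_ONE_WAY_LATENCY_MS = 150.0
MAX_JITTER_MS = 30.0
MAX_PACKET_LOSS_PCT = 1.0


@dataclass
class QosSample:
    one_way_latency_ms: float
    jitter_ms: float
    packet_loss_pct: float


def qos_alerts(sample: QosSample) -> List[str]:
    """Return one alert per metric that exceeds its threshold."""
    alerts = []
    if sample.one_way_latency_ms > MAX_ONE_WAY_LATENCY_MS:
        alerts.append(f"latency {sample.one_way_latency_ms} ms is too high")
    if sample.jitter_ms > MAX_JITTER_MS:
        alerts.append(f"jitter {sample.jitter_ms} ms is too high")
    if sample.packet_loss_pct > MAX_PACKET_LOSS_PCT:
        alerts.append(f"packet loss {sample.packet_loss_pct}% is too high")
    return alerts


# Hypothetical measurements: a healthy link and a congested backup link.
healthy = qos_alerts(QosSample(80.0, 10.0, 0.2))
congested = qos_alerts(QosSample(200.0, 45.0, 0.5))
```

The healthy sample raises no alerts, while the congested one trips the latency and jitter checks. Running such a check periodically gives early warning before users start to notice degraded calls.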
This article is not written to scare people. It is written to highlight, and steer people away from, certain actions that have led to disruption in the past. Most professional VoIP solutions are excellent. Sales managers today already know the basics of guiding you to the correct infrastructure and helping you avoid the pitfalls. It is a fact that VoIP can be disrupted easily under certain circumstances. In most cases, though, the blame lies not with the technology itself but with poor implementation, unreliable components and bad decisions.
The basic principle for a telecom network is to keep it stable, secure and healthy. Avoid changing your VoIP options every other day; doing so changes the telecom habits of your users and customers, which is not good. The more changes you apply to the VoIP network, the greater its complexity becomes. If you make changes all the time (especially without keeping notes), you will certainly forget a small detail that may cause disruption. In one case, a forgotten network port on a remote phone with a default username and password was the entry point for an attack.
VoIP has evolved into a service that provides survivability and can offer alternatives during or after an incident. Analysis of the network’s requirements (and options) has proved to be the small detail that makes the difference. Designing your network with a schematic diagram gives you the big picture and lets you decide on functionality and options. During implementation, feedback is important, as it points out missing elements or extra needs. As soon as you are satisfied, validate how the whole system works and note any missing details or changes needed. The benefit of this procedure is that you can go through it again at any time to revise, upgrade and so on.