Voice over IP (VoIP) is a methodology and group of technologies for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. Other terms commonly associated with VoIP are IP telephony, Internet telephony, broadband telephony, and broadband phone service.
The term Internet telephony specifically refers to the provisioning of communications services (voice, fax, SMS, voice-messaging) over the public Internet, rather than via the public switched telephone network (PSTN). The steps and principles involved in originating VoIP telephone calls are similar to traditional digital telephony and involve signaling, channel setup, digitization of the analog voice signals, and encoding. Instead of being transmitted over a circuit-switched network, however, the digital information is packetized, and transmission occurs as IP packets over a packet-switched network. Such transmission entails careful considerations about resource management different from time-division multiplexing (TDM) networks.
VoIP systems employ session control and signaling protocols to control the signaling, set-up, and tear-down of calls. They transport audio streams over IP networks using special media delivery protocols that encode voice, audio, video with audio codecs, and video codecs as Digital audio by streaming media. Various codecs exist that optimize the media stream based on application requirements and network bandwidth; some implementations rely on narrowband and compressed speech, while others support high fidelity stereo codecs. Some popular codecs include μ-law and a-law versions of G.711, G.722, a popular open source voice codec known as iLBC, a codec that only uses 8 kbit/s each way called G.729, and many others.
VoIP is available on many smartphones, personal computers, and on Internet access devices. Calls and SMS text messages may be sent over 3G or Wi-Fi.
SIP
The Session Initiation Protocol (IETF RFC 3261) is a widely implemented standard used in VoIP communications to setup and tear down phone calls. Figure 1 depicts (at a high level) the SIP messages that are exchanged during a phone conversation. A brief explanation will follow.
Figure 1. SIP Call Setup and tear down.
In step 1, the user’s device (called a User Agent in SIP terminology) registers with the domain registrar who is responsible for maintaining a database of records of all subscribers for the respective domain. User registration in VoIP is necessary because it provides the means to locate and contact a remote party. When Bob wants to contact Alice, he will send an INVITE request to a proxy server. Proxy servers are responsible for routing SIP messages and locating subscribers. When the proxy server receives an INVITE request, it attempts to locate the called party and relay progress to the caller by performing a number of steps, such as DNS lookups and the routing of various SIP messages (provisional and informational ). The step that is impacted by registration hijacking, as we will see shortly, is during the device registration in step 1 of this figure.
Registration Hijacking
Figure 2 depicts a valid registration message and response from the SIP registrar, which is used to announce a user’s point of contact. This indicates that the user’s device accepts calls.
Figure 2. REGISTER Request.
The REGISTER request contains the Contact: header which indicates the IP address of the user’s device (for either a VoIP soft or hard phone). When a proxy receives a request to process an incoming call (an INVITE), it will perform a lookup to identify where the respective user can be contacted. In this case, the user with the phone number 201-853-0102 can be reached at IP address 192.168.94.70. The proxy will forward the INVITE request to that IP address. The reader may notice that the advertised port is 5061. This port is reserved for SIPS and in this popular implementation it is actually in violation [ref 1] of RFC 3261.
The following Figure 3 displays a modified version of the REGISTER request that is sent by the attacker.
Figure 3. A modified version of the REGISTER request.
In this request, all the message headers and parameters remain the same except for the parameters in the Contact header. The information that has been changed in the Contact header is the IP address (192.168.1.3) which points to the attacker’s device. The REGISTER request is sent to the SIP Registrar at 192.168.1.2. The tool that was used to generate this request is SiVuS [ref 2] which is demonstrated below in Figure 4.
Figure 4. SIP Registration Spoofing Using SiVuS Message generator.
The hijacking attack works as follows:
Disable the legitimate user’s registration. This can be done by:
- performing a DoS attack against the user’s device
- deregistering the user (another attack which is not covered here)
- Generating a registration race-condition in which the attacker sends repeatedly REGISTER requests in a shorter timeframe (such as every 15 seconds) in order to override the legitimate user’s registration request.
2. Send a REGISTER request with the attacker’s IP address instead of the legitimate user’s. The following Figure 5 demonstrates the attack approach.
Figure 5. Overview of a registration hijack.
This attack is possible for the following reasons
- The signaling messages are sent in the clear, which allows an attacker to collect, modify and replay them as they wish.
- The current implementation of the SIP Signaling messages do not support integrity of the message contents, and thus modification and replay attacks are not detected.
This attack can be successful even if the remote SIP proxy server requires authentication of user registration, because the SIP messages are transmitted in the clear and can be captured, modified and replayed. This attack can be launched against both enterprise or residential users. For example, a home network that uses a poorly configured wireless access point can be compromised by an attacker who can intercept and replay registration requests. This also includes configurations where WEP (Wired Equivalent Privacy) or WPA (Wi-Fi protected access) is used, since there are known vulnerabilities that allows an attacker to gain unauthorized access. [ref 3] As such, the attacker can perform various attacks including making fraudulent calls or redirecting communications. In an enterprise environment an attacker can divert calls to unauthorized parties. For example, calls from stockholders can be diverted to an agent that is not authorized to handle certain trade transactions for customers. In some cases this attack can also be viewed as a “feature” for employees who prefer not to be disturbed.
This attack can be suppressed by implementing SIPS (SIP over TLS) and authenticating SIP requests and responses (which can include integrity protection). In fact, the use of SIPS and the authentication of responses can suppress many associated attacks including eavesdropping and message or user impersonation.
Eavesdropping
Eavesdropping in VoIP is somewhat different from the traditional eavesdropping in data networks, but the general concept remains the same. Eavesdropping in VoIP requires intercepting the signaling and associated media streams of a conversation. The signaling messages use separate network protocols (i.e., UDP or TCP) and ports from the media itself. Media streams typically are carried over UDP using the RTP (Real Time Protocol) protocol.
Figure 6. Steps to capture VoIP media streams using Ethereal.
The steps to capture and decode voice packets include
- Capture and Decode RTP packets. Capture packets and select Analyze -> RTP-> Show all streams options from the ethereal interface.
- Analyze Session. Select a stream to analyze and reassemble.
- Open a file to save the audio (.au) steam that contains the captured voice.
Some may argue that the eavesdropping attack can be suppressed in IP based networks with the use of Ethernet switches which restrict broadcasting traffic to the entire network, and thus limits who can access the traffic.
This argument can be discarded when ARP spoofing is introduced as a mechanism to launch a man-in-the-middle attack. We will not cover ARP spoofing in this article since it is documented in several publications. The basic concept, however, is that an attacker broadcasts spoofed advertisements of the MAC address and thus forces subsequent IP packets to flow through the attacker’s host . This thereby allows the eavesdropping of communications between two users. The following Figure 7 summarizes the ARP spoofing attack.
Figure 7. ARP Spoofing attack.
Using ARP spoofing, an attacker can capture, analyze and eavesdrop into VoIP communications.
The following Figure 8 demonstrates the use of the Cain tool which provides the ability to perform the man-in-the-middle attack and capture VoIP traffic.
SIP INVITE Flood Attacks
The Session Initiation Protocol (SIP) is a VoIP standard defined in RFC 3261. SIP INVITE messages are used to establish a media session between user and calling agents. In SIP INVITE flood attacks, the attacker sends numerous (often spoofed) INVITE messages to the victim, causing network degradation or a complete DoS condition.
Other VoIP Attacks
Buffer Overflow Attacks
Since H.225 messages are PER encoded, the attacker can misencode the PER encoding lengths and try and cause buffer overflow at the receiving endpoint. The ASN.1 representation of the H.225 protocol lays down some specific bounds on the lengths of the fields, and protocol modules may be susceptible to attacks based on these fields.
DoS Attacks
Attackers can try and send huge messages by specifying out-of-bound and large messages or fields. This leads to excessive memory usage at the endpoints and gateways and can lead to a DoS attack. The attackers can try to use PER encoding coupled with the ASN.1 representation to encode excessive recursive fields and lead to huge processing and memory overhead at the endpoint.
Viruses and malware
VoIP utilization involving softphones and software are vulnerable to worms, viruses and malware, just like any Internet application. Since these softphone applications run on user systems like PCs and PDAs, they are exposed and vulnerable to malicious code attacks in voice applications.
SPIT (Spamming over Internet Telephony)
If you use email regularly, then you must know what spamming is. Put simply, spamming is actually sending emails to people against their will. These emails consist mainly of online sales calls. Spamming in VoIP is not very common yet, but is starting to be, especially with the emergence of VoIP as an industrial tool. Every VoIP account has an associated IP address. It is easy for spammers to send their messages (voicemails) to thousands of IP addresses. Voicemailing as a result will suffer. With spamming, voicemails will be clogged and more space as well as better voicemail management tools will be required. Moreover, spam messages can carry viruses and spyware along with them.
This brings us to another flavor of SPIT, which is phishing over VoIP. Phishing attacks consist of sending a voicemail to a person, masquerading it with information from a party trustworthy to the receiver, like a bank or online paying service, making him think he is safe. The voicemail usually asks for confidential data like passwords or credit card numbers. You can imagine the rest!
Call tampering
Call tampering is an attack which involves tampering a phone call in progress. For example, the attacker can simply spoil the quality of the call by injecting noise packets in the communication stream. He can also withhold the delivery of packets so that the communication becomes spotty and the participants encounter long periods of silence during the call.
Man-in-the-middle attacks
VoIP is particularly vulnerable to man-in-the-middle attacks, in which the attacker intercepts call-signaling SIP message traffic and masquerades as the calling party to the called party, or vice versa. Once the attacker has gained this position, he can hijack calls via a redirection server.