The telecommunications industry spans over 100 years, and Asterisk integrates most, if not all, of the major technologies that it has made use of over the last century. To make the most out of Asterisk, you need not be a professional in all areas, but understanding the differences between the various codecs and protocols will give you a greater appreciation and understanding of the system as a whole.
This appendix explains Voice over IP and what makes VoIP networks different from the traditional circuit-switched voice networks that were the topic of Appendix A. We will explore the need for VoIP protocols, outlining the history and potential future of each. We’ll also look at security considerations and these protocols’ abilities to work within topologies such as Network Address Translation (NAT). The following VoIP protocols will be discussed (some more briefly than others):
• IAX
• SIP
• H.323
• MGCP
• Skinny/SCCP
• UNISTIM
Codecs are the means by which analog voice can be converted to a digital signal and carried across the Internet. Bandwidth at any location is finite, and the number of simultaneous conversations any connection can carry is directly related to the type of codec implemented. We’ll also explore the differences between the following codecs in regard to bandwidth requirements (compression level) and quality:
• G.711
• G.726
• G.729A
• GSM
• iLBC
• Speex
• G.722
• MP3
We will then conclude the appendix with a discussion of how voice traffic can be routed reliably, what causes echo and how to deal with it, and how Asterisk controls the authentication of inbound and outbound calls.
The Need for VoIP Protocols
The basic premise of VoIP is the packetization of audio streams for transport over Internet Protocol-based networks. The challenges to accomplishing this relate to the manner in which humans communicate. Not only must the signal arrive in essentially the same form that it was transmitted in, but it needs to do so in less than 150 milliseconds. If packets are lost or delayed, there will be degradation to the quality of the communications experience, meaning that two people will have difficulty in carrying on a conversation.
The transport protocols that collectively are called “the Internet” were not originally designed with real-time streaming of media in mind. Endpoints were expected to resolve missing packets by waiting longer for them to arrive, requesting retransmission, or, in some cases, considering the information to be gone for good and simply carrying on without it. In a typical voice conversation, these mechanisms will not serve. Our conversations do not adapt well to the loss of letters or words, nor to any appreciable delay between transmittal and receipt.
The traditional PSTN was designed specifically for the purpose of voice transmission, and it is perfectly suited to the task from a technical standpoint. From a flexibility standpoint, however, its flaws are obvious even to people with a very limited understanding of the technology. VoIP holds the promise of incorporating voice communications into all of the other protocols we carry on our networks, but due to the special demands of a voice conversation, special skills are needed to design, build, and maintain these networks.
The problem with packet-based voice transmission stems from the fact that the way in which we speak is totally incompatible with the way in which IP transports data. Speaking and listening consist of the relaying of a stream of audio, whereas the Internet protocols are designed to chop everything up, encapsulate the bits of information into thousands of packages, and then deliver each package in whatever way possible to the far end. Clearly, some way of dealing with this is required.
The mechanism for carrying a VoIP connection generally involves a series of signaling transactions between the endpoints (and gateways in between), culminating in two persistent media streams (one for each direction) that carry the actual conversation. There are several protocols in existence to handle this. In this section, we will discuss some that are important to VoIP in general and to Asterisk specifically.
IAX (The “Inter-Asterisk eXchange” Protocol)
If you claim to be one of the folks in the know when it comes to Asterisk, your test will come when you have to pronounce the name of this protocol. It would seem that you should say “eye-ay-ex,” but this doesn’t exactly roll off the tongue. Fortunately, the proper pronunciation is in fact “eeks.” IAX is an open protocol, meaning that anyone can download and develop for it.
In Asterisk, IAX is supported by the chan_iax2.so module.
The IAX protocol was developed by Digium for the purpose of communicating with other Asterisk servers (hence the Inter-Asterisk eXchange protocol). It is very important to note that IAX is not at all limited to Asterisk. The standard is open for anyone to use, and it is supported by many other open source telecom projects, as well as by several hardware vendors. Like SIP, IAX handles call signaling; unlike SIP, it also carries the media itself, using a single UDP port (4569) for both the channel signaling and the media streams. As discussed later in this appendix, this makes it easier to manage when behind NATed firewalls.
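Because signaling and media share one UDP port, a receiver has to tell the frame types apart from the bytes themselves. The sketch below assumes the IAX2 header layouts published in RFC 5456: a full frame (used for signaling and reliable delivery) has the high bit of its first 16-bit word set, a mini frame (lightweight media) has it clear, and a meta frame (used for trunking) starts with a zero 16-bit word. The function name and the returned dictionaries are illustrative only; this is not how chan_iax2 is written internally.

```python
import struct

def classify_iax2(datagram: bytes) -> dict:
    """Classify one IAX2 datagram as a full, mini, or meta frame.

    Header layouts are assumed from RFC 5456; illustrative sketch only.
    """
    if len(datagram) < 4:
        raise ValueError("datagram too short for any IAX2 header")
    (first_word,) = struct.unpack_from("!H", datagram, 0)
    if first_word & 0x8000:
        # Full frame: F bit + 15-bit source call number, R bit + 15-bit
        # destination call number, 32-bit timestamp, outbound and inbound
        # sequence numbers, frame type, C bit + 7-bit subclass (12 bytes).
        if len(datagram) < 12:
            raise ValueError("truncated full frame")
        src, dst, ts, oseq, iseq, ftype, csub = struct.unpack_from("!HHIBBBB", datagram, 0)
        return {
            "kind": "full",
            "source_call": src & 0x7FFF,
            "dest_call": dst & 0x7FFF,
            "retransmitted": bool(dst & 0x8000),
            "timestamp": ts,
            "oseqno": oseq,
            "iseqno": iseq,
            "frame_type": ftype,
            "subclass": csub & 0x7F,
            "subclass_is_exponent": bool(csub & 0x80),  # C bit set: value is a power of 2
            "payload": datagram[12:],
        }
    if first_word == 0:
        # Meta frame, e.g. a trunk frame bundling media for several calls.
        return {"kind": "meta"}
    # Mini frame: 15-bit source call number plus the low 16 bits of the
    # timestamp, followed directly by the media payload.
    (ts16,) = struct.unpack_from("!H", datagram, 2)
    return {"kind": "mini", "source_call": first_word, "timestamp": ts16, "payload": datagram[4:]}

full = struct.pack("!HHIBBBB", 0x8000 | 5, 7, 1000, 1, 2, 6, 1)
mini = struct.pack("!HH", 5, 1234) + bytes(20)
print(classify_iax2(full)["kind"], classify_iax2(mini)["kind"])
```

The single high bit is what lets one socket serve both roles: media travels in the 4-byte mini header, and only signaling pays for the 12-byte full header.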
IAX also has the unique ability to trunk multiple sessions into one dataflow, which can result in a tremendous bandwidth advantage when sending a lot of simultaneous channels to a remote box. Trunking allows multiple media streams to be represented with a single datagram header, which lowers the overhead associated with individual channels. This helps to lower latency and reduce the processing power and bandwidth required, allowing the protocol to scale much more easily with a large number of active channels between endpoints. If you have a large quantity of IP calls to pass between two endpoints, you should take a close look at IAX trunking.
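To put a rough number on the trunking advantage, the sketch below compares ten G.729A calls sent as individual mini frames against the same calls bundled into one trunk datagram per 20-ms interval. The header sizes are assumptions for illustration: 28 bytes of IPv4/UDP, a 4-byte mini-frame header, an 8-byte trunk meta header, and a 4-byte per-call entry inside the trunk frame. Layer-2 framing is ignored.

```python
IP_UDP_HEADER = 20 + 8   # IPv4 + UDP headers in bytes, no options
MINI_HEADER = 4          # assumed IAX2 mini-frame header, bytes
TRUNK_META_HEADER = 8    # assumed trunk meta-frame header, bytes
TRUNK_ENTRY_HEADER = 4   # assumed per-call entry header inside a trunk frame, bytes

def untrunked_bps(calls: int, payload_bytes: int, packets_per_sec: int) -> int:
    """One-direction bandwidth when every call sends its own datagrams."""
    per_packet = IP_UDP_HEADER + MINI_HEADER + payload_bytes
    return calls * per_packet * packets_per_sec * 8

def trunked_bps(calls: int, payload_bytes: int, packets_per_sec: int) -> int:
    """One-direction bandwidth when all calls share one datagram per interval."""
    per_packet = IP_UDP_HEADER + TRUNK_META_HEADER + calls * (TRUNK_ENTRY_HEADER + payload_bytes)
    return per_packet * packets_per_sec * 8

# Ten G.729A calls: 8 Kbps in 20-ms packets = 20 bytes of payload, 50 packets/s.
print(untrunked_bps(10, 20, 50), trunked_bps(10, 20, 50))  # 208000 110400
```

Under these assumptions the trunked link needs roughly half the bandwidth, and the saving grows with the number of calls because the 28-byte IP/UDP header is paid once per interval instead of once per call.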
Since IAX was optimized for voice, it has received some criticism for not better supporting video, but in fact IAX holds the potential to carry pretty much any media stream desired. Because it is an open protocol, future media types can be incorporated as the community desires them.
IAX includes the ability to authenticate in three ways: plain text, MD5 hashing, and RSA key exchange. This, of course, does nothing to encrypt the media path or headers between endpoints. Many solutions involve using a Virtual Private Network (VPN) appliance or software to encrypt the stream in another layer of technology, which requires the endpoints to pre-establish a method of configuring and opening these tunnels. However, IAX is now also able to encrypt the streams between endpoints with dynamic key exchange at call setup (using the configuration option encryption=aes128), allowing the use of automatic key rollover.
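The MD5 method is a challenge/response exchange: the server sends a random challenge, and the client proves it knows the shared secret by returning a hash of the challenge and the secret, so the secret itself never crosses the wire. The sketch assumes the RFC 5456 convention of hashing the challenge string followed by the secret and sending the hex digest; the function names and the secret are hypothetical.

```python
import hashlib
import hmac
import secrets

def make_challenge() -> str:
    """Server side: a fresh random challenge for each authentication attempt."""
    return secrets.token_hex(8)

def md5_response(challenge: str, secret: str) -> str:
    """Client side: hex MD5 of the challenge followed by the shared secret."""
    return hashlib.md5((challenge + secret).encode("utf-8")).hexdigest()

def verify(challenge: str, secret: str, response: str) -> bool:
    """Server side: recompute the expected response and compare in constant time."""
    return hmac.compare_digest(md5_response(challenge, secret), response)

challenge = make_challenge()
answer = md5_response(challenge, "s3cret")  # hypothetical shared secret
print(verify(challenge, "s3cret", answer))
```

As the paragraph notes, this only authenticates the peer; the signaling and media that follow are still sent in the clear unless encryption is enabled separately.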
IAX and NAT
The IAX2 protocol was deliberately designed to work from behind devices performing NAT. The use of a single UDP port for both signaling and transmission of media also keeps the number of holes required in your firewall to a minimum. These considerations have helped make IAX one of the easiest protocols (if not the easiest) to implement in secure networks.
The Session Initiation Protocol (SIP) has taken the telecommunications industry by storm. SIP has pretty much dethroned the once-mighty H.323 as the VoIP protocol of choice, certainly at the endpoints of the network. The premise of SIP is that each end of a connection is a peer; the protocol negotiates capabilities between them. What makes SIP compelling is that it is a relatively simple protocol, with a syntax similar to that of other familiar protocols such as HTTP and SMTP. SIP is supported in Asterisk with the chan_sip.so module.
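To illustrate how close SIP's syntax is to HTTP, the sketch below splits a SIP request into its request line, headers, and body using nothing more than line and colon splitting. The sample INVITE (addresses, tags, Call-ID) is invented for illustration, and the parser deliberately ignores details a real stack must handle, such as folded header lines, compact header names, and repeated headers.

```python
def parse_sip_request(raw: str) -> dict:
    """Split a SIP request into method, Request-URI, version, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, uri, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, as in HTTP. A real parser must
        # keep repeated headers such as Via in order; this keeps only the last.
        headers[name.strip().lower()] = value.strip()
    return {"method": method, "uri": uri, "version": version, "headers": headers, "body": body}

SAMPLE_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bK776asdhds\r\n"
    "From: Alice <sip:alice@example.org>;tag=1928301774\r\n"
    "To: Bob <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710@192.0.2.10\r\n"
    "CSeq: 314159 INVITE\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

request = parse_sip_request(SAMPLE_INVITE)
print(request["method"], request["uri"], request["headers"]["cseq"])
```

Anyone who has read a raw HTTP request will recognize the structure: a start line, "Name: value" headers, a blank line, and an optional body.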
SIP was originally submitted to the Internet Engineering Task Force (IETF) in February 1996 as “draft-ietf-mmusic-sip-00.” The initial draft looked nothing like the SIP we know today and contained only a single request type: a call setup request. In March 1999, after 11 revisions, SIP RFC 2543 was born.
At first, SIP was all but ignored, as H.323 was considered the protocol of choice for VoIP transport negotiation. However, as the buzz grew, SIP began to gain in popularity, and while there may be a lot of different factors that accelerated its growth, we’d like to think that a large part of its success is due to its freely available specification.
SIP is an application-layer signaling protocol that uses the well-known port 5060 for communications. SIP can be transported with either the UDP or TCP transport-layer protocols. Asterisk does not currently have a TCP implementation for transporting SIP messages, but it is possible that future versions may support it (and patches to the code base are gladly accepted). SIP is used to “establish, modify, and terminate multimedia sessions such as Internet telephony calls.”
SIP does not transport media (i.e., voice) between endpoints. Instead, the Real-time Transport Protocol (RTP) is used for this purpose. RTP uses high-numbered, unprivileged ports in Asterisk (10,000 through 20,000, by default).
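Each RTP packet begins with a fixed 12-byte header, defined in RFC 3550, that carries the sequence number and timestamp the receiver uses to reorder packets and play out the audio at the right moment. The sketch below parses that fixed header and skips any CSRC identifiers; it does not interpret header extensions. The sample packet values are made up.

```python
import struct

def parse_rtp_header(packet: bytes) -> dict:
    """Parse the fixed 12-byte RTP header (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet, 0)
    csrc_count = b0 & 0x0F
    header_len = 12 + 4 * csrc_count  # CSRC identifiers follow the fixed header
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),  # if set, "payload" still includes the extension
        "csrc_count": csrc_count,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[header_len:],
    }

# Version 2, payload type 0 (G.711 μlaw), 160 bytes = 20 ms of audio at 8,000 samples/s.
pkt = struct.pack("!BBHII", 0x80, 0x00, 1000, 160000, 0x1234ABCD) + bytes(160)
print(parse_rtp_header(pkt)["sequence"])
```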
A common topology to illustrate SIP and RTP, commonly referred to as the “SIP trapezoid,” is shown in Figure B-1. When Alice wants to call Bob, Alice’s phone contacts her proxy server, and the proxy tries to find Bob (often connecting through his proxy). Once the phones have started the call, they communicate directly with each other (if possible), so that the data doesn’t have to tie up the resources of the proxy.
SIP was not the first, and is not the only, VoIP protocol in use today (others include H.323, MGCP, IAX, and so on), but currently it seems to have the most momentum with hardware vendors. The advantages of the SIP protocol lie in its wide acceptance and architectural flexibility (and, we used to say, simplicity!).
SIP has earned its place as the protocol that justified VoIP. All new user and enterprise products are expected to support SIP, and any existing products will now be a tough sell unless a migration path to SIP is offered. SIP is widely expected to deliver far more than VoIP capabilities, including the ability to transmit video, music, and any type of real-time multimedia. While its use as a ubiquitous general-purpose media transport mechanism seems doubtful, SIP is unarguably poised to deliver the majority of new voice applications for the next few years.
SIP uses a challenge/response system to authenticate users. An initial INVITE is sent to the proxy with which the end device wishes to communicate. The proxy then sends back a 407 Proxy Authentication Required message, which contains a random set of characters referred to as a nonce. This nonce is used along with the password to generate an MD5 hash, which is then sent back in the subsequent INVITE. Assuming the MD5 hash matches the one that the proxy generated, the client is then authenticated.
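The hash described here is HTTP Digest authentication (RFC 2617), which SIP reuses. The sketch below shows the basic form without the optional qop parameter: the client hashes its username, realm, and password, hashes the request method and URI, and combines both with the proxy's nonce. The username, realm, password, URI, and nonce are placeholders.

```python
import hashlib

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """RFC 2617 digest response without qop: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # depends on the secret
    ha2 = md5_hex(f"{method}:{uri}")                  # depends on the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

nonce = "dcd98b7102dd2f0e8b11d0f600bfb0c0"  # placeholder nonce from the 407 challenge

# Client: computes the response and sends it in the follow-up INVITE.
sent = digest_response("alice", "example.com", "s3cret", "INVITE", "sip:bob@example.com", nonce)
# Proxy: recomputes from its stored credentials and compares.
expected = digest_response("alice", "example.com", "s3cret", "INVITE", "sip:bob@example.com", nonce)
print(sent == expected)
```

The password itself never crosses the network, and because the response depends on the nonce, a captured response is of little use once the proxy issues a fresh nonce.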
Denial of service (DoS) attacks are probably the most common type of attack on VoIP communications. A DoS attack can occur when a large number of invalid INVITE requests are sent to a proxy server in an attempt to overwhelm the system. These attacks are relatively simple to implement, and their effects on the users of the system are immediate. SIP has several methods of minimizing the effects of DoS attacks, but ultimately they are impossible to prevent.
SIP implements a scheme to guarantee that a secure, encrypted transport mechanism (namely Transport Layer Security, or TLS) is used to establish communication between the caller and the domain of the callee. Beyond that domain, whether the request is sent securely to the end device depends on the local security policies of the network. Note that the encryption of the media (that is, the RTP stream) is beyond the scope of SIP itself and must be dealt with separately.
More information regarding SIP security considerations, including registration hijacking, server impersonation, and session teardown, can be found in Section 26 of SIP RFC 3261.
SIP and NAT
Probably the biggest technical hurdle SIP has to conquer is the challenge of carrying out transactions across a NAT layer. Because SIP encapsulates addressing information in its data frames, and NAT happens at a lower network layer, the addressing information is not automatically modified, and thus the media streams will not have the correct addressing information needed to complete the connection when NAT is in place. In addition to this, the firewalls normally integrated with NAT will not consider the incoming media stream to be part of the SIP transaction, and will block the connection. Newer firewalls and session border controllers (SBCs) are SIP-aware, but this is still considered a shortcoming in this protocol, and it causes no end of trouble to network professionals needing to connect SIP endpoints using existing network infrastructure.
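The problem is easy to see in the SDP session description that a SIP INVITE typically carries: its c= (connection) line tells the far end where to send RTP, and a phone behind NAT fills in its own private address. The sketch pulls those addresses out and flags private ones using Python's ipaddress module. The SDP text and addresses are invented, and note that ipaddress's is_private covers more than the RFC 1918 ranges.

```python
import ipaddress

def sdp_media_addresses(sdp: str) -> list:
    """Return the IPv4 addresses found in SDP connection (c=) lines."""
    addresses = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # "c=IN IP4 <address>": the address is the third field.
            addresses.append(ipaddress.ip_address(line.split()[2]))
    return addresses

SDP_FROM_NATTED_PHONE = """v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.20
s=-
c=IN IP4 192.168.1.20
t=0 0
m=audio 16384 RTP/AVP 0
"""

for addr in sdp_media_addresses(SDP_FROM_NATTED_PHONE):
    if addr.is_private:
        print(f"{addr} is a private address; RTP sent there will not reach the phone")
```

A NAT device that rewrites only the IP header leaves this address untouched, which is why SIP-aware firewalls and SBCs have to rewrite the message body as well.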
This International Telecommunication Union (ITU) protocol was originally designed to provide an IP transport mechanism for videoconferencing. It has become the standard in IP-based videoconferencing equipment, and it briefly enjoyed fame as a VoIP protocol as well. While there is much heated debate over whether SIP or H.323 (or IAX) will come to dominate the VoIP protocol world, in Asterisk, H.323 has largely been deprecated in favor of IAX and SIP. H.323 has not enjoyed much success among users and enterprises, although it might still be the most widely used VoIP protocol among carriers.
The three versions of H.323 supported in Asterisk are handled by the modules chan_h323.so (supplied with Asterisk), chan_oh323.so (available as a free addon), and chan_ooh323.so (supplied in asterisk-addons).
You have probably used H.323 without even knowing it—Microsoft’s NetMeeting client is arguably the most widely deployed H.323 client.
H.323 was developed by the ITU in May 1996 as a means to transmit voice, video, data, and fax communications across an IP-based network while maintaining connectivity with the PSTN. Since that time, H.323 has gone through several versions and annexes (which add functionality to the protocol), allowing it to operate in pure VoIP networks and more widely distributed networks.
The future of H.323 is a subject of debate. If the media is any measure, it doesn’t look good for H.323; it hardly ever gets mentioned (certainly not with the regularity of SIP). H.323 is often regarded as technically superior to SIP, but that sort of thing is seldom the deciding factor in whether a technology enjoys success. One of the factors that makes H.323 unpopular is its complexity (although many argue that the once-simple SIP is starting to suffer from the same problem).
H.323 still carries by far the majority of worldwide carrier VoIP traffic, but as people become less and less dependent on traditional carriers for their telecom needs, the future of H.323 becomes more difficult to predict with any certainty. While H.323 may not be the protocol of choice for new implementations, we can certainly expect to have to deal with H.323 interoperability issues for some time to come.
H.323 is a relatively secure protocol and does not require many security considerations beyond those that are common to any network communicating with the Internet. Since H.323 uses the RTP protocol for media communications, it does not natively support encrypted media paths. The use of a VPN or other encrypted tunnel between endpoints is the most common way of securely encapsulating communications. Of course, this has the disadvantage of requiring the establishment of these secure tunnels between endpoints, which may not always be convenient (or even possible). As VoIP becomes used more often to communicate with financial institutions such as banks, we’re likely to require extensions to the most commonly used VoIP protocols to natively support strong encryption methods.
H.323 and NAT
The H.323 standard uses the Internet Engineering Task Force (IETF) RTP protocol to transport media between endpoints. Because of this, H.323 has the same issues as SIP when dealing with network topologies involving NAT. The easiest method is to simply forward the appropriate ports through your NAT device to the internal client.
To receive calls, you will always need to forward TCP port 1720 to the client. In addition, you will need to forward the UDP ports for the RTP media and RTCP control streams (see the manual for your device for the port range it requires). Older clients, such as Microsoft NetMeeting, will also require TCP ports forwarded for H.245 tunneling (again, see your client’s manual for the port number range).
If you have a number of clients behind the NAT device, you will need to use a gatekeeper running in proxy mode. The gatekeeper will require an interface attached to the private IP subnet and the public Internet. Your H.323 client on the private IP subnet will then register to the gatekeeper, which will proxy calls on the clients’ behalf. Note that any external clients that wish to call you will also be required to register with the proxy server.
At this time, Asterisk can’t act as an H.323 gatekeeper. You’ll have to use a separate application, such as the open source OpenH323 Gatekeeper (http://www.gnugk.org), for this purpose.
The Media Gateway Control Protocol (MGCP) also comes to us from the IETF. While MGCP deployment is more widespread than one might think, it is quickly losing ground to protocols such as SIP and IAX. Still, Asterisk loves protocols, so naturally it has rudimentary support for it.
MGCP is defined in RFC 3435. It was designed to make the end devices (such as phones) as simple as possible, and have all the call logic and processing handled by media gateways and call agents. Unlike SIP, MGCP uses a centralized model. MGCP phones cannot directly call other MGCP phones; they must always go through some type of controller.
Asterisk supports MGCP through the chan_mgcp.so module, and the endpoints are defined in the configuration file mgcp.conf. Since Asterisk provides only basic call agent services, it cannot emulate an MGCP phone (to register to another MGCP controller as a user agent, for example).
If you have some MGCP phones lying around, you will be able to use them with Asterisk. If you are planning to put MGCP phones into production on an Asterisk system, keep in mind that the community has moved on to more popular protocols, and you will therefore need to budget your software support needs accordingly. If possible (for example, with Cisco phones), you should upgrade MGCP phones to SIP.
Finally, let’s take a look at two proprietary protocols that are supported in Asterisk.
The Skinny Client Control Protocol (SCCP) is proprietary to Cisco VoIP equipment. It is the default protocol for endpoints on a Cisco Call Manager PBX. Skinny is supported in Asterisk, but if you are connecting Cisco phones to Asterisk, it is generally recommended that you obtain SIP images for any phones that support this and connect via SIP instead.
Asterisk’s support for Nortel’s proprietary VoIP protocol, UNISTIM, makes it the first PBX in history to natively support proprietary IP terminals from the two biggest players in VoIP: Nortel and Cisco. UNISTIM support is totally experimental and does not yet work well enough to put it into production, but the fact that somebody has taken the trouble to implement it demonstrates the power of the Asterisk platform.
Codecs are generally understood to be various mathematical models used to digitally encode (and compress) analog audio information. Many of these models take into account the human brain’s ability to form an impression from incomplete information.
We’ve all seen optical illusions; likewise, voice-compression algorithms take advantage of our tendency to interpret what we believe we should hear, rather than what we actually hear. The purpose of the various encoding algorithms is to strike a balance between efficiency and quality.
Originally, the term codec referred to a COder/DECoder: a device that converts between analog and digital. Now, the term seems to relate more to COmpression/DECompression.
Before we dig into the individual codecs, take a look at Table B-1—it’s a quick reference that you may want to refer back to.
Table B-1. Codec quick reference

Codec  | Data bitrate (Kbps)                         | License required?
-------|---------------------------------------------|--------------------------
G.711  | 64                                          | No
G.726  | 16, 24, 32, or 40                           | No
G.729A | 8                                           | Yes (no for pass-through)
GSM    | 13                                          | No
iLBC   | 13.3 (30-ms frames) or 15.2 (20-ms frames)  | No
Speex  | Variable (between 2.15 and 22.4)            | No
G.722  | 64                                          | No
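The data bitrates in Table B-1 cover only the codec payload. On the wire, every packet also carries IP, UDP, and RTP headers, so the real per-call figure is higher, and proportionally much higher for low-bitrate codecs. The sketch below assumes 40 bytes of IPv4/UDP/RTP headers per packet, one packet every 20 ms (30 ms for iLBC's 30-ms mode), and ignores layer-2 framing; the 1-Mbps link is an arbitrary example, and all figures are for one direction.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers, per packet

def per_call_bps(codec_bps: float, ptime_ms: int = 20) -> float:
    """One-direction IP bandwidth for a single call, excluding layer-2 overhead."""
    packets_per_sec = 1000 / ptime_ms
    return codec_bps + HEADER_BYTES * 8 * packets_per_sec

CODECS = {  # nominal data bitrates from Table B-1 (bits/s) and assumed packet interval (ms)
    "G.711": (64_000, 20),
    "G.726 (32 Kbps)": (32_000, 20),
    "G.729A": (8_000, 20),
    "GSM": (13_000, 20),
    "iLBC (30-ms frames)": (13_300, 30),
}

link_bps = 1_000_000  # an example 1-Mbps link
for name, (bps, ptime) in CODECS.items():
    total = per_call_bps(bps, ptime)
    print(f"{name:20} {total / 1000:6.1f} Kbps/call, {int(link_bps // total)} calls")
```

At 20 ms the headers alone add 16 Kbps per call, which is why G.729A's 8 Kbps becomes about 24 Kbps on the wire.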
G.711 is the fundamental codec of the PSTN. In fact, if someone refers to PCM (discussed in Appendix A) with respect to a telephone network, you are allowed to think of G.711. Two companding methods are used: μlaw in North America and alaw in the rest of the world. Either one delivers an 8-bit word transmitted 8,000 times per second. If you do the math, you will see that this requires 64,000 bits to be transmitted per second.
Many people will tell you that G.711 is an uncompressed codec. This is not exactly true, as companding is considered a form of compression. What is true is that G.711 is the base codec from which all of the others are derived.
G.711 imposes minimal (almost zero) load on the CPU.
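Companding is cheap to compute, which is why G.711 costs the CPU almost nothing. The sketch below is modeled on the widely circulated public-domain g711.c conversion routines for μlaw: a 16-bit linear sample becomes a sign bit, a 3-bit segment (roughly a logarithmic exponent), and a 4-bit step within that segment, for 8 bits per sample. alaw uses the same idea with different details and is not shown.

```python
BIAS = 0x84  # 132, added so every magnitude lands in one of the eight segments
SEG_END = (0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF)

def linear_to_ulaw(sample: int) -> int:
    """Compress one 16-bit signed linear sample into an 8-bit μlaw code."""
    if sample < 0:
        magnitude, mask = BIAS - sample, 0x7F
    else:
        magnitude, mask = sample + BIAS, 0xFF
    # Segment = index of the first boundary at or above the biased magnitude.
    segment = next((i for i, end in enumerate(SEG_END) if magnitude <= end), 8)
    if segment >= 8:  # too large: clip to the biggest code
        return 0x7F ^ mask
    code = (segment << 4) | ((magnitude >> (segment + 3)) & 0x0F)
    return code ^ mask  # μlaw codes are sent with their bits inverted

def ulaw_to_linear(code: int) -> int:
    """Expand an 8-bit μlaw code back to a 16-bit linear sample."""
    code = ~code & 0xFF
    value = ((code & 0x0F) << 3) + BIAS
    value <<= (code & 0x70) >> 4
    return BIAS - value if code & 0x80 else value - BIAS

samples = [0, 1000, -1000, 32767]
encoded = bytes(linear_to_ulaw(s) for s in samples)  # one byte per sample
print(list(encoded), [ulaw_to_linear(c) for c in encoded])
# 8 bits x 8,000 samples per second = 64,000 bits per second
```

The round trip is lossy in a deliberate way: small samples are reproduced with fine steps, while large ones (such as 1000 coming back as 988) get coarser steps that the ear tolerates.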
This codec has been around for some time (it used to be G.721, which is now obsolete), and it is one of the original compressed codecs. It is also known as Adaptive Differential Pulse-Code Modulation (ADPCM), and it can run at several bitrates. The most common rates are 16 Kbps, 24 Kbps, and 32 Kbps. As of this writing, Asterisk supports only the ADPCM-32 rate, which is far and away the most popular rate for this codec.
At 32 Kbps, G.726 offers nearly identical quality to G.711, but it uses only half the bandwidth. This is possible because rather than sending the result of each quantization measurement, it sends only enough information to describe the difference between the current sample and a prediction based on the previous ones. G.726 fell from favor in the 1990s due to its inability to carry modem and fax signals, but because of its bandwidth/CPU performance ratio it is now making a comeback. G.726 is especially attractive because it does not require a lot of computational work from the system.
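The idea of sending differences instead of absolute values can be shown with a deliberately simplified DPCM coder. This is a toy, not G.726: real ADPCM adapts both its step size and its predictor as the signal changes, whereas this sketch uses a fixed, arbitrarily chosen step and predicts that each sample equals the last reconstructed one. It does show why 4 bits per sample (32,000 bits/s ÷ 8,000 samples/s) can be enough: the encoder only has to describe a small correction.

```python
def dpcm_encode(samples, step=256):
    """Encode each sample as a 4-bit signed correction (-8..7) to a running prediction."""
    prediction = 0
    codes = []
    for sample in samples:
        code = max(-8, min(7, round((sample - prediction) / step)))
        codes.append(code)
        prediction += code * step  # track exactly what the decoder will reconstruct
    return codes

def dpcm_decode(codes, step=256):
    """Rebuild the signal by accumulating the transmitted corrections."""
    prediction = 0
    output = []
    for code in codes:
        prediction += code * step
        output.append(prediction)
    return output

samples = [0, 100, 300, 600, 900, 1100, 1000, 700]
codes = dpcm_encode(samples)
print(codes, dpcm_decode(codes))
```

Because the encoder updates its prediction from the quantized code rather than the original sample, encoder and decoder stay in lockstep and errors do not accumulate.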
Considering how little bandwidth it uses, G.729A delivers impressive sound quality. It does this through the use of Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP). Because of patents, you can’t use G.729A without paying a licensing fee; however, it is extremely popular and is well supported on many different phones and systems.
To achieve its impressive compression ratio, this codec requires an equally impressive amount of effort from the CPU. In an Asterisk system, transcoding to and from heavily compressed codecs will quickly bog down the CPU.
G.729A uses 8 Kbps of bandwidth.
The Global System for Mobile Communications (GSM) codec is the darling of Asterisk. This codec does not come encumbered with a licensing requirement the way that G.729A does, and it offers outstanding performance with respect to the demand it places on the CPU. The sound quality is generally considered to be of a lesser grade than that produced by G.729A, but much of this comes down to personal opinion; be sure to try it out. GSM operates at 13 Kbps.
The Internet Low Bitrate Codec (iLBC) provides an attractive mix of low bandwidth usage and quality, and it is especially well suited to sustaining reasonable quality on lossy network links.
Naturally, Asterisk supports it (and support elsewhere is growing), but it is not as popular as the ITU codecs, and thus may not be compatible with common IP telephones and commercial VoIP systems. IETF RFCs 3951 and 3952 have been published in support of iLBC, and iLBC is on the IETF standards track.
Because iLBC uses complex algorithms to achieve its high levels of compression, it has a fairly high CPU cost in Asterisk.
While you are allowed to use iLBC without paying royalty fees, the holder of the iLBC patent, Global IP Sound (GIPS), wants to know whenever you use it in a commercial application. The way you do that is by downloading and printing a copy of the iLBC license, signing it, and returning it to GIPS. If you want to read about iLBC and its license, you can do so at http://www.ilbcfreeware.org.
iLBC operates at 13.3 Kbps (30-ms frames) and 15.2 Kbps (20-ms frames).
Speex is a variable bitrate (VBR) codec, which means that it is able to dynamically modify its bitrate to respond to changing network conditions. It is offered in both narrowband and wideband versions, depending on whether you want telephone quality or better.
Speex is a totally free codec, licensed under the Xiph.org variant of the BSD license. An Internet draft for Speex is available, and more information about Speex can be found at its home page (http://www.speex.org).
Speex can operate at anywhere from 2.15 to 22.4 Kbps, due to its variable bitrate.
G.722 is an ITU-T standard codec that was approved in 1988. The G.722 codec produces a much higher-quality voice in the same space as G.711 (64 Kbps) and is starting to become popular among VoIP device manufacturers. The patents for G.722 have expired, so it is freely available. If you have access to devices that support G.722, you’ll be impressed by the quality improvement.
Sure thing, MP3 is a codec. Specifically, it’s the Moving Picture Experts Group Audio Layer 3 Encoding Standard. With a name like that, it’s no wonder we call it MP3! In Asterisk, the MP3 codec is typically used for music on hold (MoH). MP3 is not a telephony codec, as it is optimized for music, not voice; nevertheless, it’s very popular with VoIP telephony systems as a method of delivering MoH.
Be aware that music cannot usually be broadcast without a license. Many people assume that there is no legal problem with connecting a radio station or CD as a music on hold source, but this is very rarely true.
Quality of Service
Quality of Service, or QoS as it’s more popularly termed, refers to the challenge of delivering a time-sensitive stream of data across a network that was designed to deliver data in an ad hoc, best-effort sort of way. Although there is no hard rule, it is generally accepted that if you can deliver the sound produced by the speaker to the listener’s ear within 150 milliseconds, a normal flow of conversation is possible. When delay exceeds 300 milliseconds, it becomes difficult to avoid interrupting each other. Beyond 500 milliseconds, normal conversation becomes increasingly awkward and frustrating.
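One way to apply these guidelines is to add up a one-way (mouth-to-ear) delay budget and see which band the total falls into. The component values below are hypothetical examples, not measurements. The text does not characterize the band between 150 and 300 milliseconds, so the sketch simply calls it "borderline".

```python
def conversation_quality(one_way_delay_ms: float) -> str:
    """Map a one-way delay onto the rough guidelines given in the text."""
    if one_way_delay_ms <= 150:
        return "normal conversation"
    if one_way_delay_ms <= 300:
        return "borderline"
    if one_way_delay_ms <= 500:
        return "hard to avoid interrupting"
    return "awkward and frustrating"

# Hypothetical budget for one direction of a VoIP call, in milliseconds.
budget = {
    "codec frame + packetization": 20,
    "network transit": 60,
    "jitter buffer": 40,
}
total = sum(budget.values())
print(total, conversation_quality(total))  # 120 normal conversation
```

Every component counts against the same limit, so a larger jitter buffer that smooths out a lossy link also eats into the delay budget.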
In addition to getting it there on time, it is also essential to ensure that the transmitted information arrives intact. Too many lost packets will prevent the far end from completely reproducing the sampled audio, and gaps in the data will be heard as static or, in severe cases, entire missed words or sentences. Even packet loss of 5 percent can severely impede a VoIP network.
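A receiver can estimate packet loss on its own by watching RTP sequence numbers, which increase by one per packet and wrap around at 65,536. The sketch below is a simplified take on the approach RFC 3550 describes: it unwraps the 16-bit numbers into a continuous count (assuming no jump between consecutive received packets exceeds half the sequence space), then compares packets expected with packets received. The sequence numbers in the example are made up.

```python
def loss_percentage(received_seqs):
    """Estimate the percentage of packets lost from 16-bit RTP sequence numbers."""
    if not received_seqs:
        return 0.0
    extended = [received_seqs[0]]
    for seq in received_seqs[1:]:
        # Unwrap: pick the difference in [-32768, 32767] modulo 2**16.
        delta = (seq - extended[-1]) % 65536
        if delta >= 32768:
            delta -= 65536
        extended.append(extended[-1] + delta)
    expected = max(extended) - min(extended) + 1
    received = len(set(extended))  # ignore duplicated packets
    return 100.0 * (expected - received) / expected

# 65532 and 2 never arrived; the stream wraps from 65535 to 0 in between.
print(loss_percentage([65530, 65531, 65533, 65534, 65535, 0, 1, 3]))  # 20.0
```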
TCP, UDP, and SCTP
If you’re going to send data on an IP-based network, it will be transported using one of the three transport protocols discussed here.
Transmission Control Protocol
The Transmission Control Protocol (TCP) is almost never used for VoIP, for while it does have mechanisms in place to ensure delivery, it is not inherently in any hurry to do so. Unless there is an extremely low-latency interconnection between the two endpoints, TCP will tend to cause more problems than it solves.
The purpose of TCP is to guarantee the delivery of packets. In order to do this, several mechanisms are implemented, such as packet numbering (for reconstructing blocks of data), delivery acknowledgment, and re-requesting of lost packets. In the world of VoIP, getting the packets to the endpoint quickly is paramount, but 20 years of cellular telephony has trained us to tolerate a few lost packets.
TCP’s high processing overhead, state management, and acknowledgment of arrival work well for transmitting large amounts of data, but they simply aren’t efficient enough for real-time media communications.
User Datagram Protocol
Unlike TCP, the User Datagram Protocol (UDP) does not offer any sort of delivery guarantee. Packets are placed on the wire as quickly as possible and released into the world to find their way to their final destinations, with no word back as to whether they got there or not. Since UDP itself does not offer any kind of guarantee that the data will arrive, it achieves its efficiency by spending very little effort on what it is transporting.
TCP is a more “socially responsible” protocol because the bandwidth is more evenly distributed to clients connecting to a server. As the percentage of UDP traffic increases, it is possible that a network could become overwhelmed.
Stream Control Transmission Protocol
Approved by the IETF as a proposed standard in RFC 2960, SCTP is a relatively new transport protocol. From the ground up, it was designed to address the shortcomings of both TCP and UDP, especially as related to the types of services that used to be delivered over circuit-switched telephony networks.
Some of the goals of SCTP were:
• Better congestion-avoidance techniques (specifically, avoiding denial of service attacks)
• Strict sequencing of data delivery
• Lower latency for improved real-time transmissions
By addressing the major shortcomings of TCP and UDP, SCTP’s developers hoped to create a robust protocol for the transmission of SS7 and other types of PSTN signaling over an IP-based network.
Differentiated service, or DiffServ, is not so much a QoS mechanism as a method by which traffic can be flagged and given specific treatment. Obviously, DiffServ can help to provide QoS by allowing certain types of packets to take precedence over others. While this will certainly increase the chance of a VoIP packet passing quickly through each link, it does not guarantee anything.
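In practice, DiffServ flagging means setting the Differentiated Services Code Point (DSCP) in each packet's IP header; voice media is conventionally marked Expedited Forwarding (EF, DSCP 46). The DSCP occupies the upper six bits of the old IPv4 TOS byte, so EF becomes 46 << 2 = 184 (0xB8). The sketch marks a UDP socket this way using the IP_TOS socket option, which is available on Linux and most Unix-like systems. Whether any router honors the mark depends entirely on how the network is configured.

```python
import socket

EF = 46  # Expedited Forwarding DSCP, commonly used for voice media

def tos_byte(dscp: int) -> int:
    """DSCP sits in the top six bits of the IPv4 TOS byte; the low two bits are ECN."""
    return dscp << 2

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte(EF))
    marked = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    print(hex(marked))
```

The mark only requests special treatment; as the paragraph says, it raises the odds that a voice packet moves quickly but guarantees nothing.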
The ultimate guarantee of QoS is provided by the PSTN. For each conversation, a 64-Kbps channel is completely dedicated to the call; the bandwidth is guaranteed. Similarly, protocols that offer guaranteed service can ensure that a required amount of bandwidth is dedicated to the connection being served. As with any packetized networking technology, these mechanisms generally operate best when traffic is below maximum levels. When a connection approaches its limits, it is next to impossible to eliminate degradation.
Multiprotocol Label Switching (MPLS) is a method for engineering network traffic patterns independent of layer-3 routing tables. The protocol works by assigning short labels to network packets, which routers then use to forward the packets to the MPLS egress router, and ultimately to their final destinations. Traditionally, routers make an independent forwarding decision based on an IP table lookup at each hop in the network. In an MPLS network, this lookup is performed only once, when the packet enters the MPLS cloud at the ingress router. The packet is then assigned to a stream, referred to as a Label Switched Path (LSP), and identified by a label. The label is used as a lookup index in the MPLS forwarding table, and the packet traverses the LSP independent of layer-3 routing decisions. This allows the administrators of large networks to fine-tune routing decisions and make the best use of network resources. Additionally, information can be associated with a label to prioritize packet forwarding.
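The "short label" is a 4-byte label stack entry placed between the layer-2 header and the IP header. Per RFC 3032, it holds a 20-bit label, 3 traffic-class bits (originally called EXP, and the usual place to carry forwarding priority), a bottom-of-stack flag, and an 8-bit TTL. The sketch packs and unpacks one entry; the label value 1001 and traffic class 5 are arbitrary.

```python
import struct

def pack_label_entry(label: int, traffic_class: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Build one 4-byte MPLS label stack entry (RFC 3032 layout)."""
    if not (0 <= label < 2**20 and 0 <= traffic_class < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (traffic_class << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)

def unpack_label_entry(entry: bytes) -> dict:
    """Split a 4-byte label stack entry back into its fields."""
    (word,) = struct.unpack("!I", entry)
    return {
        "label": word >> 12,
        "traffic_class": (word >> 9) & 0x7,
        "bottom_of_stack": bool((word >> 8) & 0x1),
        "ttl": word & 0xFF,
    }

entry = pack_label_entry(label=1001, traffic_class=5, bottom_of_stack=True, ttl=64)
print(entry.hex(), unpack_label_entry(entry))
```

A core router only needs to read these 32 bits to forward the packet, which is the point of doing the full IP lookup just once at the ingress router.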
MPLS forwarding by itself does not establish LSPs dynamically; a signaling protocol is needed for that, and one option is to use the Resource Reservation Protocol (RSVP) with MPLS. RSVP is a signaling protocol used to simplify the establishment of LSPs and to report problems to the MPLS ingress router. The advantage of using RSVP in conjunction with MPLS is the reduction in administrative overhead. If you don’t use RSVP with MPLS, you’ll have to go to every single router and configure the labels and each path manually. Using RSVP makes the network more dynamic by distributing control of labels to the routers. This enables the network to become more responsive to changing conditions, because it can be set up to change the paths based on certain conditions, such as a certain path going down (perhaps due to a faulty router). The configuration within the router will then be able to use RSVP to distribute new labels to the routers in the MPLS network, with no (or minimal) human intervention.
The simplest, least expensive approach to QoS is not to provide it at all—the “best effort” method. While this might sound like a bad idea, it can in fact work very well. Any VoIP call that traverses the public Internet is almost certain to be best-effort, as QoS mechanisms are not yet common in this environment.
You may not realize it, but echo has been a problem in the PSTN for as long as there have been telephones. You probably haven’t often experienced it, because the telecom industry has spent large sums of money designing expensive echo-cancellation devices. Also, when the endpoints are physically close (e.g., when you phone your neighbor down the street), the delay is so minimal that anything you transmit will be returned so quickly that it will be indistinguishable from the sidetone‡ normally occurring in your telephone. In fact, there is echo on your local calls much of the time, but you cannot perceive it with a regular telephone because it happens almost instantaneously. It may help to consider that when you stand in a room and speak, everything you say echoes back to you off the walls and ceiling (and possibly the floor, if it’s not carpeted), but this does not cause any problems because it happens so fast that you do not perceive a delay.
The reason that VoIP telephone systems such as Asterisk can experience echo is that adding VoIP to the call path introduces a slight delay. It takes a few milliseconds for the packets to travel from your phone to the server (and vice versa). Suddenly there is an appreciable delay, which allows you to perceive the echo that was always there but was never really noticeable.
Why Echo Occurs
Before we discuss measures to deal with echo, let’s first take a look at why echo occurs in the analog world.
If you hear echo, it’s not your phone that’s causing the problem; it’s the far end of the circuit. Conversely, echo heard on the far end is being generated at your end. Echo can be caused by the fact that an analog local loop circuit has to transmit and receive on the same pair of wires. If this circuit is not electrically balanced, or if a low-quality telephone is connected to the end of the circuit, signals it receives can be reflected back, becoming part of the return transmission. When this reflected signal gets back to you, you will hear the words you spoke just moments before. Humans will perceive an echo beyond a certain amount of delay (possibly as low as 20 milliseconds for some people), and the echo becomes more annoying as the delay increases.
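Because a reflection must travel to the far end and back before you hear it, the delay that matters for echo is roughly the round-trip time, about twice the one-way delay. The 15 ms one-way figure below is an assumed value for illustration, not a measurement:

$$
t_{\text{echo}} \approx 2\,t_{\text{one-way}} = 2 \times 15\ \text{ms} = 30\ \text{ms} > 20\ \text{ms}
$$

Under that assumption the echo would arrive later than the roughly 20 ms at which some listeners start to notice it, while the near-instant round trip of a short local analog call stays well below that level.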
In a cheap telephone, it is possible for echo to be generated in the body of the handset. This is why some cheap IP phones can cause echo even when the entire end-to-end connection does not contain an analog circuit.§ In the VoIP world, echo is usually introduced either by an analog circuit somewhere in the connection, or by a cheap endpoint reflecting back some of the signal (e.g., feedback through a hands-free or poorly designed handset or headset). The greater the latency on the network, the more annoying this echo can be.
Managing Echo on DAHDI Channels
You can enable and disable echo cancellation for DAHDI interfaces in the chan_dahdi.conf file. The default configuration enables echo cancellation with echocancel=yes. echocancelwhenbridged=yes will enable echo cancellation for TDM bridged calls. While bridged calls should not require echo cancellation, this may improve call quality.
When echo cancellation is enabled, the echo canceller learns of echo on the line by listening for it throughout the duration of the call. Consequently, echo may be heard at the beginning of a call and lessen after a period of time. To avoid this situation, you can employ a method called echo training, which will mute the line briefly at the beginning of a call, and send a tone from which the amount of echo on the line can be determined. This allows Asterisk to deal with the echo more quickly. Echo training can be enabled with echotraining=yes.
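A minimal chan_dahdi.conf sketch that turns on the options discussed above is shown below. It assumes four FXO ports on DAHDI channels 1 through 4; the channel range and the signalling line are assumptions for illustration. In chan_dahdi.conf, options set in the [channels] section apply to the channel => lines that follow them, so the channel definition comes last.

```ini
; chan_dahdi.conf
[channels]
; Assumed: channels 1-4 are FXO ports on analog lines, which Asterisk
; drives with FXS (kewlstart) signalling.
signalling=fxs_ks
; Enable the software echo canceller (the default).
echocancel=yes
; Also cancel echo on calls bridged between two DAHDI channels.
echocancelwhenbridged=yes
; Mute briefly at call start and send a tone to train the canceller.
echotraining=yes
; The settings above apply to the channels defined here.
channel => 1-4
```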
Hardware Echo Cancellation
The most effective way to handle echo cancellation is not in software. If you are planning on deploying a good-quality system, spend the extra money and purchase cards for the system that have onboard hardware echo cancellation. These cards are a bit more expensive, but they quickly pay for themselves in terms of reduced load on the CPU, as well as reduced load on you due to fewer user complaints.
Asterisk and VoIP
It should come as no surprise that Asterisk loves to talk VoIP. But in order to do so, Asterisk needs to know which function it is to perform: that of client, server, or both. One of the most complex and often confusing concepts in Asterisk is the configuration of inbound and outbound authentication.
Users and Peers and Friends—Oh My!
Connections that authenticate to us, or that we authenticate, are defined in the iax.conf and sip.conf files as users and peers. Connections that do both may be defined as friends. When determining which way the authentication is occurring, it is always important to view the direction of the channels from Asterisk’s viewpoint, as connections are accepted and created by the Asterisk server.
A connection defined as a user is any system/user/endpoint that we allow to connect to us. Keep in mind that a user definition does not provide a method with which to call that user; the user type is used simply to create a channel for incoming calls.‖ A user definition will require a context name to be defined to indicate where the incoming authenticated call will enter the dialplan (in extensions.conf).
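A minimal sip.conf sketch of a user definition follows. The section name, secret, and context name are hypothetical; the context line is what directs authenticated incoming calls to the matching context in extensions.conf.

```ini
; sip.conf
[officeA]              ; hypothetical name of the connecting system
type=user              ; accept and authenticate incoming calls only
secret=ExampleSecret   ; hypothetical shared password
context=incoming       ; hypothetical extensions.conf context for these calls
```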
A connection defined as a peer type is an outgoing connection. Think of it this way: users place calls to us, while we place calls to our peers. Since peers do not place calls to us, a peer definition does not typically require the configuration of a context name. However, there is one exception: if calls that originate from your system are returned to your system in a loopback, the incoming calls (which originate from a SIP proxy, not a user agent) will be matched on the peer definition. The default context should handle these incoming calls appropriately, although it’s preferable for contexts to be defined for them on a per-peer basis.
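The corresponding sketch for a peer is below, again with hypothetical names and credentials. The context line is optional for a peer; it is included here only to give looped-back calls that match this peer a context of their own rather than leaving them to the default context.

```ini
; sip.conf
[provider]                      ; hypothetical name for the outbound destination
type=peer                       ; we place calls to this peer
host=sip.provider.example.com   ; hypothetical static address of the peer
defaultuser=myaccount           ; hypothetical account name used toward the peer
secret=ExampleSecret            ; hypothetical password for authenticating to it
context=from-provider           ; optional per-peer context for looped-back calls
```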
In order to know where to send a call to a host, we must know its location in relation to the Internet (that is, its IP address). The location of a peer may be defined either statically or dynamically. A dynamic peer is configured with host=dynamic under the peer definition heading. Because the IP address of a dynamic peer may change constantly, it must register with the Asterisk box so calls can successfully be routed to it. If the remote end is another Asterisk box, the use of a register statement is required, as discussed in the next section.
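A dynamic peer differs mainly in its host setting. In this hypothetical sip.conf entry, Asterisk has no address for the peer until the peer registers, so calls to it cannot be routed before then.

```ini
; sip.conf
[remote-office]        ; hypothetical peer whose IP address changes
type=peer
host=dynamic           ; address is learned from the peer's registration
secret=ExampleSecret   ; hypothetical password the peer presents when registering
```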
Defining a type as a friend is a shortcut for defining it as both a user and a peer. However, connections that are both users and peers aren’t always defined this way, because defining each direction of call creation individually (using both a user and a peer definition) allows more granularity and control over the individual connections.
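To illustrate the trade-off, the iax.conf sketch below defines one hypothetical remote office in two alternative ways; you would use one option or the other, not both. All names, addresses, and secrets are assumptions. In the split version, incoming calls match [branch-in] only if the remote system identifies itself with that name.

```ini
; iax.conf
; Option 1: one friend entry covers both directions.
[branch]
type=friend
host=dynamic
secret=ExampleSecret
context=from-branch

; Option 2: split entries, so each direction can be tuned separately.
[branch-in]                ; incoming calls from the branch
type=user
secret=ExampleSecret
context=from-branch

[branch-out]               ; outgoing calls to the branch
type=peer
host=branch.example.com    ; assumed static address in this variant
secret=ExampleSecret
```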
Figure B-2 shows the flow of authentication control in relation to Asterisk.
A register statement is a way of telling a remote peer where your Asterisk box is in relation to the Internet. Asterisk uses register statements to authenticate to remote providers when you are employing a dynamic IP address, or when the provider does not have your IP address on record. There are situations when a register statement is not required, but to demonstrate when a register statement is required, let’s look at an example.
Say you have a remote peer that is providing DID services to you. When someone calls the number +1-800-555-1212, the call goes over the physical PSTN network to your service provider and into its Asterisk server, possibly over its T1 connection. This call is then routed to your Asterisk server via the Internet.
Your service provider will have a definition in either its sip.conf or iax.conf configuration file (depending on whether you are connecting with the SIP or IAX protocol, respectively) for your Asterisk server. If you only receive calls from this provider, you will define it as a user (if it is another Asterisk system, you might be defined in its system as a peer).
Now let’s say that your box is on your home Internet connection, with a dynamic IP address. Your service provider has a static IP address (or perhaps a fully qualified domain name), which you place in your configuration file. Since you have a dynamic address, your service provider specifies host=dynamic in its configuration file. In order to know where to route your +1-800-555-1212 call, your service provider needs to know where you are located in relation to the Internet. This is where the register statement comes into use.
The register statement is a way of authenticating and telling your peer where you are. In the [general] section of your configuration file, you place a statement similar to this:
register => username:secret@my_remote_peer
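The register line normally sits alongside an ordinary entry for the same provider. The iax.conf sketch below shows both sides of the arrangement described above. Everything except the register syntax itself is an assumption: username and secret stand for the credentials the provider issues, my_remote_peer stands for the provider's hostname or IP address, and the other section names and the context are hypothetical.

```ini
; iax.conf on YOUR Asterisk box (dynamic IP address)
[general]
; Authenticate to the provider and tell it our current address.
register => username:secret@my_remote_peer

; Incoming DID calls from the provider. The section name must match the
; name the provider identifies itself with (assumed here).
[my_remote_peer]
type=user
secret=ExampleSecret    ; hypothetical credential for calls the provider sends us
context=incoming-did    ; hypothetical extensions.conf context

; iax.conf on the PROVIDER's server, shown for comparison
[username]
type=peer
host=dynamic            ; our address is learned from the register statement
secret=secret
```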
You can verify a successful registration with the use of the iax2 show registry and sip show registry commands at the Asterisk console.
We can barely scratch the surface of the complex matter of VoIP security in this appendix; therefore, before we dig in, we want to steer you in the direction of the VoIP Security Alliance (http://www.voipsa.org). This fantastic resource contains an excellent mailing list, white papers, how-tos, and a general compendium of all matters relating to VoIP security. Just as email has been abused by the selfish and criminal, so too will voice. The fine folks at VoIPSA are doing what they can to ensure that we address these challenges now, before they become an epidemic. In the realm of books on the subject, we recommend the most excellent Hacking Exposed VoIP by David Endler and Mark Collier (McGraw-Hill Osborne Media). If you are responsible for deploying any VoIP system, you need to be aware of this stuff.
Spam over Internet Telephony (SPIT)
We don’t want to think about this, but we know it’s coming. The simple fact is that there are people in this world who lack certain social skills, and that, coupled with a kind of mindless greed, means that these folks think nothing of flooding the Internet with massive volumes of email. These same types of characters will think little of doing the same with voice. We already know what it’s like to get inundated with telemarketing calls; try to imagine what might happen when those telemarketers realize they can send voice spam at almost no cost. Regulation has not stopped email spam, and it will probably not stop voice spam, so it will be up to us to prevent it.
Encrypting Audio with Secure RTP
If you can sniff the packets coming out of an Asterisk system, you can extract the audio from the RTP streams. This data can be fed offline to a speech processing system, which can listen for keywords such as “credit card number” or “PIN” and present the data it gathers to someone who has an interest in it. The stream can also be evaluated to see if there are DTMF tones embedded in it, which is dangerous because many services ask for passwords and credit card information to be input via the dialpad. In business, strategic information could also be gleaned from captured audio. Using Secure RTP can combat this problem by encrypting the RTP streams. More information about SRTP is available in “Encrypting SIP calls” on page 150.
In the traditional telephone network, it is very difficult to successfully adopt someone else’s identity. Your activities can (and will) be traced back to you, and the authorities will quickly put an end to the fun. In the world of IP, it is much easier to remain anonymous. As such, it is no stretch to imagine that there are hordes of enterprising criminals out there who will be only too happy to make calls to your credit card company or bank, pretending to be you. If a trusted mechanism is not discovered to combat spoofing, we will quickly learn that we cannot trust VoIP calls.
What Can Be Done?
The first thing to keep in mind when considering security on a VoIP system is that VoIP is based on network protocols, and needs to be evaluated from that perspective. This is not to say that traditional telecom security should be ignored, but we need to pay attention to the underlying network.
Basic network security
One of the most effective things that can be done is to secure access to the voice network. Firewalls and VLANs are examples of how this can be achieved. By default, the voice network should be accessible only to devices that need it. For example, if you do not have any softphones in use, do not allow client PCs access to the voice network.
Segregating voice and data traffic. Unless there is a need to have voice and data on the same network, there may be some value in keeping them separate (this can have other benefits as well, such as simplifying QoS configurations). It is not unheard of to build the internal voice network on a totally separate LAN, using existing CAT3 cabling and terminating on inexpensive network switches. This configuration can even be less expensive.
DMZ. Placing your VoIP system in a demilitarized zone (DMZ) can provide an additional layer of protection for your LAN, while still allowing connectivity for relevant applications. Should your VoIP system be compromised, it will be much more difficult to use it to launch an attack on the rest of your network, since it is not trusted. Regardless of whether you deploy within a DMZ, any abnormal traffic coming out of the system should be considered suspect.
Server hardening. Hardening your Asterisk server is critical. Not only are there performance benefits to doing this (running nonessential processes can eat up valuable CPU and RAM resources), but the elimination of anything not required will reduce the chance that an exploited vulnerability in the operating system can be used to gain access and launch an attack on other parts of your network.
Running Asterisk as non-root is an essential part of system hardening. See Chapter 3 for more information.
Asterisk 1.8 includes the ability to use both SIP TLS for the encryption of signaling and SRTP for the encryption of the media between endpoints. More information about encrypting SIP calls can be found in “Encrypting SIP calls” on page 150. Asterisk has also supported encryption between endpoints using IAX2 since version 1.4. Information about enabling encryption across IAX2 trunks can be found in “IAX encryption” on page 154.
Physical security
Physical security should not be ignored. All terminating equipment (such as switches, routers, and the PBX itself) should be secured in an environment that can only be accessed by authorized persons. At the user end (such as under desks), it can be more difficult to deliver physical security, but if the network responds only to devices that it is familiar with (e.g., restricting DHCP to devices whose MAC addresses are known), the risk of unauthorized intrusions can be mitigated somewhat.
Over the last couple of years the telecom industry has embraced VoIP, which sets Asterisk up to do quite well. While Asterisk has been doing VoIP for years (well over a decade now), the integration of VoIP and traditional telephony into a single, powerful platform has made Asterisk a major player in the telecommunications industry.