Voice Over Internet Protocol (VoIP)
`BUR GOODE, SENIOR MEMBER, IEEE
`
`Invited Paper
`
`During the recent Internet stock bubble, articles in the trade
`press frequently said that, in the near future, telephone traffic
`would be just another application running over the Internet. Such
`statements gloss over many engineering details that preclude voice
`from being just another Internet application. This paper deals with
`the technical aspects of implementing voice over Internet protocol
`(VoIP), without speculating on the timetable for convergence.
`First, the paper discusses the factors involved in making a high-
`quality VoIP call and the engineering tradeoffs that must be made
between delay and the efficient use of bandwidth. After a discussion
of codec selection and the delay budget, the paper discusses
various techniques to achieve network quality of service.
`Since call setup is very important, the paper next gives an
`overview of several VoIP call signaling protocols, including H.323,
`SIP, MGCP, and Megaco/H.248. There is a section on telephony
`routing over IP (TRIP). Finally, the paper explains some VoIP
`issues with network address translation and firewalls.
`
`Keywords—H.323, Internet telephony, MGCP, SIP, telephony
`routing over IP (TRIP), voice over IP (VoIP), voice quality.
`
NOMENCLATURE

ACD: Automatic call distributor.
ALG: Application level gateway.
ATM: Asynchronous transfer mode, a cell-switched communications technology.
BGP-4: Border gateway protocol 4, an interdomain routing protocol.
BRI: Basic rate interface (ISDN interface, usually 144 kb/s).
Codec: Coder/decoder.
CR-LDP: Constraint-based routing label distribution protocol.
DiffServ: Differentiated services.
DHCP: Dynamic host configuration protocol.
DSL: Digital subscriber line.
DTMF: Dual-tone multifrequency.
EF: Expedited forwarding.
FTP: File transfer protocol.
FXO: Foreign exchange office.
`
`Manuscript received March 20, 2002; revised May 14, 2002.
`The author is with AT&T Labs, Weston, CT 06883 USA (e-mail:
`bgoode@att.com).
`Digital Object Identifier 10.1109/JPROC.2002.802005.
`
H.323: An ITU-T standard protocol suite for real-time communications over a packet network.
H.225: An ITU-T call signaling protocol (part of the H.323 suite).
H.235: An ITU-T security protocol (part of the H.323 suite).
H.245: An ITU-T capability exchange protocol (part of the H.323 suite).
HTTP: Hypertext transfer protocol.
IANA: Internet assigned numbers authority.
IETF: Internet engineering task force.
IntServ: Integrated services Internet.
ITAD: Internet telephony administrative domain.
ITSP: Internet telephony service provider.
ITU: International Telecommunication Union.
IP: Internet protocol.
IS-IS: Intermediate system-to-intermediate system, an interior routing protocol.
LAN: Local area network.
LDP: Label distribution protocol.
LS: Location server.
LSP: Label switched path.
LSR: Label switching router.
Megaco/H.248: An advanced media gateway control protocol standardized jointly by the IETF and the ITU-T.
MG: Media gateway.
MGCP: Media gateway control protocol.
MOS: Mean opinion score.
MPLS: Multiprotocol label switching.
MPLS-TE: MPLS with traffic engineering.
NAT: Network address translation.
OSPF: Open shortest path first, an interior routing protocol.
PBX: Private branch exchange, usually used on business premises to switch telephone calls.
PHB: Per hop behavior.
PRI: Primary rate interface (ISDN interface, usually 1.544 Mb/s or 2.048 Mb/s).
`
`PROCEEDINGS OF THE IEEE, VOL. 90, NO. 9, SEPTEMBER 2002
`
`1495
`
`0018-9219/02$17.00 © 2002 IEEE
`
`
`
`
`Fig. 1. Business use of VoIP.
`
PSTN: Public switched telephone network.
RAS: Registration, admission and status. RAS channels are used in H.323 gatekeeper communications.
RFC: Request for Comments, an approved IETF document.
RSVP: Resource reservation protocol.
RSVP-TE: RSVP with traffic engineering extensions.
RTP: Real-time transport protocol.
RTCP: RTP control protocol.
RTSP: Real-time streaming protocol.
QoS: Quality of service.
SDP: Session description protocol.
SG: Signaling gateway.
SIP: Session initiation protocol.
SS7: Signaling system 7.
SCTP: Stream control transmission protocol.
SOHO: Small office/home office.
TCP: Transmission control protocol.
TLS: Transport layer security.
TDM: Time-division multiplexing.
TRIP: Telephony routing over IP.
URI: Uniform resource identifier.
URL: Uniform resource locator.
UDP: User datagram protocol.
VAD: Voice activity detection.
VoIP: Voice over Internet protocol.
`
`I. INTRODUCTION
`
There is a plethora of published papers describing various ways in which voice and data communications networks may “converge” into a single global communications network. This paper deals with the technical aspects of implementing VoIP, without speculating on the timetable for convergence. A large number of factors are involved in making a high-quality VoIP call. These factors include the speech codec, packetization, packet loss, delay, delay variation, and the network architecture to provide QoS. Other factors involved in making a successful VoIP call include the call setup signaling protocol, call admission control, security concerns, and the ability to traverse NATs and firewalls.
Although VoIP involves the transmission of digitized voice in packets, the telephone itself may be analog or digital. The voice may be digitized and encoded either before or concurrently with packetization. Fig. 1 shows a business in which a PBX is connected to a VoIP gateway as well as to the local telephone company central office. The VoIP gateway allows telephone calls to be completed through the IP network. Local calls can still be completed through the telephone company as in the past. The business may use the IP network to make all calls between its VoIP gateway-connected sites, or it may choose to split the traffic between the IP network and the PSTN based on a least-cost routing algorithm configured in the PBX. VoIP calls are not restricted to telephones served directly by the IP network. We refer to VoIP calls to telephones served by the PSTN as “off-net” calls. Off-net calls may be routed over the IP network to a VoIP/PSTN gateway near the destination telephone.
An alternative VoIP implementation uses IP phones and does not rely on a standard PBX. Fig. 2 is a simplified diagram of an IP telephone system connected to a wide area IP network. IP phones are connected to a LAN. Voice calls can be made locally over the LAN. The IP phones include codecs that digitize and encode (as well as decode) the speech. The IP phones also packetize and depacketize the encoded speech. Calls between different sites can be made over the wide area IP network. Proxy servers perform IP phone registration and coordinate call signaling, especially between sites. Connections to the PSTN can be made through VoIP gateways.

Fig. 2. VoIP from end to end.

Table 1
Characteristics of Several Voice Codecs
`
`II. VOICE QUALITY
`
Many factors determine voice quality, including the choice of codec, echo control, packet loss, delay, delay variation (jitter), and the design of the network. Packet loss causes voice clipping and skips. Some codec algorithms can conceal some lost voice packets; typically, however, only a single packet can be lost during a short period for these concealment algorithms to be effective. If the end-to-end delay becomes too long, the conversation begins to sound like two parties talking on a Citizens Band radio. A jitter buffer in the receiving device compensates for delay variation. If the delay variation exceeds the size of the jitter buffer, packets arrive too late to be played out and are discarded, with the same effect as packet loss anywhere else in the transmission path.
For many years, the PSTN operated essentially exclusively with the ITU-T standard G.711. However, in packet communications networks, as well as in wireless mobile networks, other codecs will also be used. Telephones or gateways involved in setting up a call will be able to negotiate which codec to use from among a small working set of codecs that they support.
`Codecs: There are many codecs available for digitizing
`speech. Table 1 gives some of the characteristics of a few
`standard codecs.1
`
1 Note that the G.xxx codecs are defined by the ITU-T; IS-xxx codecs are defined by the TIA.
`
`GOODE: VOICE OVER INTERNET PROTOCOL (VoIP)
`
`
`
`
`Fig. 3. Effect of codec concatenation on an MOS.
`
`The quality of a voice call through a codec is often
`measured by subjective testing under controlled conditions
`using a large number of listeners to determine an MOS.
`Several characteristics can be measured by varying the test
`conditions. Important characteristics include the effect of
`environmental noise, the effect of channel degradation (such
`as packet loss), and the effect of tandem encoding/decoding
`when interworking with other wireless and terrestrial
`transport networks. The latter characteristic is especially
`important since VoIP networks will have to interwork with
`switched circuit networks and wireless networks using
`different codecs for many years. The general order of the
fixed-rate codecs listed in Table 1, from best to worst
`performance in tandem, is G.711, G.726, G.729e, G.728,
`G.729, G.723.1. Quantitative results are given in [1]. Since
`voice quality suffers when placing low-bit-rate codecs in
`tandem in the transmission path, the network design should
`strive to avoid tandem codecs whenever and wherever
`possible.
Concatenation and Transcoding: The best packet network design codes the speech once near the speaker and decodes it once near the listener. Concatenation of low-bit-rate speech codecs, as well as the transcoding of speech in the middle of the transmission path, degrades speech quality. Fig. 3 shows the MOSs of several codecs with and without concatenation. (These results are from [1]. An MOS of 5 is excellent, 4 is good, 3 is fair, 2 is poor, and 1 is bad. Note that G.729 × 2 means that speech coded with G.729 was decoded and then recoded with G.729 before reaching the final decoder. G.729 × 3 means that three G.729 codecs were concatenated in the speech path between the speaker and listener.) Fig. 4 shows the MOSs resulting from the interworking of different codecs, possibly in a transcoding situation.
`
`III. TRANSPORT
`
Typical Internet applications use TCP/IP, whereas VoIP uses RTP/UDP/IP. Although IP is a connectionless best-effort network communications protocol, TCP is a reliable transport protocol that uses acknowledgments and retransmission to ensure packet receipt. Used together, TCP/IP is a reliable connection-oriented network communications protocol suite. TCP has a rate adjustment feature that increases the transmission rate when the network is uncongested, but quickly reduces the transmission rate when the originating host does not receive positive acknowledgments from the destination host. TCP/IP is not suitable for real-time communications, such as speech transmission, because the acknowledgment/retransmission feature would lead to excessive delays. UDP provides unreliable connectionless delivery service using IP to transport messages between end points in an internet. RTP, used in conjunction with UDP, provides end-to-end network transport functions for applications transmitting real-time data, such as audio and video, over unicast and multicast network services [2]. RTP does not reserve resources and does not guarantee quality of service. A companion protocol, RTCP, allows monitoring of the quality of data delivery, but most VoIP applications send a continuous stream of RTP/UDP/IP packets without regard to packet loss or delay in reaching the receiver.
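To make the RTP/UDP/IP layering concrete, the sketch below packs the fixed RTP header defined in RFC 3550 and adds the UDP and IPv4 header sizes. It assumes no CSRC list, no header extension, no padding, and a 20-byte IPv4 header without options; the function and constant names are illustrative only. Under those assumptions the per-packet overhead comes to the 40 bytes used in the packetization discussion.

```python
import struct

RTP_VERSION = 2
RTP_HEADER_BYTES = 12   # fixed header, no CSRC list, no extension
UDP_HEADER_BYTES = 8
IPV4_HEADER_BYTES = 20  # minimum IPv4 header, no options

def build_rtp_header(payload_type: int, seq: int, timestamp: int,
                     ssrc: int, marker: bool = False) -> bytes:
    """Pack the RFC 3550 fixed RTP header in network byte order."""
    byte0 = RTP_VERSION << 6                  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

# G.729 is static payload type 18 (RFC 3551); its 8-kHz clock advances
# 160 timestamp units per 20-ms packet.
header = build_rtp_header(payload_type=18, seq=1, timestamp=160, ssrc=0x1234ABCD)
print(len(header), RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES)  # 12 40
```

Note that nothing in the header asks the network for special treatment; sequence numbers and timestamps only let the receiver detect loss and reconstruct timing.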
Fig. 4. Effects of transcoding.

Although transmission may be inexpensive on major routes, in some parts of the world as well as in many private networks, transmission facilities are expensive enough to merit an effort to use bandwidth efficiently. This effort starts with the use of speech compression codecs. However, the lowest-bandwidth codecs entail long packetization delays and the greatest codec complexity, so an engineering tradeoff must be made to achieve an acceptable packetization delay, an acceptable level of codec complexity, and an acceptable call transmission capacity requirement. Another technique for increasing bandwidth efficiency is voice activity detection and silence suppression. Voice quality can be maintained while using silence suppression if the receiving codec inserts a carefully designed comfort noise during each silence period. For example, Annex B of ITU-T Recommendation G.729 defines a robust voice activity detector that measures the changes over time of the background noise and sends, at a low rate, enough information to the receiver to generate comfort noise that has the perceptual characteristics of the background noise at the sending telephone [3].
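The basic idea of silence suppression can be illustrated with a deliberately simple energy-threshold detector. This is not the G.729 Annex B algorithm, which also tracks background-noise changes and drives comfort-noise generation; the threshold, the hangover length, and the function names are arbitrary assumptions for illustration.

```python
import math
from typing import List, Sequence

def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB (floored to avoid log(0))."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(max(mean_sq, 1e-12))

def simple_vad(frames: List[Sequence[float]], threshold_db: float = -40.0,
               hangover: int = 2) -> List[bool]:
    """Label each frame as speech (True, transmit) or silence (False, suppress).

    A frame counts as speech if its energy exceeds threshold_db. After speech
    ends, the next `hangover` frames are still sent so word endings are not clipped.
    """
    decisions = []
    remaining = 0
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            remaining = hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

Frames labeled False would not be transmitted, and the receiver would fill the gap with comfort noise; every suppressed frame saves both the payload and the 40-byte packet header.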
Coding and packetization result in delays greater than users typically experience in terrestrial switched circuit networks. As we have seen, standard speech codecs are available for output coding rates in the approximate range of 64 to 5 kb/s. Generally, the lower the output rate, the more complex the codec. Packet design involves a tradeoff between payload efficiency (payload/total packet size) and packetization delay (the time required to fill the packet). For IPv4, the RTP/UDP/IP header is 40 bytes. A payload of 40 bytes would mean 50% payload efficiency. At 64 kb/s, it only takes 5 ms to accumulate 40 bytes, but at 8 kb/s it takes 40 ms to accumulate 40 bytes. A packetization delay of 40 ms is significant, and many VoIP systems use 20-ms packets despite the low payload efficiency when using low-bit-rate codecs. For continuous speech, the call transmission capacity requirement $B$ (in kb/s) is related to the header size $H$ (in bits), the codec output rate $R$ (in kb/s), and the payload sample size $P$ (in milliseconds) as

$$B = R + \frac{H}{P}.$$

Fig. 5 shows a plot of $B$ versus $R$ and $P$, assuming $H = 320$ b.
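The arithmetic in the preceding paragraph can be checked with a short sketch. It assumes continuous speech (no silence suppression), counts only the 40-byte RTP/UDP/IPv4 header (no link-layer framing), and uses hypothetical function names; the bandwidth function implements B = R + H/P with H in bits, R in kb/s, and P in ms.

```python
def packetization_delay_ms(rate_kbps: float, payload_bytes: float) -> float:
    """Time to fill a payload: bits / (kb/s) gives milliseconds."""
    return payload_bytes * 8 / rate_kbps

def payload_efficiency(rate_kbps: float, packet_ms: float,
                       header_bytes: float = 40) -> float:
    """Payload / total packet size, where kb/s x ms = bits of payload."""
    payload_bytes = rate_kbps * packet_ms / 8
    return payload_bytes / (payload_bytes + header_bytes)

def call_bandwidth_kbps(rate_kbps: float, packet_ms: float,
                        header_bits: float = 320) -> float:
    """B = R + H/P for continuous speech (bits per ms equals kb/s)."""
    return rate_kbps + header_bits / packet_ms

print(packetization_delay_ms(64, 40), packetization_delay_ms(8, 40))  # 5.0 40.0
print(payload_efficiency(64, 5))                                      # 0.5
for name, rate in [("G.711", 64), ("G.729", 8)]:
    print(name, call_bandwidth_kbps(rate, 20), "kb/s at 20 ms")
```

With 20-ms packets, G.711 needs 80 kb/s and G.729 needs 24 kb/s; for G.729 the header is two thirds of every packet (payload efficiency of one third), which is why header compression is attractive on slow links.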
There are several header compression algorithms that will improve payload efficiency [4]–[6]. The 40-byte RTP/UDP/IP header can be compressed to 2–7 bytes. A typical compressed header is four bytes, including a two-byte checksum. In an IP network, header compression must be done on a link-by-link basis, because the header must be restored before a router can choose an outgoing interface. Therefore, this technique is most suitable for low-speed access links. Fig. 6 shows a plot of the call transmission capacity requirement versus the codec output rate and the payload sample size, assuming a 32-b (4-byte) header.
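To see why header compression matters most for low-rate codecs, the same bandwidth relation (B = R + H/P) can be evaluated with the typical 4-byte compressed header. This sketch assumes 20-ms packets and continuous speech, ignores link-layer framing, and uses illustrative names.

```python
def call_bandwidth_kbps(rate_kbps: float, packet_ms: float, header_bits: float) -> float:
    """B = R + H/P, with R in kb/s, P in ms, and H in bits."""
    return rate_kbps + header_bits / packet_ms

FULL_HEADER_BITS = 40 * 8        # uncompressed RTP/UDP/IPv4 header
COMPRESSED_HEADER_BITS = 4 * 8   # typical compressed header

for name, rate in [("G.711", 64.0), ("G.729", 8.0)]:
    full = call_bandwidth_kbps(rate, 20.0, FULL_HEADER_BITS)
    comp = call_bandwidth_kbps(rate, 20.0, COMPRESSED_HEADER_BITS)
    print(f"{name}: {full:.1f} -> {comp:.1f} kb/s ({1 - comp / full:.0%} saved)")
```

Compression saves about 18% of the bandwidth of a G.711 call but about 60% for G.729, because the header dominates the small G.729 packets.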
In short, the lowest bandwidth requirements lead to long packetization delays and the most complex codecs. An engineering tradeoff must be made to achieve an acceptable packetization delay, an acceptable codec complexity, and an acceptable call bandwidth requirement. The following sections discuss quality and bandwidth efficiency in more detail.
`
`
`
`
Fig. 5. The varying bands, from top to bottom, represent the following VoIP bandwidth requirements in kb/s (40-byte headers): 120–140, 100–120, 80–100, 60–80, 40–60, 20–40, and 0–20.

Fig. 6. From top to bottom, the varying bands represent the following VoIP bandwidth requirements in kb/s (4-byte headers): 70–80, 60–70, 50–60, 40–50, 30–40, 20–30, 10–20, and 0–10.
`
`A. Delay
`
`Transmission time includes delay due to codec processing
`as well as propagation delay. ITU-T Recommendation G.114
`[8] recommends the following one-way transmission time
`limits for connections with adequately controlled echo (com-
`plying with G.131 [7]):
• 0 to 150 ms: acceptable for most user applications;
• 150 to 400 ms: acceptable for international connections;
• above 400 ms: unacceptable for general network planning purposes; however, it is recognized that in some exceptional cases this limit will be exceeded.
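The G.114 planning bands above amount to a simple threshold rule, which a planner might encode as below. The function name is hypothetical, and placing a delay that falls exactly on 150 or 400 ms in the lower band is an assumption, since the bands as quoted do not say which side the boundaries belong to. The rule presumes adequately controlled echo.

```python
def g114_band(one_way_delay_ms: float) -> str:
    """Classify a one-way transmission time into the G.114 bands quoted above.

    Assumption: a delay exactly on a boundary (150 or 400 ms) is placed in
    the lower band.
    """
    if one_way_delay_ms < 0:
        raise ValueError("delay must be non-negative")
    if one_way_delay_ms <= 150:
        return "acceptable for most user applications"
    if one_way_delay_ms <= 400:
        return "acceptable for international connections"
    return "unacceptable for general network planning"

for d in (100, 250, 500):
    print(d, "ms:", g114_band(d))
```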
ITU-T Recommendation G.114 Annex B describes the results of subjective tests to evaluate the effects of pure delay on speech quality. A test completed in 1989 showed that the percentage of users rating the call as poor or worse (POW) for overall quality started increasing above 10% only for delays greater than 500 ms, but POW for interruptability was above 10% for delays of 400 ms. One of the tests, completed in 1990, “was designed to obtain subjective reactions, in context of interruptability and quality, to echo-free telephone circuits in which various amounts of delay were introduced. The results indicated that long delays did not greatly reduce mean opinion scores over the range of delay tested, viz. 1 to 1000 ms of one-way delay… However, observations during the test and subject interviews after the test showed the subjects experienced some real difficulties in communicating at the longer delays, although subjects did not always associate the difficulty with the delay” [8].
A Japanese study in 1991 measured the effect of delay using six different tasks involving more or fewer interruptions in the dialogue. The delay detectability threshold was defined as the delay detected by 50% of a task’s subjects. As the interactivity required by the tasks decreased, the delay detectability threshold increased from 45 to 370 ms of one-way delay. As the one-way delay increased from 100 to 350 ms, the MOS connection quality decreased from 3.74 (±0.52) to 3.48 (±0.48), and the connection acceptability decreased from 80% to 73% [8].

Table 2
Delay Budget for VoIP Using G.729 Codec
`Delay variation, sometimes called jitter, is also important.
`The receiving gateway or telephone must compensate for
`delay variation with a jitter buffer, which imposes a delay
`on early packets and passes late packets with less delay so
`that the decoded voice streams out of the receiver at a steady
rate. Any packets that arrive later than the length of the jitter buffer are discarded. Since we want low packet loss, the jitter buffer delay should be set to the maximum delay variation that we expect. This jitter buffer delay must be included in the total end-to-end delay that the listener experiences during a conversation using packet telephony.
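A fixed (non-adaptive) jitter buffer can be simulated in a few lines to show the tradeoff just described. This sketch assumes the first packet experienced the minimum network delay and ignores clock drift and adaptive resizing; the function name and the sample trace are hypothetical.

```python
from typing import List, Tuple

def fixed_playout(packets: List[Tuple[int, float, float]],
                  buffer_ms: float) -> Tuple[List[int], List[int]]:
    """Fixed jitter buffer: return (played, discarded) sequence numbers.

    packets holds (sequence number, send time, arrival time) in ms, in send order.
    The first packet sets the reference network delay; every packet is due for
    playout at send_time + reference_delay + buffer_ms, so playout proceeds at a
    steady rate. A packet arriving after its playout time is discarded.
    """
    _, first_sent, first_arrival = packets[0]
    reference_delay = first_arrival - first_sent
    played, discarded = [], []
    for seq, sent, arrival in packets:
        due = sent + reference_delay + buffer_ms
        if arrival <= due:
            played.append(seq)
        else:
            discarded.append(seq)
    return played, discarded

# 20-ms packets with one-way network delays of 50, 55, 90, 52 and 130 ms.
trace = [(0, 0, 50), (1, 20, 75), (2, 40, 130), (3, 60, 112), (4, 80, 210)]
print(fixed_playout(trace, 60))   # ([0, 1, 2, 3], [4])
print(fixed_playout(trace, 20))   # ([0, 1, 3], [2, 4])
```

The 60-ms buffer absorbs up to 60 ms of delay variation but adds 60 ms to every packet's end-to-end delay; the 20-ms buffer adds less delay but turns two late packets into losses.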
`
`B. Delay Budget
`Packetized voice has larger end-to-end delays than a TDM
`system, making the above delay objectives challenging. A
`sample on-net delay budget for the G.729 (8 kb/s) codec is
`shown in Table 2.
`This budget is not precise. The allocated jitter buffer delay
`of 60 ms is only an estimate; the actual delay could be larger
`or smaller.2 Since the sample budget does not include any
`specific delays for header compression and decompression,
`we may consider that, if those functions are employed, the
`associated processing delay is lumped into the access link
`delay.
`This delay budget allows us to stay within the G.114 guide-
`lines, leaving 29 ms for the one-way backbone network delay
($D_{nw}$) in a national network. This is achievable in small
`countries. Network delays in the Asia Pacific region, as well
`as between North America and Asia, may be higher than 100
`ms. According to G.114, these delays are acceptable for in-
`ternational links. However, the end-to-end delays for VoIP
`calls are considerably larger than for PSTN calls.
`
2 In the absence of network QoS, the jitter buffer delay could be larger.
`With QoS and an adaptive jitter buffer, the delay could adapt down to a lower
`value during a long conversation.
`
`IV. NETWORK QOS
`
`There are various approaches to providing QoS in IP net-
`works. Before discussing the QoS options, one must consider
`whether QoS is really necessary. Some Internet engineers as-
`sert that the way to provide good IP network performance is
`through provisioning, rather than through complicated QoS
`protocols. If no link in an IP network is ever more than 30%
`occupied, even in peak traffic conditions, then the packets
should flow through with little or no queuing delay, and elaborate protocols to give priority to one class of packet are not
`necessary. The design engineer should consider the capacity
`of the router components to forward small voice packets as
`well as the bandwidth of the inter-router links in determining
`the occupancy of the network. If the occupancy is low, then
`performance should be good. Essentially, the debate is over
`whether excess network capacity (including link bandwidth
`and routers) is less expensive than QoS implementation.
`The development of QoS features has continued because
`of the perception of some network engineers that real-time
`traffic (as well as other applications) may sometimes re-
`quire priority treatment to achieve good performance. In
`some parts of the world, bandwidth is at least an order of
`magnitude more expensive than it is in the United States. In
`some cases, access links may be expensive and broadband
`access difficult to obtain, so that QoS may be desirable on
`the access links even if the core network is lightly loaded.
`Wireless access links are especially expensive, so QoS is
`important for wireless mobile IP phone calls.
`QoS can be achieved by managing router queues and
`by routing traffic around congested parts of the network.
Two key QoS architectures are IntServ [9] and DiffServ.
`The IntServ concept is to reserve resources for each flow
`through the network. RSVP [10] was originally designed to
`be the reservation protocol. When an application requests
`a specific QoS for its data stream, RSVP can be used to
`deliver the request to each router along the path and to
`maintain router state to provide the requested service. RSVP
transmits two types of Flow Specs conforming to IntServ rules. The traffic specification (Tspec) describes the flow, and the service request specification (Rspec) describes the service requested under the assumption that the flow adheres
`to the Tspec. Current implementations of IntServ allow a
`choice of Guaranteed Service or Controlled-Load Service.
Guaranteed Service [11] involves traffic policing by a token bucket model (token rate $r$, bucket depth $b$) to control average traffic. The packet size is restricted to be in the range $[m, M]$, so that smaller packets are considered to be of size $m$ and packets larger than $M$ are in violation of the contract. Peak traffic is limited by a peak rate parameter $p$, so that no more than $M + pT$ bytes are transmitted in any interval of length $T$.
`A bandwidth requirement is stated, and enough bandwidth
`is reserved on each hop to satisfy all the requirements of the
`flow. (The bandwidth requirement may not be the same on
`each hop [12].) If each node and hop can accept the service
`request, the flow should be lossless because the queue size
`reserved for the flow can be set to the length parameter of
`the token bucket. This service is designed for interactive
`real-time applications. To use it effectively, one needs a
`strict and realistic end-to-end delay budget in addition to
`bandwidth requirements of the flow.
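The Guaranteed Service traffic parameters can be illustrated with a dual token bucket policer: one bucket (rate r, depth b) bounds average traffic, and a second (rate p, depth M) enforces the M + pT peak bound. This is a minimal sketch assuming the RFC 2212 Tspec parameters (r, b, p, m, M) with rates in bytes per second; the class and field names are hypothetical, and in this sketch nonconforming packets consume no tokens.

```python
from dataclasses import dataclass

@dataclass
class Tspec:
    r: float  # token rate, bytes/s
    b: float  # bucket depth, bytes
    p: float  # peak rate, bytes/s
    m: int    # minimum policed unit, bytes
    M: int    # maximum packet size, bytes

class TspecPolicer:
    """Dual token bucket: (r, b) bounds average traffic, (p, M) bounds the peak."""

    def __init__(self, spec: Tspec) -> None:
        self.spec = spec
        self.avg_tokens = float(spec.b)
        self.peak_tokens = float(spec.M)
        self.last_t = 0.0

    def conforms(self, t: float, size: int) -> bool:
        s = self.spec
        if size > s.M:
            return False                 # larger than M: violates the contract
        size = max(size, s.m)            # smaller than m: counted as m
        dt = t - self.last_t
        self.last_t = t
        self.avg_tokens = min(s.b, self.avg_tokens + s.r * dt)
        self.peak_tokens = min(s.M, self.peak_tokens + s.p * dt)
        if size <= self.avg_tokens and size <= self.peak_tokens:
            self.avg_tokens -= size
            self.peak_tokens -= size
            return True
        return False

# Hypothetical voice flow: one 60-byte packet every 20 ms (3000 bytes/s).
spec = Tspec(r=3000, b=120, p=6000, m=60, M=100)
policer = TspecPolicer(spec)
print(all(policer.conforms(k * 0.02, 60) for k in range(50)))  # True
```

A steady voice stream conforms, while a back-to-back burst exceeds the peak bucket; a router that reserves queue space equal to the bucket depth can then serve conforming traffic without loss.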
`Controlled-Load Service uses the same Tspec as Guar-
`anteed Service. However, an Rspec is not defined. Flows
`using this service should experience the same performance
`as they would in a lightly loaded “best-effort” network. Con-
`trolled-Load Service would be appropriate for call admission
`control and would prevent the delays and packet losses that
`make real-time traffic suffer when the network is congested.
`There are several reasons for not using IntServ with
`RSVP for IP telephony. Although IntServ with RSVP would
`work on a private network for small amounts of traffic,
`the large number of voice calls that IP telephony service
`providers carry on their networks would stress an IntServ
`RSVP system. First, the bandwidth required for voice itself
`is small, and the RSVP control traffic would be a significant
`part of the overall traffic. Second, RSVP router code was
`not designed to support many thousands of simultaneous
`connections per router.
`It should be noted, however, that RSVP is a signaling pro-
`tocol, and it has been proposed for use in contexts other than
`IntServ. For example, RSVP-TE is a constraint-based routing
`protocol for establishing LSPs with associated bandwidth
`and specified paths in an MPLS network [13]. RSVP has also
`been proposed as the call admission control mechanism for
`VoIP in differentiated services networks.
`
`A. Differentiated Services
`
`Since IntServ with RSVP does not scale well to support
`many thousands of simultaneous connections, the IETF
`has developed a simpler framework and architecture to
`support DiffServ [14]. The architecture achieves scalability
`by aggregating traffic into classifications that are conveyed
`by means of IP-layer packet marking using the DS field in
`IPv4 or IPv6 headers. Sophisticated classification, marking,
`policing, and shaping operations need only be implemented
`at network boundaries. Service provisioning policies al-
`locate network resources to traffic streams by marking
`
`and conditioning packets as they enter a differentiated
`services-capable network, in which the packets receive a
`particular PHB based on the value of the DS field.
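As an illustration of packet marking, an end host or edge device can request EF treatment by writing the EF codepoint (46, defined for the EF PHB in RFC 3246) into the DS field, which occupies the upper six bits of the former IPv4 TOS byte. This minimal sketch assumes an IPv4 UDP socket on a platform where `socket.IP_TOS` is available; whether routers honor the mark depends on the DS domain's policy, because boundary nodes may re-mark or police it.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding codepoint

def dscp_to_tos(dscp: int) -> int:
    """The DSCP occupies the upper six bits of the IPv4 TOS byte."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2

def mark_socket_ef(sock: socket.socket) -> None:
    """Ask the OS to set the EF codepoint on outgoing IPv4 packets from sock."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(EF_DSCP))

print(hex(dscp_to_tos(EF_DSCP)))  # 0xb8
```

A voice application would call `mark_socket_ef` on its RTP socket before sending; per-flow knowledge stays at the edge, and core routers look only at the codepoint.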
`The primary goal of differentiated services is to allow dif-
`ferent levels of service to be provided for traffic streams on a
`common network infrastructure. A variety of resource man-
`agement techniques may be used to achieve this, but the end
`result will be that some packets will receive different (e.g.,
`better) service than others. This will, for example, allow ser-
`vice providers to offer a real-time service giving priority to
`the use of bandwidth and router queues, up to the configured
`amount of capacity allocated to real-time traffic.
`Despite the term “differentiated services,” the IETF Diff-
`Serv working group undertook to define standards that have
`more generality than specific services. The reason is that
`if the IETF were to define new standard services, everyone
`would have to agree on what constitutes a useful service and
`every router would have to implement the mechanisms to
`support it. To deploy that new service, you would have to
`upgrade the entire Internet. Since a router has only a few
`functions, it makes more sense to standardize forwarding be-
`havior (“send this packet first” or “drop this packet last”). So
`the DiffServ working group first defined PHBs, which could
`be combined with rules to create services.3
`An important requirement is scalability, since the IETF in-
`tended differentiated services to be deployed in very large
`networks. To achieve scalability, the DiffServ architecture
`prescribes treatment for aggregated traffic rather than mi-
`croflows and forces much of the complexity out of the core
of the network into the edge devices, which process lower
volumes of traffic and fewer flows.
`The DiffServ architecture is based on a simple model
`where packets entering a network are classified and possibly
`conditioned at the boundaries of the network, and then
assigned to different behavior aggregates. Each behavior aggregate
is identified by a single DS codepoint. Within the core of
`the network, packets are forwarded according to the PHB
`associated with the DS codepoint.
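The core-forwarding model can be sketched as a two-queue scheduler keyed only on the DS codepoint, with no per-flow state. This assumes strict priority queuing as the EF implementation, which is one common choice rather than something the DiffServ architecture mandates; the class and queue names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional, Tuple

EF_DSCP = 46
Packet = Tuple[str, int]  # (packet id, DS codepoint)

class TwoClassScheduler:
    """Core-node sketch: queues are selected by DS codepoint only, and the
    EF behavior aggregate is always served before the default class."""

    def __init__(self) -> None:
        self.ef: Deque[Packet] = deque()
        self.default: Deque[Packet] = deque()

    def enqueue(self, pkt: Packet) -> None:
        (self.ef if pkt[1] == EF_DSCP else self.default).append(pkt)

    def dequeue(self) -> Optional[Packet]:
        if self.ef:
            return self.ef.popleft()
        if self.default:
            return self.default.popleft()
        return None

sched = TwoClassScheduler()
for pkt in [("data1", 0), ("voice1", 46), ("data2", 0), ("voice2", 46)]:
    sched.enqueue(pkt)
print([sched.dequeue()[0] for _ in range(4)])  # ['voice1', 'voice2', 'data1', 'data2']
```

Strict priority alone could starve other classes, so a practical EF implementation also limits the EF aggregate to its configured rate at the edge.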
`One candidate PHB for voice service is EF. The objective
`of the EF PHB is to build a low-loss, low-latency, low-jitter,
`assured bandwidth, end-to-end service through DS domains.
`Such a service would appear to endpoints like a point-to-
`point connection or “virtual leased line.” Since router queues
`cause traffic to experience loss, jitter, and excessive latency,
the EF PHB tries to ensure that all EF traffic experiences either no queues or very small ones. Since queues arise when the short-term traffic arrival rate exceeds the departure rate at some node, the EF PHB requires that, at every node, the aggregate EF traffic maximum arrival rate be less than the EF minimum departure rate [15]–[17]. The original idea was to ensure low
`delay and no packet loss. Subsequent analysis has shown
`that, under the no loss hypothesis, evaluating the worst-case
`arrival patterns on each node leads to poor delay bounds after
`just a few hops. Using a worst-case analysis to determine ad-
`mission criteria would lead to unacceptably low utilization.
`
`3Recently, the IETF DiffServ Working Group has started considering per
`domain behaviors, but as of this writing the work is still in progress.
`
`
`
`
`However, simulations and early EF trials show that good per-
`formance can be achieved with reasonable efficiency [18].
`The appeal of DiffServ is that it is relatively simple (com-
`pared to IntServ), yet provides applications like VoIP some
`improvement in performance compared to “best-effort” IP
`networks. However, DiffServ relies on ample network ca-
`pacity for EF traffic and makes use of standard routing proto-
`cols that make no attempt to use the network efficiently. Con-
`fronted with network congestion, EF would drop packets at
`the edge instead of queuing or rerouting them. DiffServ has
`no topology-aware admission control mechanism. The IETF
`DiffServ Working Group has not recommended a mechanism
`for rejecting additional VoIP calls if accepting them would
`degrade the quality of calls in progress.4
`
`B. MPLS-Based QoS
`For several decades, traffic engineering and automated
`rerouting of telephone traffic have increased the efficiency
`and reliability of the PSTN. Frame relay and ATM also
`offer source (or “explicit”) routing capabilities that enable
`traffic engineering. However, IP networks have relied on
`destination-based routing protocols that send all the packets
`over the shortest path, without regard to the utilization of
`the links comprising that path. In some cases, links can be
`congested by traffic that could be carried on other paths
composed of underutilized links. It is possible to design an
`IP network to run on top of a frame relay or ATM (“Layer
`2”) network, providing some traffic engineering features,
`but this approach adds cost and operational complexity.
`MPLS offers IP networks the capability to provide traffic
`engineering as well as a differentiated services approach
`to voice quality. MPLS separates routing from forwarding,
using label swapping as the forwarding mechanism. MPLS is implemented in LSRs, which perform the routing function in advance by creating LSPs connecting edge routers. The ingress edge router (an LSR) attaches short (four-byte) labels to packets. Each LSR along the LSP swaps the label and passes the packet along to the next LSR. The last LSR on the LSP removes the label and treats the packet as a normal IP packet.
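Label swapping can be illustrated with per-LSR tables keyed by incoming label. In this hypothetical sketch the ingress edge LSR A has already attached label 17; the router names, label values, and table layout are invented for illustration, and real LSRs also handle label stacks and TTL processing.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical LSP A -> B -> C -> D.
# Each table maps an incoming label to (action, outgoing label, next LSR).
LFIB: Dict[str, Dict[int, Tuple[str, Optional[int], Optional[str]]]] = {
    "B": {17: ("swap", 42, "C")},
    "C": {42: ("swap", 99, "D")},
    "D": {99: ("pop", None, None)},   # last LSR removes the label
}

def forward_along_lsp(first_lsr: str, label: int) -> List[str]:
    """Follow a labeled packet hop by hop using only exact-match label lookups."""
    hops: List[str] = []
    lsr = first_lsr
    while True:
        action, out_label, next_lsr = LFIB[lsr][label]
        if action == "pop":
            hops.append(f"{lsr}: pop {label}")   # now an ordinary IP packet
            return hops
        hops.append(f"{lsr}: swap {label} -> {out_label}")
        lsr, label = next_lsr, out_label

print(forward_along_lsp("B", 17))  # ['B: swap 17 -> 42', 'C: swap 42 -> 99', 'D: pop 99']
```

No transit LSR consults its IP routing table; the path was fixed when the LSP was set up.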
`MPLS LSPs can be established using LDP [19], RSVP-TE
`[20], or CR-LDP [21]. When using LDP, LSPs have no
`associated bandwidth. However, when using RSVP-TE or
`CR-LDP, each LSP can be assigned a bandwidth, and the
`path can be designated for traffic engineering purposes.
`MPLS traffic engineering (MPLS-TE) combines extensions
`to OSPF or IS-IS, to distribute link resource constraints,
`with the label distribution protocols RSVP-TE or CR-LDP.
`Resource and policy attributes are configured on every
`link and define the capabilities of the network in terms of
`bandwidth, a Resource Class Affinity string, and a traffic en-
`gineering link metric. When performing the constraint-based
`path computation, the originating LSR compares the link
`attributes received via OSPF or IS-IS to those configured on
`the LSP.
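The constraint-based path computation can be sketched as a CSPF: prune every link whose advertised attributes fail the LSP's constraints, then run a shortest-path search over the TE metric on what remains. This is a simplified sketch under stated assumptions: links are unidirectional, the Resource Class Affinity check is reduced to a single include-any bitmask, and the function name, topology, and numbers are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# (from, to) -> (unreserved bandwidth in Mb/s, resource class bits, TE metric)
Links = Dict[Tuple[str, str], Tuple[float, int, int]]

def cspf(links: Links, src: str, dst: str, bw: float,
         include_any: int) -> Optional[List[str]]:
    """Drop links lacking bandwidth or affinity, then run Dijkstra on the TE metric."""
    adj: Dict[str, List[Tuple[str, int]]] = {}
    for (a, b), (avail, classes, metric) in links.items():
        if avail >= bw and classes & include_any:
            adj.setdefault(a, []).append((b, metric))
    dist = {src: 0}
    prev: Dict[str, str] = {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for nxt, metric in adj.get(node, []):
            nd = d + metric
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(heap, (nd, nxt))
    return None                           # no path satisfies the constraints

links: Links = {("A", "B"): (50, 0b1, 1), ("B", "D"): (500, 0b1, 1),
                ("A", "C"): (500, 0b1, 2), ("C", "D"): (500, 0b1, 2)}
print(cspf(links, "A", "D", bw=10, include_any=0b1))   # ['A', 'B', 'D']
print(cspf(links, "A", "D", bw=100, include_any=0b1))  # ['A', 'C', 'D']
```

A 100-Mb/s LSP avoids the shortest path because link A-B cannot carry it, which is exactly the kind of traffic engineering that destination-based IP routing cannot do.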
`
`4Indeed, the working group co-chairs probably did not believe that admis-
`sion control was within their charter.
`
`Differentiated services can be combined with MPLS
`to map DiffServ Behavior Aggregates onto LSPs [22].
`QoS policies can be designated for particular paths. More
`specifically, the EXP field of the MPLS label can be set
so that each LSR in the path knows to give
`the voice packets highest priority, up to the configured
`maximum bandwidth for voice on a particular link. When
`the high-priority bandwidth is not needed for voice, it can
`be used for lower priority classes of traffic.
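The four-byte MPLS label stack entry (RFC 3032) carries the 3-bit EXP field alongside the 20-bit label, the bottom-of-stack (S) bit, and the TTL. A minimal packing sketch; the choice of EXP value 5 for voice is an assumed operator policy, not something fixed by the text or the standard, and the function name is hypothetical.

```python
import struct

VOICE_EXP = 5  # assumption: operator-chosen EXP value for the voice class

def mpls_shim(label: int, exp: int, bottom_of_stack: bool, ttl: int) -> bytes:
    """Pack one label stack entry: label(20) | EXP(3) | S(1) | TTL(8)."""
    if not (0 <= label < 1 << 20 and 0 <= exp < 8 and 0 <= ttl < 256):
        raise ValueError("field out of range")
    word = (label << 12) | (exp << 9) | (int(bottom_of_stack) << 8) | ttl
    return struct.pack("!I", word)

entry = mpls_shim(label=42, exp=VOICE_EXP, bottom_of_stack=True, ttl=64)
print(entry.hex())  # 0002ab40
```

Each LSR can read the EXP bits without touching the IP header, so the same queuing priority used for EF can be applied inside the MPLS core.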
`DiffServ and MPLS DiffServ are implemented indepen-
`dently of the routing computation. MPLS-TE computes
`routes for aggregates across all classes and performs admis-
`sion control over the entire LSP bandwidth. MPLS-TE and
`MPLS DiffServ can be used at the same time. Alternatively,
DiffServ can be combined with traffic engineering to establish separate tunnels for different classes. DiffServ-aware traffic engineering (DS-TE) makes
`MPLS-TE aware of DiffServ, so that one can establish
`separate LSPs for different classes, taking into account
`the bandwidth available to each class. So, for example,
`a separate LSP could be established for voice, and that
`LSP could be given higher priority than other LSPs, but
`the amount of voice traffic on a link could be limited to a
`certain percentage of the total link bandwidth. This capa-
`bility is currently being standardized by the IETF Traffic
`Engineering Working Group [23], [24].
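The per-class limit described above can be sketched as an admission check run before a voice LSP is placed on a link. This is a simplified illustration, not the bandwidth-constraint model being standardized: it assumes a single voice class capped at a fixed fraction of the link, and the parameter names and numbers are hypothetical.

```python
def admit_voice_lsp(link_bw: float, voice_fraction: float, voice_reserved: float,
                    total_reserved: float, request: float) -> bool:
    """Admit a new voice LSP on one link only if it fits under both the voice
    class cap (a fixed fraction of the link) and the link's total bandwidth."""
    voice_cap = link_bw * voice_fraction
    fits_class = voice_reserved + request <= voice_cap
    fits_link = total_reserved + request <= link_bw
    return fits_class and fits_link

# Hypothetical 100-Mb/s link, voice capped at 25%, 20 Mb/s of voice already placed.
print(admit_voice_lsp(100, 0.25, 20, 60, 4))   # True: 24 <= 25 and 64 <= 100
print(admit_voice_lsp(100, 0.25, 20, 60, 6))   # False: 26 > 25
```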
`Voice DS-TE tunnels can be based on a delay metric or
`a bandwidth metric. Combin