Signal Processing: Image Communication 15 (1999) 95}126
Scalable Internet video using MPEG-4

Hayder Radha*, Yingwei Chen, Kavitha Parthasarathy, Robert Cohen

Philips Research, 345 Scarborough Rd, Briarcliff Manor, New York, 10510, USA
Abstract

Real-time streaming of audio-visual content over Internet Protocol (IP) based networks has enabled a wide range of multimedia applications. An Internet streaming solution has to provide real-time delivery and presentation of a continuous media content while compensating for the lack of Quality-of-Service (QoS) guarantees over the Internet. Due to the variation and unpredictability of bandwidth and other performance parameters (e.g. packet loss rate) over IP networks, in general, most of the proposed streaming solutions are based on some type of a data loss handling method and a layered video coding scheme. In this paper, we describe a real-time streaming solution suitable for non-delay-sensitive video applications such as video-on-demand and live TV viewing.

The main aspects of our proposed streaming solution are:
1. An MPEG-4 based scalable video coding method using both a prediction-based base layer and a fine-granular enhancement layer;
2. An integrated transport-decoder buffer model with priority re-transmission for the recovery of lost packets, and continuous decoding and presentation of video.

In addition to describing the above two aspects of our system, we also give an overview of a recent activity within MPEG-4 video on the development of a fine-granular-scalability coding tool for streaming applications. Results for the performance of our scalable video coding scheme and the re-transmission mechanism are also presented. The latter results are based on actual testing conducted over Internet sessions used for streaming MPEG-4 video in real-time. (C) 1999 Published by Elsevier Science B.V. All rights reserved.
1. Introduction

Real-time streaming of multimedia content over Internet Protocol (IP) networks has evolved as one of the major technology areas in recent years. A wide range of interactive and non-interactive multimedia Internet applications, such as news on-demand, live TV viewing, and video conferencing, rely on end-to-end streaming solutions. In general, streaming solutions are required to maintain real-time delivery and presentation of the multimedia content while compensating for the lack of Quality-of-Service (QoS) guarantees over IP networks. Therefore, any Internet streaming system has to take into consideration key network performance parameters such as bandwidth, end-to-end delay, delay variation, and packet loss rate.

To compensate for the unpredictability and variability in bandwidth between the sender and receiver(s) over the Internet and Intranet networks, many streaming solutions have resorted to variations of layered (or scalable) video coding methods (see for example [22,24,25]). These

* Corresponding author.
E-mail address: hmr@philabs.research.philips.com (H. Radha)

0923-5965/99/$ - see front matter (C) 1999 Published by Elsevier Science B.V. All rights reserved.
PII: S0923-5965(99)00026-0
`Stingray Ex. 1007
solutions are typically complemented by packet loss recovery [22] and/or error resilience mechanisms [25] to compensate for the relatively high packet-loss rate usually encountered over the Internet [2,30,32,33,35,47].

Most of the references cited above and the majority of related modeling and analytical research studies published in the literature have focused on delay-sensitive (point-to-multipoint or multipoint-to-multipoint) applications such as video conferencing over the Internet Multicast Backbone (MBone). When compared with other types of applications (e.g. entertainment over the Web), these delay-sensitive applications impose different kinds of constraints, such as low encoder complexity and very low end-to-end delay. Meanwhile, entertainment-oriented Internet applications such as news and sports on-demand, movie previews and even 'live' TV viewing represent a major (and growing) element of the real-time multimedia experience over the global Internet [9].

Moreover, many of the proposed streaming solutions are based on either proprietary methods or video coding standards that were developed prior to the phenomenal growth of the Internet. However, under the audio, visual, and system activities of the ISO MPEG-4 work, many aspects of the Internet have been taken into consideration when developing the different parts of the standard. In particular, a recent activity in MPEG-4 video has focused on the development of a scalable compression tool for streaming over IP networks [4,5].

In this paper, we describe a real-time streaming system suitable for non-delay-sensitive1 video applications (e.g. video-on-demand and live TV viewing) based on the MPEG-4 video-coding standard. The main aspects of our real-time streaming system are:
1. A layered video coding method using both a prediction-based base layer and a fine-granular enhancement layer: this solution follows the recent development in the MPEG-4 video group for the standardization of a scalable video compression tool for Internet streaming applications [3,4,6].
2. An integrated transport-decoder buffer model with a re-transmission based scheme for the recovery of lost packets, and continuous decoding and presentation of video.

1 Delay-sensitive applications are normally constrained by an end-to-end delay of about 300-500 ms. Real-time, non-delay-sensitive applications can typically tolerate a delay on the order of a few seconds.
The remainder of this paper is organized as follows. In Section 2 we provide an overview of key design issues one needs to consider for real-time, non-delay-sensitive IP streaming solutions. We will also highlight how our proposed approach addresses these issues. Section 3 describes our real-time streaming system and its high-level architecture. Section 4 details the MPEG-4 based scalable video coding scheme used by the system, and provides an overview of the MPEG-4 activity on fine-granular scalability. Simulation results for our scalable video compression solution are also presented in Section 4. In Section 5, we introduce the integrated transport layer-video decoder buffer model with re-transmission. We also evaluate the effectiveness of the re-transmission scheme based on actual tests conducted over the Internet involving real-time streaming of MPEG-4 video.
2. Design considerations for real-time streaming

The following are some high-level issues that should be considered when designing a real-time streaming system for entertainment-oriented applications.

2.1. System scalability

The wide range of variation in effective bandwidth and other network performance characteristics over the Internet [33,47] makes it necessary to pursue a scalable streaming solution. The variation in QoS measures (e.g. effective bandwidth) is not only present across the different access technologies to the Internet (e.g. analog modem, ISDN, cable modem, LAN, etc.), but it can even be observed over relatively short periods of time over a particular session [8,33]. For example, a recent study shows that the effective bandwidth of a cable modem access link to the Internet may vary between 100 kbps and 1 Mbps [8]. Therefore, any video-coding method and associated streaming solution has to take into consideration this wide range of performance characteristics over IP networks.
2.2. Video compression complexity, scalability, and coding efficiency

The video content used for on-demand applications is typically compressed off-line and stored for later viewing through unicast IP sessions. This observation has two implications. First, the complexity of the video encoder is not as major an issue as in the case with interactive multipoint-to-multipoint or even point-to-point applications (e.g. video conferencing and video telephony), where compression has to be supported by every terminal. Second, since the content is not being compressed in real-time, the encoder cannot employ a variable-bit-rate (VBR) method to adapt to the available bandwidth. This emphasizes the need for coding the material using a scalable approach. In addition, for multicast or unicast applications involving a large number of point-to-multipoint sessions, only one encoder (or possibly very few encoders for simulcast) is (are) usually used. This observation also leads to a relaxed constraint on the complexity of the encoder, and highlights the need for video scalability. As a consequence of the relaxed video-complexity constraint for entertainment-oriented IP streaming, there is no need to totally avoid such techniques as motion estimation, which can provide a great deal of coding efficiency when compared with replenishment-based solutions [24].

Although it is desirable to generate a scalable video stream for a wide range of bit-rates (e.g. 15 kbps for analog-modem Internet access to around 1 Mbps for cable-modem/ADSL access), it is virtually impossible to achieve a good coding-efficiency/video-quality tradeoff over such a wide range of rates. Meanwhile, it is equally important to emphasize the impracticality of coding the video content using simulcast compression at multiple bit-rates to cover the same wide range. First, simulcast compression requires the creation of many streams (e.g. at 20, 40, 100, 200, 400, 600, 800 and 1000 kbps). Second, once a particular simulcast bitstream (coded at a given bit-rate, say R) is selected to be streamed over a given Internet session (which initially can accommodate a bit-rate of R or higher), then, due to possible wide variation of the available bandwidth over time, the Internet session bandwidth may fall below the bit-rate R. Consequently, this decrease in bandwidth could significantly degrade the video quality. One way of dealing with this issue is to switch, in real-time, among different simulcast streams. This, however, increases complexities on both the server and the client sides, and introduces synchronization issues.

A good practical alternative is to use video scalability over a few ranges of bit-rates. For example, one can create a scalable video stream for the analog/ISDN access bit-rates (e.g. to cover 20-100 kbps bandwidth), and another scalable stream for a higher bit-rate range (e.g. 200 kbps-1 Mbps). This approach leads to another important requirement. Since each scalable stream will be built on top of a video base layer, this approach implies that multiple base layers will be needed (e.g. one at 20 kbps, another at 200 kbps, and possibly another at 1 Mbps). Therefore, it is quite desirable to deploy a video compression standard that provides good coding efficiency over a rather wide range of possible bit-rates (in the above example 20 kbps, 200 kbps and 1 Mbps). In this regard, due to the many video-compression tools provided by MPEG-4 for achieving high coding efficiency, in particular at low bit-rates, MPEG-4 becomes a very attractive choice for compression.
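The few-ranges scheme above can be expressed as a small selection routine. The following is an illustrative sketch only, not part of the paper's system: the stream table (with rates taken from the example figures in the text, in kbps) and the function name are our own assumptions.

```python
# Illustrative sketch: pick the scalable stream whose base layer fits the
# measured session bandwidth, then spend the remaining bandwidth on the
# fine-granular enhancement layer. Rates in kbps follow the text's example
# ranges (20-100 kbps for analog/ISDN, 200-1000 kbps for cable/ADSL).
STREAMS = [
    {"name": "low",  "base_kbps": 20,  "max_kbps": 100},
    {"name": "high", "base_kbps": 200, "max_kbps": 1000},
]

def select_stream(bandwidth_kbps):
    """Return (stream name, enhancement-layer rate in kbps) for a session."""
    best = None
    for s in STREAMS:                      # STREAMS is sorted by base rate
        if s["base_kbps"] <= bandwidth_kbps:
            best = s                       # keep the largest base layer that fits
    if best is None:
        raise ValueError("bandwidth below the lowest base-layer rate")
    el_rate = min(bandwidth_kbps, best["max_kbps"]) - best["base_kbps"]
    return best["name"], el_rate
```

For instance, a 64 kbps session would be served the low-range stream with 44 kbps of enhancement data, while a 600 kbps session would get the high-range stream with 400 kbps of enhancement data.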
2.3. Streaming server complexity

Typically, a unicast server has to output tens, hundreds, or possibly thousands of video streams simultaneously. This greatly limits the type of processing the server can perform on these streams in real-time. For example, although the separation of an MPEG-2 video stream into three temporal layers (I, P and B) is a feasible approach for a scalable multicast (as proposed in [22]), it will be quite difficult to apply the same method to a large number of unicast streams. This is the case since the proposed layering requires some parsing of the compressed video bitstream. Therefore, it is desirable to use a very simple scalable video stream that can be easily processed and streamed for unicast sessions. Meanwhile, the scalable stream should be easily divisible into multiple streams for multicast IP, similar to the receiver-driven paradigm used in [22,24].

Consequently, we adopt a single, fine-granular enhancement layer that satisfies these requirements. This simple scalability approach has two other advantages. First, it requires only a single enhancement-layer decoder at the receiver (even if the original fine-granular stream is divided into multiple sub-streams). Second, the impact of packet losses is localized to the particular enhancement-layer picture(s) experiencing the losses. These and other advantages of the proposed scalability approach will become clearer later in the paper.
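The server-side property that makes a fine-granular enhancement layer attractive, that it can be truncated at essentially any point, or split into sub-streams whose concatenation restores a prefix of the original, can be sketched as follows. This is a schematic illustration that treats each enhancement-layer picture as an opaque byte string; it is not the paper's actual bitstream syntax, and the function names are our own.

```python
def truncate_el_picture(el_bytes, budget):
    """Serve only the first `budget` bytes of a fine-granular
    enhancement-layer picture; any prefix remains decodable."""
    return el_bytes[:budget]

def split_el_picture(el_bytes, cut_points):
    """Split one fine-granular picture into multicast sub-streams.
    A receiver subscribing to the first k sub-streams concatenates
    them back into a decodable prefix of the picture."""
    subs, start = [], 0
    for cut in list(cut_points) + [len(el_bytes)]:
        subs.append(el_bytes[start:cut])
        start = cut
    return subs

# Subscribing to the first two of three sub-streams is equivalent to
# truncating the picture at the second cut point.
pic = bytes(range(10))
subs = split_el_picture(pic, [3, 7])
assert b"".join(subs[:2]) == truncate_el_picture(pic, 7)
```

Note that no parsing of the compressed payload is needed in either operation, which is exactly the low-complexity server behavior this subsection calls for.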
2.4. Client complexity and client-server communication issues

There is a wide range of clients that can access the Internet and experience a multimedia streaming application. Therefore, a streaming solution should take into consideration a scalable decoding approach that meets different client-complexity requirements. In addition, one has to incorporate robustness into the client for error recovery and handling, keeping in mind key client-server complexity issues. For example, the deployment of an elaborate feedback scheme between the receivers and the sender (e.g. for flow control and error handling) is not desirable due to the potential implosion of messages at the sender [2,34,35]. However, simple re-transmission techniques have been proven effective for many unicast and multicast multimedia applications [2,10,22,34]. Consequently, we employ a re-transmission method for the recovery of lost packets. This method is combined with a client-driven flow control model that ensures the continuous decoding and presentation of video while minimizing the server complexity.
In summary, a real-time streaming system tailored for entertainment IP applications should provide a good balance among these requirements: (a) scalability of the compressed video content, (b) coding efficiency across a wide range of bit-rates, (c) low complexity at the streaming server, and (d) handling of lost packets and end-to-end flow control using a primarily client-driven approach to minimize server complexity and meet overall system scalability requirements. These elements are addressed in our streaming system as explained in the following sections.
3. An overview of the scalable video streaming system

The overall architecture of our scalable video streaming system is shown in Fig. 1.2 The system consists of three main components: an MPEG-4 based scalable video encoder, a real-time streaming server, and a corresponding real-time streaming client which includes the video decoder.

MPEG-4 is an international standard being developed by the ISO Moving Picture Experts Group for the coding and representation of multimedia content.3 In addition to providing standardized methods for decoding compressed audio and video, MPEG-4 provides standards for the representation, delivery, synchronization, and interactivity of audiovisual material. The powerful MPEG-4 tools yield good levels of performance at low bit-rates, while at the same time they present a wealth of new functionality [20].

The video encoder generates two bitstreams: a base-layer and an enhancement-layer compressed video. An MPEG-4 compliant stream is coded based on an MPEG-4 video Verification Model (VM).4 This stream, which represents the base

2 The figure illustrates the architecture for a single, unicast server-client session. Extending the architecture shown in the figure to multiple unicast sessions, or to a multicast scenario, is straightforward.
3 http://drogo.cselt.stet.it/mpeg/
4 The VM is a common set of tools that contains detailed encoding and decoding algorithms used as reference for testing new functionalities. The video encoding was based on the MPEG-4 video group MoMuSys software, Version VCD-06-980625.
layer of the scalable video encoder output, is coded at a low bit-rate. The particular rate selected depends on the overall range of bit-rates targeted by the system and the complexity of the source material. For example, to serve clients with analog/ISDN modem Internet access, the base-layer video is coded at around 15-20 kbps. The video enhancement layer is coded using a single fine-granular-scalable bitstream. The method used for coding the enhancement layer follows the recent development in the MPEG-4 video fine-granular-scalability (FGS) activity for Internet streaming applications [4,5]. For the above analog/ISDN-modem access example, the enhancement-layer stream is over-coded to a bit-rate around 80-100 kbps. Due to the fine granularity of the enhancement layer, the server can easily select and adapt to the desired bit-rate based on the conditions of the network. The scalable video coding aspects of the system are covered in Section 4.

Fig. 1. The end-to-end architecture of an MPEG-4 based scalable video streaming system.

The server outputs the MPEG-4 base-layer video at a rate that follows very closely the bit-rate at which the stream was originally coded. This aspect of the server is crucial for minimizing underflow and overflow events at the client. Jitter is introduced at the server output due, in part, to the packetization of the compressed video streams. Real-time Transport Protocol (RTP) packetization [15,39] is used to multiplex and synchronize the
base and enhancement layer video. This is accomplished through the time-stamp fields supported in the RTP header. In addition to the base and enhancement streams, the server re-transmits lost packets in response to requests from the client. The three streams (base, enhancement and re-transmission) are sent using the User Datagram Protocol (UDP) over IP. The re-transmission requests between the client and the server are carried in an end-to-end, reliable control session using the Transmission Control Protocol (TCP). The server rate-control aspects of the system are covered in Section 5.

In addition to a real-time MPEG-4 based, scalable video decoder, the client includes buffers and a control module to regulate the flow of data and ensure continuous and synchronized decoding of the video content. This is accomplished by deploying an Integrated Transport Decoder (ITD) buffer model which supports packet-loss recovery through re-transmission requests. The ITD buffer model and the corresponding re-transmission method are explained in Section 5.
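The division of labor just described, with media over UDP and re-transmission requests over a reliable TCP control session, implies that the client must detect losses itself, for example from gaps in RTP sequence numbers. A minimal sketch of such client-driven loss detection follows; the class, the message tuple, and the use of a plain list to stand in for the TCP session are our own assumptions, and the paper's actual ITD model is detailed in its Section 5.

```python
# Minimal sketch of client-side loss detection driving re-transmission
# requests. RTP sequence numbers are 16-bit and wrap around modulo 65536.

def detect_losses(expected_seq, received_seq):
    """Return the missing sequence numbers between the next expected
    packet and the packet just received (16-bit wrap-around aware)."""
    gap = (received_seq - expected_seq) % 65536
    return [(expected_seq + i) % 65536 for i in range(gap)]

class RetransmissionClient:
    def __init__(self, control_channel):
        self.control = control_channel   # stands in for the reliable TCP session
        self.next_seq = None             # next sequence number we expect

    def on_packet(self, seq):
        """Called for each media packet arriving over UDP."""
        if self.next_seq is not None:
            for missing in detect_losses(self.next_seq, seq):
                # In the real system this request travels over TCP to the server.
                self.control.append(("RETX", missing))
        self.next_seq = (seq + 1) % 65536
```

Feeding packets 100, 101, 104 to the client produces re-transmission requests for 102 and 103; a real implementation would also bound the requests by the ITD buffer deadline so that late packets are not requested at all.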
4. MPEG-4 based scalable video coding for streaming

4.1. Overview of video scalability

Many scalable video-coding approaches have been proposed recently for real-time Internet applications. In [22] a temporal layering scheme is applied to MPEG-2 video coded streams, where different picture types (I, P and B) are separated into corresponding layers (I, P and B video layers). These layers are multicasted into separate streams, allowing receivers with different session-bandwidth characteristics to subscribe to one or more of these layers. In conjunction with this temporal layering scheme, a re-transmission method is used to recover lost packets. In [25] a spatio-temporal layering scheme is used, where temporal compression is based on hierarchical conditional replenishment and spatial compression is based on a hybrid DCT/subband transform coding.

In the scalable video coding system developed in [45], a 3-D subband transform with camera-pan compensation is used to avoid motion-compensation drift due to partial reference pictures. Each subband is encoded with progressively decreasing quantization step sizes. The system can support, with a single bitstream, a range of bit-rates from kilobits to megabits and various picture resolutions and frame rates. However, the coding efficiency of the system depends heavily on the type of motion in the video being encoded. If the motion is other than camera panning, then the effectiveness of the temporal redundancy exploitation is limited. In addition, the granularity of the supported bit-rates is fairly coarse.

Several video scalability approaches have been adopted by video compression standards such as MPEG-2, MPEG-4 and H.263. Temporal, spatial and quality (SNR) scalability types have been defined in these standards. All of these types of scalable video consist of a Base Layer (BL) and one or multiple Enhancement Layers (ELs). The BL part of the scalable video stream represents, in general, the minimum amount of data needed for decoding that stream. The EL part of the stream represents additional information, and therefore it enhances the video signal representation when decoded by the receiver.

For each type of video scalability, a certain scalability structure is used. The scalability structure defines the relationship among the pictures of the BL and the pictures of the enhancement layer. Fig. 2 illustrates examples of video scalability structures. MPEG-4 also supports object-based scalability structures for arbitrarily shaped video objects [17,18].

Another type of scalability, which has been primarily used for coding still images, is fine-granular scalability. Images coded with this type of scalability can be decoded progressively. In other words, the decoder can start decoding and displaying the image after receiving a very small amount of data. As more data is received, the quality of the decoded image is progressively enhanced until the complete information is received, decoded, and displayed. Among leading international standards, progressive image coding is one of the modes supported in JPEG [16] and in the still-image texture coding tool in MPEG-4 video [17].
Fig. 2. Examples of video scalability structures.
When compared with non-scalable methods, a disadvantage of scalable video compression is its inferior coding efficiency. In order to increase coding efficiency, scalable video methods normally rely on relatively complex structures (such as the spatial and temporal scalability examples shown in Fig. 2). By using information from as many pictures as possible from both the BL and EL, coding efficiency can be improved when compressing an enhancement-layer picture. However, using prediction among pictures within the enhancement layer either eliminates or significantly reduces the fine-granular scalability feature, which is desirable for environments with a wide range of available bandwidth (e.g. the Internet). On the other hand, using a fine-granular scalable approach (e.g. progressive JPEG or the MPEG-4 still-image coding tool) to compress each picture of a video sequence prevents the employment of prediction among the pictures, and consequently degrades coding efficiency.
4.2. MPEG-4 video based fine-granular scalability (FGS)

In order to strike a balance between coding-efficiency and fine-granularity requirements, a recent activity in MPEG-4 adopted a hybrid scalability structure characterized by a DCT motion-compensated base layer and a fine-granular scalable enhancement layer [4,5]. This scalability structure is illustrated in Fig. 3. The video coding scheme used by our system is based on this scalability structure [5]. Under this structure, the server can transmit part or all of the over-coded enhancement layer to the receiver. Therefore, unlike the scalability solutions shown in Fig. 2, the FGS structure enables the streaming system to adapt to varying network conditions. As explained in Section 2, the FGS feature is especially needed when the video is pre-compressed and the condition of the particular session (over which the bitstream will be delivered) is not known at the time when the video is coded.

Fig. 3. Video scalability structure with fine granularity.

Fig. 4. A streaming system employing the MPEG-4 based fine-granular video scalability.
Fig. 4 shows the internal architecture of the MPEG-4 based FGS video encoder used in our streaming system. The base layer carries a minimally acceptable quality of video to be reliably delivered using a re-transmission, packet-loss recovery method. The enhancement layer improves upon the base-layer video, fully utilizing the estimated available bandwidth (Section 5.5). By employing a motion-compensated base layer, coding efficiency from temporal redundancy exploitation is partially retained. The base and single-enhancement-layer streams can be either stored for later transmission, or can be directly streamed by the server in real-time. The encoder interfaces with a system module that estimates the range of bandwidth [R_min, R_max] that can be supported over the desired network. Based on this information, the module conveys to the encoder the bit-rate R_BL <= R_min that must be used to compress the base-layer video.5 The enhancement layer is over-coded using a bit-rate (R_max - R_BL). It is important to note that the range [R_min, R_max] can be determined off-line for a particular set of Internet access technologies. For example, R_min = 20 kbps and R_max = 100 kbps can be used for analogue-modem/ISDN access technologies. More sophisticated techniques can also be employed in real-time to estimate the range [R_min, R_max]. For unicast streaming, an estimate for the available bandwidth R can be generated in real-time for a particular session. Based on this estimate, the server transmits the enhancement layer using a bit-rate R_EL:

R_EL = min(R_max - R_BL, R - R_BL).

Due to the fine granularity of the enhancement layer, its real-time rate control aspect can be implemented with minimal processing (Section 5.5). For multicast streaming, a set of intermediate bit-rates R_1, R_2, ..., R_N can be used to partition the enhancement layer into substreams. In this case, N fine-granular streams are multicasted using the bit-rates:

R_e1 = R_1 - R_BL,  R_e2 = R_2 - R_1, ...,  R_eN = R_N - R_(N-1),

where R_BL < R_1 < R_2 < ... < R_(N-1) < R_N <= R_max.

Using a receiver-driven paradigm [24], the client can subscribe to the base layer and one or more of the enhancement-layer streams. As explained earlier, one of the advantages of the FGS approach is that the EL sub-streams can be combined at the receiver into a single stream and decoded using a single EL decoder.

5 Typically, the base layer encoder will compress the signal using the minimum bit-rate R_min. This is especially the case when the BL encoding takes place off-line prior to the time of transmitting the video signal.
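The unicast and multicast rate selections above can be summarized in a few lines of code. The formulas are those given in the text; the function wrappers and names are our own.

```python
def unicast_el_rate(r_available, r_bl, r_max):
    """Unicast enhancement-layer rate: R_EL = min(R_max - R_BL, R - R_BL)."""
    return min(r_max - r_bl, r_available - r_bl)

def multicast_substream_rates(r_bl, intermediate_rates):
    """Partition the over-coded enhancement layer at intermediate rates
    R_BL < R_1 < ... < R_N <= R_max into N fine-granular sub-streams:
    R_e1 = R_1 - R_BL, and R_ei = R_i - R_(i-1) for i > 1."""
    rates, prev = [], r_bl
    for r in intermediate_rates:
        assert r > prev, "intermediate rates must be strictly increasing"
        rates.append(r - prev)
        prev = r
    return rates
```

With the analog/ISDN example from the text (R_BL = 20 kbps, R_max = 100 kbps), a session estimated at R = 64 kbps gets a 44 kbps enhancement stream, and partitioning at R_1, R_2, R_3 = 40, 70, 100 kbps yields sub-stream rates of 20, 30 and 30 kbps, whose sum plus the base layer recovers R_max.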
There are many alternative compression methods one can choose from when coding the BL and EL layers of the FGS structure shown in Fig. 3. MPEG-4 is highly anticipated to be the next widely-deployed audio-visual standard for interactive multimedia applications. In particular, MPEG-4 video provides superior low-bit-rate coding performance when compared with other MPEG standards (i.e. MPEG-1 and MPEG-2), and provides object-based functionality. In addition, MPEG-4 video has demonstrated its coding efficiency even for medium-to-high bit-rates. Therefore, we use the DCT-based MPEG-4 video tools for coding the base layer. There are many excellent documents and papers describing the MPEG-4 video coding tools [17,18,43,44].

For the EL encoder shown in Fig. 4, any embedded or fine-granular compression scheme can be used. Wavelet-based solutions have shown excellent coding-efficiency and fine-granularity performance for image compression [41,37]. In the following sub-section, we will discuss our wavelet solution for coding the EL of the MPEG-4 based scalable video encoder. Simulation results of our MPEG-4 based FGS coding method will be presented in Section 4.3.2.
4.3. The FGS enhancement layer encoder using wavelets

In addition to achieving a good balance between coding efficiency and fine granularity, there are other criteria that need to be considered when selecting the enhancement-layer coding scheme. These criteria include complexity, maturity and acceptability of that scheme by the technical and industrial communities for broad adoption. The complexity of such a scheme should be sufficiently low, in particular for the decoder. The technique should be reasonably mature and stable. Moreover, it is desirable that the selected technique have some roots in MPEG or other standardization bodies to facilitate its broad acceptability.

Embedded wavelet coding satisfies all of the above criteria. It has proven very efficient in coding still images [38,41] and is also efficient in coding video signals [46]. It naturally provides fine-granular scalability, which has always been one of its
strengths when compared to other transform-based coding schemes. Because wavelet-based image compression has been studied for many years now, and because its relationship with sub-band coding is well established, there exist fast algorithms and implementations to reduce its complexity. Moreover, MPEG-4 video includes a still-image compression tool based on the wavelet transform [17]. This still-image coding tool supports three compression modes, one of which is fine granular. In addition, the image-compression methods currently competing under the JPEG-2000 standardization activities are based on the wavelet transform. All of the above factors make wavelet-based coding for the FGS enhancement layer a very attractive choice.
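The sub-band hierarchy that a wavelet transform produces, and on which the zero-tree structures of the next sub-section are built, can be illustrated with a single level of the classic 2-D Haar transform. This is a generic textbook decomposition for illustration only, not the particular filter bank used by the MPEG-4 still-image tool or by our encoder.

```python
def haar_level(img):
    """One level of a 2-D Haar decomposition (unnormalized averages and
    differences) of a 2-D list with even dimensions, returning the four
    sub-bands LL, HL, LH, HH. Recursing on LL builds the m-level
    hierarchy whose coarsest sub-band LL_m roots the zero-trees."""
    h, w = len(img), len(img[0])
    ll = [[0.0] * (w // 2) for _ in range(h // 2)]
    hl = [[0.0] * (w // 2) for _ in range(h // 2)]
    lh = [[0.0] * (w // 2) for _ in range(h // 2)]
    hh = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            ll[i // 2][j // 2] = (a + b + c + d) / 4.0   # coarse approximation
            hl[i // 2][j // 2] = (a - b + c - d) / 4.0   # horizontal detail
            lh[i // 2][j // 2] = (a + b - c - d) / 4.0   # vertical detail
            hh[i // 2][j // 2] = (a - b - c + d) / 4.0   # diagonal detail
    return ll, hl, lh, hh
```

On smooth image regions the three detail sub-bands are near zero, which is precisely the behavior the zero-tree coder of Section 4.3.1 exploits.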
Ever since the introduction of EZW (Embedded Zerotrees of Wavelet coefficients) by Shapiro [41], much research has been directed toward efficient progressive encoding of images and video using wavelets. Progress in this area has culminated recently with the SPIHT (Set Partitioning In Hierarchical Trees) algorithm developed by Said and Pearlman [38]. The still-image texture coding tool in MPEG-4 also represents a variation of EZW and gives performance comparable to that of SPIHT.

Compression results and proposals for using different variations of the EZW algorithm have been recently submitted to the MPEG-4 activity on FGS video [6,17,19,40]. These EZW-based proposals include the scalable video coding solution used in our streaming system. Below, we give a brief overview of the original EZW method and highlight how the recent wavelet-based MPEG-4 proposals (for coding the FGS EL video) differ from the original EZW algorithm. Simulation results are shown at the end of the section.
4.3.1. EZW-based coding of the enhancement-layer video

The different variations of the EZW approach [6,17,19,37,38,40,41] are based on: (a) computing a wavelet transform of the image, and (b) coding the resulting transform by partitioning the wavelet coefficients into sets of hierarchical, spatial-orientation trees. An example of a spatial-orientation tree is shown in Fig. 5.

Fig. 5. Examples of the hierarchical, spatial-orientation trees of the zero-tree algorithm.

In the original EZW algorithm [41], each tree is rooted at the highest level (most coarse sub-band) of the multi-layer wavelet transform. If there are m layers of sub-bands in the hierarchical wavelet transform representation of the image, then the roots of the trees are in the LL_m sub-band of the hierarchy, as shown in Fig. 5. If the number of coefficients in sub-band LL_m is N_m, then there are N_m spatial-orientation trees representing the wavelet transform of the image.

In EZW, coding efficiency is achieved based on the hypothesis of a 'decaying spectrum': the energies of the wavelet coefficients are expected to decay in the direction from the root of a spatial-orientation tree toward its descendants. Consequently, if the wavelet coefficient c_n of a node n is found insignificant (relative to some threshold T_k = 2^k), then it is highly probable that all descendants D(n) of the node n are also insignificant (relative to the same threshold T_k). If the root of a tree and all of its descendants are insignificant, then this tree is referred to as a Zero-Tree Root (ZTR). If a node n is insignificant (i.e. |c_n| < T_k) but one (or more) of its descendants is (are) significant, then this scenario represents a violation of the 'decaying spectrum' hypothesis. Such a node is referred to as an Isolated Zero-Tree (IZT). In the original EZW
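The significance test and the ZTR/IZT distinction just described can be sketched as follows. This is a didactic fragment, not the full EZW coder (there are no dominant/subordinate passes and no entropy coding); the dictionary-based tree representation, mirroring the spatial-orientation trees of Fig. 5, is our own choice.

```python
# A node is {"c": coefficient, "children": [...]}, one per wavelet
# coefficient in a spatial-orientation tree.

def subtree_significant(node, t):
    """True if the node or any of its descendants has |c| >= threshold t."""
    if abs(node["c"]) >= t:
        return True
    return any(subtree_significant(ch, t) for ch in node["children"])

def classify(node, t):
    """EZW symbol for a node at threshold t = 2**k."""
    if abs(node["c"]) >= t:
        return "SIG"   # significant coefficient
    if any(subtree_significant(ch, t) for ch in node["children"]):
        return "IZT"   # insignificant node, but a significant descendant exists
    return "ZTR"       # node and all descendants insignificant: zero-tree root
```

For example, a root coefficient of 3 with a descendant of 20 is an IZT at threshold 16 (the descendant violates the decaying-spectrum hypothesis) but a ZTR at threshold 32, where the whole tree is insignificant.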
