Scalable Video

The international video coding standard formally known as ITU-T Recommendation H.264 and ISO/IEC MPEG-4 (Part 10) Advanced Video Coding (H.264/AVC for short) represents the state of the art in single-layer video compression. Compared with MPEG-2, the most widely used video coding standard today, H.264/AVC typically improves coding efficiency by a factor of two while keeping complexity within an acceptable range.

The AVC architecture is quite similar to that of previous video coding standards. The improved coding efficiency can be attributed to the use of several advanced coding tools. Multi-hypothesis block-based motion estimation and compensation with variable block sizes, multiple reference frames and quarter-pel accuracy are used to temporally decorrelate the video frames. Advanced intra-prediction modes are provided to predict the blocks for which inter-prediction is not efficient. An in-loop deblocking filter improves visual quality by smoothing out blocking artifacts. A new integer 4x4 DCT-like transform is used to spatially decorrelate the intra-frames and prediction-error frames. Finally, the resulting coefficients are efficiently coded using CABAC, an advanced context-based arithmetic coder.
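The core of the integer 4x4 transform mentioned above can be sketched in a few lines. The matrix below is the forward core transform of H.264/AVC; the normalising scale factors, which the standard folds into the quantization stage, are left out, and the helper names are illustrative only.

```python
# Forward 4x4 integer core transform of H.264/AVC (scaling omitted:
# in the standard it is merged with quantization).
CF = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]


def matmul(a, b):
    """Multiply two 4x4 integer matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Return CF * block * CF^T using integer arithmetic only."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat residual block concentrates all its energy in the DC coefficient.
flat_block = [[10] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat_block)
```

Because every entry of the matrix is a small integer, the transform can be computed with additions and shifts and has no encoder/decoder rounding mismatch. The rows are mutually orthogonal but not of unit norm, which is why a separate scaling step is needed.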

While the H.264/AVC standard provides state-of-the-art compression performance for single-layer video coding, it does not support any form of scalability. Within MPEG, Scalable Video Coding (SVC) has always been handled by the Video group, although for a short period of time it was considered an MPEG-21 activity (21000-13). Once several proposals had been tested, it became apparent that the best solution was closely related to MPEG-4 AVC/H.264. Hence, the MPEG standardisation body decided last year to specify an extension of the latter rather than a new specification in the MPEG-21 framework. There has, however, been an ad hoc group (AHG) related to MPEG-21 on Exploration in Wavelet Video Coding, whose mandates include maintaining and editing the wavelet codec reference document. The reference software is from MSRA (see MPEG docs N6914, M12176, M12339).

In addition, MPEG-21 has been dealing with support (descriptions) for the negotiation/adaptation framework in which such a video coding system may be embedded, i.e. MPEG-21 Digital Item Adaptation (DIA). Both MPEG-21 DIA and MPEG-4 (Scalable) AVC are addressed in SUIT. In parallel, the JVT group has also been working on a scalable version of H.264, as would be expected given the close similarities between MPEG-4 AVC and H.264. JVT/MPEG Video has recently edited the third SVC version, JSVM3.0.

Additionally, to address the issue of scalability, a standardization effort for AVC amendment 1: Scalable Video Coding was recently initiated under the supervision of JVT. The current working draft combines the coding primitives of H.264/AVC with an open-loop coding structure employing motion compensated temporal filtering, with Laplacian pyramids to support resolution scalability and with a framework similar to MPEG-4 FGS to support quality scalability. Thanks to advanced inter-layer prediction techniques, the coding efficiency loss incurred by scalability is minimized in this architecture.
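The temporal filtering idea can be illustrated with a Haar lifting step applied to pairs of frames. This is only a sketch under strong simplifications: motion compensation is omitted (each pixel is predicted from the co-located pixel of the previous frame), frames are flat lists of pixel values, and the function names are hypothetical rather than taken from the working draft.

```python
def haar_temporal_analysis(frames):
    """Split an even-length list of frames into low-pass and high-pass bands.

    Each frame is a flat list of pixel values. Motion compensation is
    omitted, so prediction uses the co-located pixel of the paired frame.
    """
    lows, highs = [], []
    for a, b in zip(frames[0::2], frames[1::2]):
        high = [y - x for x, y in zip(a, b)]        # predict step
        low = [x + h / 2 for x, h in zip(a, high)]  # update step
        lows.append(low)
        highs.append(high)
    return lows, highs


def haar_temporal_synthesis(lows, highs):
    """Invert the lifting steps and return the original frame sequence."""
    frames = []
    for low, high in zip(lows, highs):
        a = [l - h / 2 for l, h in zip(low, high)]  # undo update
        b = [x + h for x, h in zip(a, high)]        # undo predict
        frames.extend([a, b])
    return frames


frames = [[10, 20, 30], [12, 20, 26], [50, 50, 50], [50, 52, 54]]
lows, highs = haar_temporal_analysis(frames)
half_rate = lows                      # decodable alone at half the frame rate
restored = haar_temporal_synthesis(lows, highs)
```

The low-pass frames alone form a sequence at half the frame rate, which is what gives temporal scalability; adding the high-pass frames restores the full rate. The structure is open-loop in the sense that the analysis operates on original frames rather than on previously reconstructed ones.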

Multiple Description Coding

A new family of communication services involving the delivery of image and video packets over broadband and bandwidth-limited error-prone channels has emerged in the last few years. To increase reliability over these types of channels, diversity is commonly used alongside error correction techniques. Multiple Description Coding (MDC) has been introduced to efficiently overcome channel impairments (erasures and bit errors) in diversity-based systems, allowing the decoders to extract meaningful information from a subset of a bit-stream.

In order to achieve robust communication over unreliable channels, the MDC system has to deliver highly erasure and error-resilient bit-streams. Additionally, video coding systems operating over variable bandwidth channels require fine-grain scalable bit-streams in order to dynamically adapt to the varying network conditions. A progressive MDC algorithm conceived so as to meet these two basic requirements was proposed in the literature and is based on multiple description uniform scalar quantizers (MDUSQ). Further developments in the area include the Embedded Multiple Description Scalar Quantizers (EMDSQ) which, for an erasure channel model, provide a fine-grain rate adaptation, and a progressive quality improvement of the central-channel reconstruction.
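The principle behind multiple description scalar quantization can be shown with a much simpler scheme than MDUSQ or EMDSQ: two uniform quantizers whose cells are staggered by half a step. The sketch below is not the algorithm from the literature cited above; the step size and function names are assumptions chosen for illustration.

```python
import math

STEP = 8.0  # hypothetical quantizer step size


def encode(x, step=STEP):
    """Return two descriptions: indices of two quantizers offset by step/2."""
    i1 = math.floor(x / step)
    i2 = math.floor((x + step / 2) / step)
    return i1, i2


def decode_side1(i1, step=STEP):
    """Reconstruct from description 1 only (centre of its cell)."""
    return (i1 + 0.5) * step


def decode_side2(i2, step=STEP):
    """Reconstruct from description 2 only (centre of its shifted cell)."""
    return i2 * step


def decode_central(i1, i2, step=STEP):
    """Reconstruct from both descriptions: centre of the cell intersection."""
    lo = max(i1 * step, (i2 - 0.5) * step)
    hi = min((i1 + 1) * step, (i2 + 0.5) * step)
    return (lo + hi) / 2


x = 21.0
i1, i2 = encode(x)
```

Either description alone locates the sample within a cell of width `STEP`, so a side decoder errs by at most half a step. When both arrive, the two cells overlap in an interval of half that width, and the central decoder halves the worst-case error. This is the erasure-resilience trade-off that the embedded quantizers described above refine into a progressive, fine-grain form.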

One of the main advantages of a scalable video codec is that a single bit stream can be adapted for consumption on a particular terminal without re-encoding the content. As such, SVC helps to enable the Universal Multimedia Access (UMA) paradigm, which can be paraphrased as “create once, consume everywhere”. It is generally considered preferable to perform the adaptation not on the terminal itself but in the (active) network or at the content provider; in short, in an adaptation engine.

MPEG-21 DIA standardizes various tools that help an adaptation engine do its work. The most important tools in this context are the Usage Environment Description (UED) tool and the Universal Constraints Description (UCD) tool. The former describes the context in which the content will be consumed. This XML-based description can express information about the end-user’s preferences and natural environment, the terminal characteristics, and the network. The latter expresses additional constraints on the usage context, e.g., only a part of the terminal’s display might be available to render the content. The adaptation engine can take this information into account to decide upon the most suitable adaptation. Sending the XML-based information from the terminal to the adaptation engine (during the content negotiation phase) creates a new problem: the plain-text serialization format of XML generates overhead and increases the required bandwidth. Alternative XML serialization formats, such as binary encoded XML, can solve these issues.
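The size overhead of plain-text XML can be made visible with a rough experiment. The description below is hypothetical: its element and attribute names are invented for illustration and do not follow the MPEG-21 DIA schema. Generic zlib compression is used only as a stand-in for a binary XML format, which would typically exploit the schema rather than general-purpose compression.

```python
import zlib

# Hypothetical usage-environment description; names are illustrative only
# and are NOT taken from the MPEG-21 DIA schema.
description = """<?xml version="1.0" encoding="UTF-8"?>
<UsageEnvironment>
  <Terminal>
    <Display width="320" height="240" colorCapable="true"/>
    <Codec name="SVC" maxBitrate="384000"/>
  </Terminal>
  <Network>
    <Capability maxCapacity="512000" minGuaranteed="128000"/>
    <Condition availableBandwidth="256000" delay="120"/>
  </Network>
  <User>
    <Preference modality="video" priority="1"/>
    <Preference modality="audio" priority="2"/>
  </User>
</UsageEnvironment>
"""

plain = description.encode("utf-8")   # what the terminal would send as text
packed = zlib.compress(plain, 9)      # stand-in for a compact serialization
saving = 1 - len(packed) / len(plain) # fraction of bytes saved

print(f"plain: {len(plain)} bytes, packed: {len(packed)} bytes, saving: {saving:.0%}")
```

Most of the bytes in the plain form are tag and attribute names rather than values, which is why a more compact serialization pays off on a constrained return link.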


The DVB-T/H standard, ETSI EN 300 744, describes the physical transmission system for digital terrestrial television broadcasting, which consists basically of two encoders, two interleavers and an OFDM modulator. Generally speaking, the system input is a transport stream (TS), a stream of 188-byte packets encapsulating several services and applications. Real-time elementary streams of compressed audio and video are packetized as PES packets. Thus, TSs carry MPEG streams (MPEG-2/4), MPEG-2 DSM-CC data structures and signalling data (PSI/SI), all multiplexed according to the MPEG-2 Systems standard. One interesting DSM-CC structure is multiprotocol encapsulation (MPE), used to transport IP datagrams. The DVB-H standard (TM2977_r4) adds to DVB-T another layer of (Reed-Solomon) coding applied to IP packets, at the cost of increased transmission delay. A notable advantage of DVB-H is its support for low-power devices. We should note that the recent DVB-S2 standard describes several novel tools, namely the merger/slicer to transmit generic packets (no TS is required) and support for an adaptive coding and modulation scheme. Standardization activities on carrying IP datagrams over MPEG-2 Transport Streams are also underway in the IETF’s ipdvb WG. Conversely, the carriage of DVB signals over IP networks has been studied in some DVB TMs, in particular compressed H.264/AVC video over IP networks (DVB-AVC).
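As a concrete view of the transport stream described above, the 4-byte header that starts every 188-byte TS packet can be parsed as follows. The field layout is that of MPEG-2 Systems; the function name and the dictionary keys are illustrative choices, and the example packet is synthetic.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Decode the fixed 4-byte header of one MPEG-2 transport stream packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a transport stream packet is exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return {
        "transport_error": bool(packet[1] & 0x80),
        "payload_unit_start": bool(packet[1] & 0x40),
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],   # 13-bit packet identifier
        "scrambling_control": (packet[3] >> 6) & 0x03,
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        "continuity_counter": packet[3] & 0x0F,
    }


# Synthetic packet on PID 0x0000, which carries the Program Association
# Table (part of PSI); the 184 payload bytes are left as zeros.
example = bytes([0x47, 0x40, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(example)
```

The 13-bit PID is what lets a receiver demultiplex the services: audio and video PES, PSI/SI tables and MPE sections carrying IP datagrams each travel on their own PID within the same multiplex.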


DVB Return Channel Terrestrial (DVB-RCT) has been endorsed as an ETSI standard (EN 301 958), which was published in 2002 and accepted as a global standard in countries where the DVB-T standard has been adopted. DVB-RCT was designed to meet commercial requirements for a low-cost, high-capacity and efficient wireless interaction channel for interactive DTT operating in the VHF/UHF bands. It can be deployed in small cells, to implement denser networks of up to 3.5 km radius, providing the user with a bitrate capacity of up to several Mbps. DVB-RCT uses an innovative concept based on multiple-access OFDM (OFDMA) for broadband and mobile services, built on a two-dimensional grid that combines time-division and frequency-division access techniques.

Aggregate data rate (system capacity) on the return channel can reach 26 Mbit/s using 8 MHz bandwidth channels. System capacity can be shared by a large community of users at the same time by splitting the OFDM symbol, which uses a 1k or 2k FFT size, into dynamically allocated sub-channels, each comprising one carrier, 4 carriers or 29 carriers according to the user's bandwidth requirement. DVB-RCT has been designed to comply with mobility requirements and to take advantage of the DVB-T mobility features that have been substantiated in the ACTS MOTIVATE project. DVB-RCT has been upgraded, and the new version, an extension of the standard to mobility, has recently been validated in field tests and has demonstrated excellent performance at up to 150 km/h. Coding techniques (Permuted OFDMA) are employed for the selection and grouping of carriers into sub-channels in order to minimize interference among users' burst messages received by the base station. DVB-RCT makes use of various error correction schemes (Turbo codes or Reed-Solomon + punctured convolutional codes) and different modulation constellations (n-QAM). In addition, various adaptive strategies are employed on the Base Station side, such as dynamic resource allocation (BoD) to users (n sub-channels) and selection of the modulation order (n-QAM) and coding scheme that best fit the requirement of maintaining reliable communication between the user and the Base Station. Besides, Permuted OFDMA provides intra-cell immunity, resulting in a low probability of collisions between sub-channels assigned to different users.
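The grouping of carriers into sub-channels and their on-demand allocation can be sketched as below. This is not the permutation defined in EN 301 958: a seeded pseudo-random shuffle stands in for it, the carrier count of 1024 is simply the 1k FFT size taken as a round number (the number of usable carriers in a real system is smaller), and all names are hypothetical.

```python
import random


def build_subchannels(num_carriers, carriers_per_subchannel, seed=0):
    """Group carrier indices into disjoint sub-channels after a shuffle.

    The shuffle spreads each sub-channel's carriers across the band; it is
    a stand-in for the standard's permutation, not a copy of it.
    """
    carriers = list(range(num_carriers))
    random.Random(seed).shuffle(carriers)
    count = num_carriers // carriers_per_subchannel
    return [
        sorted(carriers[i * carriers_per_subchannel:(i + 1) * carriers_per_subchannel])
        for i in range(count)
    ]


def allocate(subchannels, requests):
    """Grant each user the requested number of sub-channels, in request order."""
    free = list(range(len(subchannels)))
    grants = {}
    for user, wanted in requests.items():
        if wanted > len(free):
            raise ValueError(f"not enough free sub-channels for {user}")
        grants[user] = [free.pop(0) for _ in range(wanted)]
    return grants


subchannels = build_subchannels(num_carriers=1024, carriers_per_subchannel=29)
grants = allocate(subchannels, {"user_a": 3, "user_b": 1, "user_c": 5})
```

Because the sub-channels are disjoint sets of carriers, users served by the same base station never share a carrier within a symbol, which is the intra-cell immunity referred to above. Scattering each sub-channel across the band additionally gives every user some frequency diversity.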


WiMAX is a wireless Metropolitan Area Network (MAN) standard. The air interface of IEEE 802.16 for fixed Broadband Wireless Access systems is also known as the IEEE Wireless MAN air interface. The IEEE 802.16 standard covers frequency bands between 2 and 66 GHz. Within it, the IEEE 802.16a, d and e standards address the bands between 2 GHz and 11 GHz to enable non-line-of-sight operation and mobility. The IEEE 802.16e extension enables nomadic capabilities for laptops and other mobile devices, allowing users to benefit from metro-area portability of an xDSL-like service. Both FDD and TDD operation are supported. A WiMAX profile is a subset of IEEE 802.16 adopted for use in a country or countries to ensure interoperability. The WiMAX frequency ranges can be licensed or license-exempt.

Portable Internet access, named Wibro, is a WiMAX profile based on the IEEE 802.16e draft 3 standard. It has been allocated the 2.3 GHz band by the Ministry of Information and Communication (MIC), Korea. Its purpose is to provide high-data-rate wireless Internet access with a PSS (Personal Subscriber Station) in stationary or mobile environments, anytime and anywhere. The portable Internet project group (PG302), which was established in June 2003, approved OFDMA with a TDD frame of 5 ms in a 10 MHz channel. It is to provide service over a coverage radius of 1 km for mobiles moving at up to 60 km/h, with a frequency re-use of 1/1. It uses QPSK, 16-QAM and 64-QAM modulation in the downlink (DL) and supports a peak data rate of 18 Mbps. The uplink (UL) uses 64-QAM modulation and supports a peak data rate of 6.1 Mbps. Convolutional turbo code (CTC) is used as the FEC. Good coverage is supported by introducing safety channels at the cell boundaries to reduce interference. H-ARQ is used to improve link efficiency. Users with low mobility use the so-called band selection AMC subchannel (contiguous tones) to increase the transmission rate, while fast-moving users with poorer channels use the diversity subchannel (tones randomized and sufficiently separated) to improve frequency diversity. Fast AMC is based on full CQI or differential CQI, and achieves a peak user data rate of 3 Mbps in the DL and 1 Mbps in the UL. The corresponding spectral efficiency is 0.4 b/s/Hz. It supports flexible bandwidth allocation on a frame-by-frame basis, and also supports flexible quality-of-service classes such as real-time polling, non-real-time polling and best-effort service. It has a sleep mode that reduces terminal power consumption. Moreover, for low mobility users, an optional TDD smart antenna feature is used to increase coverage and transmission