Digital Video and HD: Algorithms and Interfaces provides a one-stop shop for the theory and engineering of digital video systems. Equally accessible to video engineers and those working in computer graphics, Charles Poynton’s revision to his classic text covers emergent compression systems, including H.264 and VP8/WebM, and augments detailed information on JPEG, DVC, and MPEG-2 systems. This edition also introduces the technical aspects of file-based workflows and outlines the emerging domain of metadata, placing it in the context of digital video processing.
With the help of hundreds of high-quality technical illustrations, this book presents the following topics:
* Basic concepts of digitization, sampling, quantization, gamma, and filtering
* Principles of color science as applied to image capture and display
* Scanning and coding of SDTV and HDTV
* Video color coding: luma, chroma (4:2:2 component video, 4fSC composite video)
* Analog NTSC and PAL
* Studio systems and interfaces
* Compression technology, including M-JPEG and MPEG-2
* Broadcast standards and consumer video equipment
A digital image is represented by a rectangular array (matrix) of picture elements (pels, or pixels). Pixel arrays of several image standards are sketched in Figure 1.1. In a greyscale system each pixel comprises a single component whose value is related to what is loosely called brightness. In a colour system each pixel comprises several components – usually three – whose values are closely related to human colour perception.
Historically, a video image was acquired at the camera, conveyed through the channel, and displayed using analog scanning; there was no explicit pixel array. Modern cameras and modern displays directly represent the discrete elements of an image array having fixed structure. Signal processing at the camera, in the pipeline, or at the display may perform spatial and/or temporal resampling to adapt to different formats.
The pixel array for one image is a frame. In video, digital memory used to store one image is called a framestore; in computing, it's a framebuffer. The total pixel count in an image is the number of image columns NC (or in video, samples per active line, SAL) times the number of image rows NR (or active lines, LA). The total pixel count is usually expressed in megapixels (Mpx).
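As a quick check of the arithmetic, the sketch below computes the total pixel count for a few array sizes. The function name `megapixels` and the list of array sizes are illustrative choices rather than anything taken from the text, and 1 Mpx is assumed to mean 10^6 pixels.

```python
def megapixels(columns: int, rows: int) -> float:
    """Total pixel count (columns times rows) in megapixels, assuming 1 Mpx = 1e6 pixels."""
    return columns * rows / 1_000_000


# Illustrative array sizes (columns, rows), chosen for this example only.
for n_c, n_r in [(720, 480), (1280, 720), (1920, 1080)]:
    print(f"{n_c} x {n_r}: {megapixels(n_c, n_r):.3f} Mpx")
```

A 1920 × 1080 array, for instance, works out to roughly 2.07 Mpx.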
In video and in computing, a pixel comprises the set of all components necessary to represent colour (typically red, green, and blue). In the mosaic sensors typical of digital still cameras (DSCs) a pixel is any colour component individually; the process of demosaicking interpolates the missing components to create a fully populated image array. In digital cinema cameras the DSC interpretation of pixel is used; however, in a digital cinema projector, a pixel is a triad.
The value of each pixel component represents brightness and colour in a small region surrounding the corresponding point in the sampling lattice.
Pixel component values are quantized, typically to an integer value that occupies between 1 and 16 bits – and often 8 or 10 bits – of digital storage. The number of bits per component, or per pixel, is called the bit depth. (We use bit depth instead of width to avoid confusion: The term width refers to the entire picture.)
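The relationship between bit depth and the number of available integer codes can be sketched as follows. This is a simplified, full-range, uniform quantizer written for illustration: the names `code_levels` and `quantize` are hypothetical, the input is assumed to be a component value already normalized to the range 0 to 1, and real video standards typically reserve some codes and apply nonlinear coding before this step.

```python
def code_levels(bit_depth: int) -> int:
    """Number of distinct integer codes available at a given bit depth."""
    return 2 ** bit_depth


def quantize(value: float, bit_depth: int) -> int:
    """Map a normalized value in [0, 1] to the nearest integer code.

    Assumption for this sketch: the full code range is used, with 0.0
    mapping to code 0 and 1.0 mapping to the largest code.
    """
    max_code = code_levels(bit_depth) - 1
    clipped = min(max(value, 0.0), 1.0)  # clip out-of-range input
    return round(clipped * max_code)


for bits in (1, 8, 10, 16):
    print(f"{bits:2d} bits: {code_levels(bits)} codes, largest code {quantize(1.0, bits)}")
```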
Aspect ratio is simply the ratio of an image's width to its height. Standard aspect ratios for film and video are sketched, to scale, in Figure 1.2. What I call simply aspect ratio is sometimes called display aspect ratio (DAR) or picture aspect ratio (PAR). Standard-definition (SD) television has an aspect ratio of 4:3.
Equation 1.1 relates picture and sample aspect ratios. To assign n square-sampled pixels to a picture having aspect ratio AR, choose image column and image row counts (c and r, respectively) according to Equation 1.2.
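The equations themselves are not reproduced in this excerpt. The sketch below therefore assumes the natural solution of the two constraints c × r = n and c / r = AR, namely c = sqrt(n × AR) and r = sqrt(n / AR); it may differ in form from the book's Equation 1.2. The function name `square_pixel_array` is an illustrative choice, and results are rounded to whole pixels.

```python
import math


def square_pixel_array(n: float, aspect_ratio: float) -> tuple:
    """Return (columns, rows) for about n square-sampled pixels at the given aspect ratio.

    Assumed relations (not quoted from the text):
        c = sqrt(n * AR),  r = sqrt(n / AR)
    so that c * r = n and c / r = AR before rounding.
    """
    c = math.sqrt(n * aspect_ratio)
    r = math.sqrt(n / aspect_ratio)
    return round(c), round(r)


# A 16:9 picture with 2,073,600 pixels corresponds to a 1920 x 1080 array.
print(square_pixel_array(2_073_600, 16 / 9))
```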
Cinema film commonly uses 1.85:1 (which for historical reasons is called either flat or spherical), or 2.4:1 ("CinemaScope," or colloquially, 'scope). Many films are 1.85:1, but "blockbusters" are usually 2.4:1. Film at 2.4:1 aspect ratio was historically acquired using an aspherical lens that squeezes the horizontal dimension of the image by a factor of two. The projector is equipped with a similar lens, to restore the horizontal dimension of the projected image. The lens and the technique are called anamorphic. In principle, an anamorphic lens can have any ratio; in practice, a ratio of exactly two is ubiquitous in cinema.
Widescreen refers to an aspect ratio wider than 4:3. High-definition (HD) television is standardized with an aspect ratio of 16:9. In video, the term anamorphic usually refers to a 16:9 widescreen variant of a base video standard, where the horizontal dimension of the 16:9 image occupies the same width as the 4:3 aspect ratio standard. Consumer electronic equipment rarely recovers the correct aspect ratio of such conversions (as we will explore later in the chapter).
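The horizontal squeeze implied by each anamorphic scheme follows from dividing the picture aspect ratio by the aspect ratio of the area that carries it. Here is a minimal sketch using exact fractions; the function name `squeeze_factor` is an illustrative choice, and the 1.2:1 carrier for cinema is simply 2.4 divided by the stated factor of two.

```python
from fractions import Fraction


def squeeze_factor(picture_ar: Fraction, carrier_ar: Fraction) -> Fraction:
    """Horizontal squeeze needed to fit a picture into a narrower carrier area."""
    return picture_ar / carrier_ar


# Video: a 16:9 picture carried in the width of a 4:3 standard.
video = squeeze_factor(Fraction(16, 9), Fraction(4, 3))

# Cinema: a 2.4:1 picture squeezed by two occupies a 1.2:1 area.
cinema_carrier = Fraction(12, 5) / 2
cinema = squeeze_factor(Fraction(12, 5), cinema_carrier)

print(video, cinema_carrier, cinema)  # prints: 4/3 6/5 2
```

Under these assumptions the video case involves a squeeze of 4/3, compared with the factor of two used in cinema.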
HD is standardized with an aspect ratio of 16:9 (about 1.78:1), fairly close to the 1.85:1 ordinary movie aspect ratio. Figure 1.3 below illustrates the origin of the 16:9 aspect ratio. Through a numerological coincidence apparently first revealed by Kerns Powers, the geometric mean of 4:3 (the standard aspect ratio of conventional television) and 2.4 (the aspect ratio of a CinemaScope movie) is very close – within a fraction of a percent – to 16:9. (The calculation is shown in the lower right corner of the figure.) A choice of 16:9 for HD meant that SD, HD, and CinemaScope shared the same "image circle": 16:9 was a compromise between the vertical cropping required for SD and the horizontal cropping required for CinemaScope.
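The coincidence is easy to verify numerically. The short sketch below computes the geometric mean of 4:3 and 2.4:1 and compares it with 16:9; the helper name `geometric_mean` is an illustrative choice.

```python
import math


def geometric_mean(a: float, b: float) -> float:
    """Geometric mean of two positive numbers."""
    return math.sqrt(a * b)


sd = 4 / 3    # conventional television
scope = 2.4   # CinemaScope
hd = 16 / 9   # HD

gm = geometric_mean(sd, scope)
difference_percent = 100 * (gm - hd) / hd
print(f"geometric mean = {gm:.4f}, 16:9 = {hd:.4f}, difference = {difference_percent:.2f}%")
```

The geometric mean comes out near 1.789, about 0.6 percent wider than 16:9 (roughly 1.778).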
In mathematics, coordinate values of the (two-dimensional) plane range both positive and negative. The plane is thereby divided into four quadrants (see Figure 1.4). Quadrants are denoted by Roman numerals in the counterclockwise direction. In the continuous image plane, locations are described using Cartesian coordinates [x, y] – the first coordinate is associated with the horizontal direction, the second with the vertical. When both x and y are positive, the location is in the first quadrant (quadrant I). In image science, the image lies in this quadrant. (Adobe's PostScript system uses first-quadrant coordinates.)
In matrix indexing, axis ordering is reversed from Cartesian coordinates: A matrix is indexed by row then column. The top row of a matrix has the smallest index, so matrix indices lie in quadrant IV. In mathematics, matrix elements are ordinarily identified using 1-origin indexing. Some image processing software packages use 1-origin indexing – in particular, MATLAB and Mathematica, both of which have deep roots in mathematics. The scan line order of conventional video and image processing usually adheres to the matrix convention, but with zero-origin indexing: Rows and columns are usually numbered [r, c] from [0, 0] at the top left. In other words, the image lies in quadrant IV (with the negative sign on the y-coordinate elided), ordinarily with zero-origin indexing.
Digital image sampling structures are denoted width × height. For example, a 1920 × 1080 system has columns numbered 0 through 1919 and rows (historically, "picture lines") numbered 0 through 1079.
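To make the two conventions concrete, the sketch below converts between zero-origin [row, column] indices (origin at the top left) and first-quadrant [x, y] pixel coordinates (origin at the bottom left, y increasing upward). The function names and the choice to treat both systems as integer pixel indices are assumptions made for illustration.

```python
WIDTH, HEIGHT = 1920, 1080  # columns 0..1919, rows 0..1079


def matrix_to_first_quadrant(r: int, c: int, height: int = HEIGHT) -> tuple:
    """Zero-origin [row, column] from the top left -> [x, y] with y increasing upward."""
    return c, height - 1 - r


def first_quadrant_to_matrix(x: int, y: int, height: int = HEIGHT) -> tuple:
    """[x, y] with y increasing upward -> zero-origin [row, column] from the top left."""
    return height - 1 - y, x


# The top-left pixel [0, 0] sits at the top of the first quadrant.
print(matrix_to_first_quadrant(0, 0))                     # (0, 1079)
# The bottom-right pixel is the last row and last column.
print(matrix_to_first_quadrant(HEIGHT - 1, WIDTH - 1))    # (1919, 0)
```

Note the two differences from Cartesian practice: the order of the indices is swapped, and the vertical direction is flipped.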
In human vision, the three-dimensional world is imaged by the lens of the eye onto the retina, which is populated with photoreceptor cells that respond to light having wavelengths ranging from about 400 nm to 700 nm. In video and in film, we build a camera having a lens and a photosensitive device, to mimic how the world is perceived by vision. Although the shape of the retina is roughly a section of a sphere, it is topologically two dimensional. In a camera, for practical reasons, we employ a flat image plane, sketched in Figure 1.5 above, instead of a section of a sphere. Image science involves analyzing the continuous distribution of optical power that is incident on the image plane.
Signals captured from the physical world are translated into digital form by digitization, which involves two processes: sampling (in time or space) and quantization (in amplitude), sketched in Figure 1.6 below. The operations may take place in either order, though sampling usually precedes quantization.
Quantization assigns an integer to signal amplitude at an instant of time or a point in space, as I will explain in Quantization, on page 37. Virtually all image exchange standards – TIFF, JPEG, SD, HD, MPEG, H.264 – involve pixel values that are not proportional to light power in the scene or at the display: With respect to light power, pixel values in these systems are nonlinearly quantized.
A continuous one-dimensional function of time, such as audio sound pressure level, is sampled through forming a series of discrete values, each of which is a function of the distribution of a physical quantity (such as intensity) across a small interval of time. Uniform sampling, where the time intervals are of equal duration, is nearly always used. (Details will be presented in Filtering and sampling, on page 191.)
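The two steps of digitization can be sketched for a one-dimensional signal as follows. This is an illustration under stated assumptions, not a description of any particular converter: each sample is approximated by averaging the signal at several points across its interval, the quantizer is uniform over an assumed range of -1 to +1, and the 5 Hz test tone, the sample count, and all function names are hypothetical.

```python
import math


def sample_uniform(signal, duration: float, n_samples: int, sub: int = 16) -> list:
    """Uniformly sample `signal` over [0, duration).

    Each sample is the mean of `sub` evaluations spread across its interval,
    a crude stand-in for integrating the quantity over a small interval of time.
    """
    dt = duration / n_samples
    samples = []
    for k in range(n_samples):
        t0 = k * dt
        acc = sum(signal(t0 + (j + 0.5) * dt / sub) for j in range(sub))
        samples.append(acc / sub)
    return samples


def quantize_samples(samples, bit_depth: int, lo: float = -1.0, hi: float = 1.0) -> list:
    """Assign an integer code to each sample amplitude (uniform steps from lo to hi)."""
    max_code = 2 ** bit_depth - 1
    codes = []
    for s in samples:
        clipped = min(max(s, lo), hi)
        codes.append(round((clipped - lo) / (hi - lo) * max_code))
    return codes


def tone(t: float) -> float:
    """Hypothetical 5 Hz test signal with amplitude 1."""
    return math.sin(2 * math.pi * 5.0 * t)


samples = sample_uniform(tone, duration=1.0, n_samples=40)
codes = quantize_samples(samples, bit_depth=8)
print(codes[:8])
```

Sampling here precedes quantization, the more usual order; swapping the two calls would not be meaningful in this sketch because the quantizer operates on already-sampled values.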
A continuous two-dimensional function of space is sampled by assigning, to each element of the image matrix, a value that is a function of the distribution of intensity over a small region of space. In digital video and in conventional image processing, the samples lie on a regular, rectangular grid.
Analog video was not sampled horizontally; however, it was sampled vertically by scanning and sampled temporally at the frame rate. Historically, samples were not necessarily digital: CCD and CMOS image sensors are inherently sampled, but they are not inherently quantized. (On-chip analog-to-digital conversion is now common in CMOS sensors.) In practice, though, sampling and quantization generally go together.
A perceptual quantity is encoded in a perceptually uniform manner if a small perturbation to the coded value is approximately equally perceptible across the range of that value. Consider the volume control on your radio. If it were physically linear, the roughly logarithmic nature of loudness perception would place most of the perceptual "action" of the control at the bottom of its range. Instead, the control is designed to be perceptually uniform. Figure 1.7 shows the transfer function of a potentiometer with standard audio taper: Angle of rotation is mapped to sound pressure level such that rotating the knob 10 degrees produces a similar perceptual increment in volume across the range of the control. This is one of many examples of perceptual considerations built into the engineering of electronic systems. (For another example, see Figure 1.8.)
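The idea behind a perceptually uniform control can be sketched with an idealized taper in which knob angle maps linearly to level in decibels, so that equal rotations give equal decibel steps. This is not the measured curve of a real audio-taper potentiometer: the 300-degree travel, the 60 dB range, and the function names are all assumptions chosen for illustration.

```python
def taper_gain_db(angle_deg: float, travel_deg: float = 300.0, range_db: float = 60.0) -> float:
    """Idealized taper: gain in dB, linear in angle, reaching 0 dB at full rotation."""
    angle = min(max(angle_deg, 0.0), travel_deg)
    return -range_db * (1.0 - angle / travel_deg)


def amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude ratio."""
    return 10 ** (gain_db / 20)


# Each 10-degree step changes the level by the same number of decibels,
# while the linear amplitude ratio changes by very different amounts.
for angle in (0, 10, 150, 160, 290, 300):
    db = taper_gain_db(angle)
    print(f"{angle:3d} deg: {db:6.1f} dB, amplitude ratio {amplitude_ratio(db):.4f}")
```

With these assumed values every 10 degrees of rotation corresponds to 2 dB, whereas a physically linear control would concentrate most of the audible change near the bottom of its range.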
Excerpted from DIGITAL VIDEO AND HD by CHARLES POYNTON Copyright © 2012 by ELSEVIER INC.. Excerpted by permission of Morgan Kaufmann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Part 1 - Introduction
Chapter 1 - Raster Images
Chapter 2 - Quantization
Chapter 3 - Brightness Contrast Controls
Chapter 4 - Raster Images in Computing
Chapter 5 - Raster Scanning
Chapter 6 - Image Structure
Chapter 7 - Resolution
Chapter 8 - Constant Luminance
Chapter 9 - Rendering Intent
Chapter 10 - Introduction to Luma Chroma
Chapter 11 - Introduction to Component SDTV
Chapter 12 - Introduction to Composite NTSC PAL
Chapter 13 - Introduction to HDTV
Chapter 14 - Introduction to Compression
Chapter 15 - Digital Video Interfaces
Part 2 - Principles
Chapter 16 - Filtering and Sampling
Chapter 17 - Resampling, Interpolation, and Decimation
Chapter 18 - Image Digitization and Reconstruction
Chapter 19 - Perception and Visual Acuity
Chapter 20 - Luminance and Lightness
Chapter 21 - The CIE System of Colorimetry
Chapter 22 - Color Science for Video
Chapter 23 - Gamma
Chapter 24 - Luma and Color Differences
Chapter 25 - Component Video Color Coding for SDTV
Chapter 26 - Component Video Color Coding for HDTV
Chapter 27 - NTSC PAL Chroma Modulation
Chapter 28 - NTSC PAL Frequency Interleaving
Chapter 29 - NTSC Y'IQ System
Chapter 30 - Frame, Field, Line, and Sample Rates
Chapter 31 - Timecode
Chapter 32 - Video Signal Structure
Chapter 33 - Digital Sync., TRS, Ancillary Data, and Interface
Chapter 34 - Analog SDTV Sync, Genlock, and Interface
Chapter 35 - Videotape Recording
Chapter 36 - 2-3 Pulldown
Chapter 37 - Deinterlacing
Part 3 - Video Compression
Chapter 38 - JPEG and Motion-JPEG Compression
Chapter 39 - MPEG-2 Video Compression
Part 4 - Studio Standards
Chapter 40 - 525/59.94 Component Video
Chapter 41 - 525/59.94 NTSC Composite Video
Chapter 42 - 625/50 Component Video
Chapter 43 - 625/50 PAL Composite Video
Chapter 44 - SDTV Test Signals
Chapter 45 - 1280x720 HDTV
Chapter 46 - 1920x1080 HDTV
Chapter 47 - Electrical and Mechanical Interfaces
Part 5 - Broadcast Consumer Video
Chapter 48 - Analog NTSC PAL Broadcast Standards
Chapter 49 - Consumer Analog NTSC PAL
Chapter 50 - Digital Television Broadcast Standards
A - YUV and Luminance Considered Harmful
B - Introduction to Radiometry Photometry
C - Glossary of Video Signal Terms