DIGITAL VIDEO AND HD: ALGORITHMS AND INTERFACES
By CHARLES POYNTON
Morgan Kaufmann
Copyright © 2012 ELSEVIER INC.
All rights reserved.
Chapter One
Raster images
A digital image is represented by a rectangular array (matrix) of picture elements (pels, or pixels). Pixel arrays of several image standards are sketched in Figure 1.1. In a greyscale system each pixel comprises a single component whose value is related to what is loosely called brightness. In a colour system each pixel comprises several components – usually three – whose values are closely related to human colour perception.
Historically, a video image was acquired at the camera, conveyed through the channel, and displayed using analog scanning; there was no explicit pixel array. Modern cameras and modern displays directly represent the discrete elements of an image array having fixed structure. Signal processing at the camera, in the pipeline, or at the display may perform spatial and/or temporal resampling to adapt to different formats.
The pixel array for one image is a frame. In video, digital memory used to store one image is called a framestore; in computing, it's a framebuffer. The total pixel count in an image is the number of image columns NC (or in video, samples per active line, SAL) times the number of image rows NR (or active lines, LA). The total pixel count is usually expressed in megapixels (Mpx).
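As a small illustration of the pixel-count arithmetic, the sketch below multiplies columns by rows and expresses the result in megapixels. The function name `megapixels` is hypothetical, and the sketch assumes 1 Mpx means one million pixels.

```python
def megapixels(columns: int, rows: int) -> float:
    """Total pixel count (columns times rows), expressed in megapixels.

    Assumes 1 Mpx = 1,000,000 pixels.
    """
    return columns * rows / 1_000_000


# A 1920 x 1080 image array holds 2,073,600 pixels, a little over 2 Mpx.
hd_mpx = megapixels(1920, 1080)
print(hd_mpx)
```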
In video and in computing, a pixel comprises the set of all components necessary to represent colour (typically red, green, and blue). In the mosaic sensors typical of digital still cameras (DSCs) a pixel is any colour component individually; the process of demosaicking interpolates the missing components to create a fully populated image array. In digital cinema cameras the DSC interpretation of pixel is used; however, in a digital cinema projector, a pixel is a triad.
The value of each pixel component represents brightness and colour in a small region surrounding the corresponding point in the sampling lattice.
Pixel component values are quantized, typically to an integer value that occupies between 1 and 16 bits – and often 8 or 10 bits – of digital storage. The number of bits per component, or per pixel, is called the bit depth. (We use bit depth instead of width to avoid confusion: The term width refers to the entire picture.)
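To make bit depth concrete, the following sketch counts the integer codes available at a given bit depth and estimates the storage for one uncompressed frame. The function names are hypothetical, and the storage figure assumes every component is stored at exactly its bit depth with no padding, subsampling, or compression.

```python
def code_levels(bit_depth: int) -> int:
    """Number of distinct integer codes available at a given bit depth."""
    return 2 ** bit_depth


def frame_bytes(columns: int, rows: int, components: int, bit_depth: int) -> float:
    """Storage for one frame, assuming tightly packed, uncompressed components."""
    total_bits = columns * rows * components * bit_depth
    return total_bits / 8


print(code_levels(8), code_levels(10))
print(frame_bytes(1920, 1080, 3, 8))
```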
Aspect ratio is simply the ratio of an image's width to its height. Standard aspect ratios for film and video are sketched, to scale, in Figure 1.2. What I call simply aspect ratio is sometimes called display aspect ratio (DAR) or picture aspect ratio (PAR). Standard-definition (SD) television has an aspect ratio of 4:3.
Equation 1.1 relates picture and sample aspect ratios. To assign n square-sampled pixels to a picture having aspect ratio AR, choose image column and image row counts (c and r, respectively) according to Equation 1.2.
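Equation 1.2 is not reproduced in this excerpt. The sketch below is one way to obtain the column and row counts, derived from the two constraints that c times r equals n and that c divided by r equals AR for square sampling; it may not match the printed form of the equation. The function name is hypothetical, and the results are left unrounded.

```python
import math


def square_sampled_dimensions(n: float, aspect_ratio: float) -> tuple[float, float]:
    """Column and row counts (c, r) such that c * r = n and c / r = aspect_ratio.

    Solving the two constraints gives c = sqrt(n * AR) and r = sqrt(n / AR).
    Results are not rounded; a real format would pick nearby integers.
    """
    columns = math.sqrt(n * aspect_ratio)
    rows = math.sqrt(n / aspect_ratio)
    return columns, rows


# Roughly 2 Mpx at 16:9 recovers the 1920 x 1080 array.
c, r = square_sampled_dimensions(1920 * 1080, 16 / 9)
print(round(c), round(r))
```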
Cinema film commonly uses 1.85:1 (which for historical reasons is called either flat or spherical), or 2.4:1 ("CinemaScope," or colloquially, 'scope). Many films are 1.85:1, but "blockbusters" are usually 2.4:1. Film at 2.4:1 aspect ratio was historically acquired using an aspherical lens that squeezes the horizontal dimension of the image by a factor of two. The projector is equipped with a similar lens, to restore the horizontal dimension of the projected image. The lens and the technique are called anamorphic. In principle, an anamorphic lens can have any ratio; in practice, a ratio of exactly two is ubiquitous in cinema.
Widescreen refers to an aspect ratio wider than 4:3. High-definition (HD) television is standardized with an aspect ratio of 16:9. In video, the term anamorphic usually refers to a 16:9 widescreen variant of a base video standard, where the horizontal dimension of the 16:9 image occupies the same width as the 4:3 aspect ratio standard. Consumer electronic equipment rarely recovers the correct aspect ratio of such conversions (as we will explore later in the chapter).
HD is standardized with an aspect ratio of 16:9 (about 1.78:1), fairly close to the 1.85:1 ordinary movie aspect ratio. Figure 1.3 below illustrates the origin of the 16:9 aspect ratio. Through a numerological coincidence apparently first revealed by Kerns Powers, the geometric mean of 4:3 (the standard aspect ratio of conventional television) and 2.4 (the aspect ratio of a CinemaScope movie) is very close – within a fraction of a percent – to 16:9. (The calculation is shown in the lower right corner of the figure.) A choice of 16:9 for HD meant that SD, HD, and CinemaScope shared the same "image circle": 16:9 was a compromise between the vertical cropping required for SD and the horizontal cropping required for CinemaScope.
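The coincidence can be checked numerically. The sketch below computes the geometric mean of 4:3 and 2.4:1 and compares it with 16:9; the variable names are illustrative only.

```python
import math

sd_aspect = 4 / 3        # conventional television
scope_aspect = 2.4       # CinemaScope
hd_aspect = 16 / 9       # HD

# Geometric mean of the two older aspect ratios.
geometric_mean = math.sqrt(sd_aspect * scope_aspect)

# Relative difference between the geometric mean and 16:9.
relative_difference = abs(geometric_mean - hd_aspect) / hd_aspect

print(f"geometric mean = {geometric_mean:.4f}, 16:9 = {hd_aspect:.4f}")
print(f"relative difference = {relative_difference:.2%}")
```

The geometric mean comes out near 1.789, against about 1.778 for 16:9, a difference of well under one percent.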
In mathematics, coordinate values of the (two-dimensional) plane range over both positive and negative values. The plane is thereby divided into four quadrants (see Figure 1.4). Quadrants are denoted by Roman numerals in the counterclockwise direction. In the continuous image plane, locations are described using Cartesian coordinates [x, y] – the first coordinate is associated with the horizontal direction, the second with the vertical. When both x and y are positive, the location is in the first quadrant (quadrant I). In image science, the image lies in this quadrant. (Adobe's PostScript system uses first-quadrant coordinates.)
In matrix indexing, axis ordering is reversed from Cartesian coordinates: A matrix is indexed by row then column. The top row of a matrix has the smallest index, so matrix indices lie in quadrant IV. In mathematics, matrix elements are ordinarily identified using 1-origin indexing. Some image processing software packages use 1-origin indexing – in particular, MATLAB and Mathematica, both of which have deep roots in mathematics. The scan line order of conventional video and image processing usually adheres to the matrix convention, but with zero-origin indexing: Rows and columns are usually numbered [r, c] from [0, 0] at the top left. In other words, the image occupies quadrant IV, with the negative sign on the y-coordinate elided and, ordinarily, with zero-origin indexing.
Digital image sampling structures are denoted width × height. For example, a 1920 × 1080 system has columns numbered 0 through 1919 and rows (historically, "picture lines") numbered 0 through 1079.
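The relationship between the conventions can be sketched as a pair of conversions. The code below assumes integer pixel positions, with the first-quadrant y-coordinate counted upward from the bottom row; that assumption and the function names are illustrative and not taken from the text.

```python
def cartesian_to_row_col(x: int, y: int, height: int) -> tuple[int, int]:
    """Convert a first-quadrant pixel position [x, y] to zero-origin [row, column].

    Assumes y = 0 is the bottom row, so the row index counts down from the top.
    Note the axis swap: the row (vertical) index comes first.
    """
    return (height - 1 - y, x)


def to_one_origin(row: int, col: int) -> tuple[int, int]:
    """Convert zero-origin [row, column] to the 1-origin matrix convention."""
    return (row + 1, col + 1)


# In a 1920 x 1080 array, the bottom-left pixel is row 1079, column 0 ...
print(cartesian_to_row_col(0, 0, height=1080))
# ... and the top-right pixel is row 0, column 1919.
print(cartesian_to_row_col(1919, 1079, height=1080))
```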
In human vision, the three-dimensional world is imaged by the lens of the eye onto the retina, which is populated with photoreceptor cells that respond to light having wavelengths ranging from about 400 nm to 700 nm. In video and in film, we build a camera having a lens and a photosensitive device, to mimic how the world is perceived by vision. Although the shape of the retina is roughly a section of a sphere, it is topologically two dimensional. In a camera, for practical reasons, we employ a flat image plane, sketched in Figure 1.5 above, instead of a section of a sphere. Image science involves analyzing the continuous distribution of optical power that is incident on the image plane.
Signals captured from the physical world are translated into digital form by digitization, which involves two processes: sampling (in time or space) and quantization (in amplitude), sketched in Figure 1.6 below. The operations may take place in either order, though sampling usually precedes quantization.
Quantization assigns an integer to signal amplitude at an instant of time or a point in space, as I will explain in Quantization, on page 37. Virtually all image exchange standards – TIFF, JPEG, SD, HD, MPEG, H.264 – involve pixel values that are not proportional to light power in the scene or at the display: With respect to light power, pixel values in these systems are nonlinearly quantized.
A continuous one-dimensional function of time, such as audio sound pressure level, is sampled through forming a series of discrete values, each of which is a function of the distribution of a physical quantity (such as intensity) across a small interval of time. Uniform sampling, where the time intervals are of equal duration, is nearly always used. (Details will be presented in Filtering and sampling, on page 191.)
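The two steps of digitization can be sketched in a few lines. This is a simplified illustration under stated assumptions: sampling is idealized as evaluating the signal at equally spaced instants (rather than integrating over a small interval), and the quantizer is uniform in its input, whereas the image standards mentioned above quantize nonlinearly with respect to light power. The function names and the test signal are hypothetical.

```python
import math


def sample(signal, sample_rate: float, count: int) -> list[float]:
    """Uniform sampling: evaluate the signal at equally spaced instants."""
    return [signal(k / sample_rate) for k in range(count)]


def quantize(value: float, bit_depth: int, lo: float = 0.0, hi: float = 1.0) -> int:
    """Map an amplitude in [lo, hi] to an integer code 0 .. 2**bit_depth - 1.

    Values outside [lo, hi] are clipped; the result is rounded to nearest.
    """
    max_code = (1 << bit_depth) - 1
    clipped = min(max(value, lo), hi)
    scaled = (clipped - lo) / (hi - lo) * max_code
    return int(scaled + 0.5)


# One cycle of a sine wave, offset into [0, 1], sampled 8 times, coded to 8 bits.
def tone(t: float) -> float:
    return 0.5 + 0.5 * math.sin(2 * math.pi * t)


codes = [quantize(v, bit_depth=8) for v in sample(tone, sample_rate=8, count=8)]
print(codes)
```

Sampling first and quantizing second mirrors the usual order, though the operations could be applied in the other order.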
A continuous two-dimensional function of space is sampled by assigning, to each element of the image matrix, a value that is a function of the distribution of intensity over a small region of space. In digital video and in conventional image processing, the samples lie on a regular, rectangular grid.
Analog video was not sampled horizontally; however, it was sampled vertically by scanning and sampled temporally at the frame rate. Historically, samples were not necessarily digital: CCD and CMOS image sensors are inherently sampled, but they are not inherently quantized. (On-chip analog-to-digital conversion is now common in CMOS sensors.) In practice, though, sampling and quantization generally go together.
A perceptual quantity is encoded in a perceptually uniform manner if a small perturbation to the coded value is approximately equally perceptible across the range of that value. Consider the volume control on your radio. If it were physically linear, the roughly logarithmic nature of loudness perception would place most of the perceptual "action" of the control at the bottom of its range. Instead, the control is designed to be perceptually uniform. Figure 1.7 shows the transfer function of a potentiometer with standard audio taper: Angle of rotation is mapped to sound pressure level such that rotating the knob 10 degrees produces a similar perceptual increment in volume across the range of the control. This is one of many examples of perceptual considerations built into the engineering of electronic systems. (For another example, see Figure 1.8.)
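The volume-control example can be illustrated with an idealized model. The sketch below assumes a control whose gain in decibels rises linearly with rotation angle, and compares it with a physically linear control whose amplitude is proportional to angle. The 300-degree travel and 60 dB range are hypothetical values chosen for illustration; they are not read from Figure 1.7, and a real audio-taper potentiometer only approximates this behaviour.

```python
import math


def taper_gain_db(angle_deg: float, travel_deg: float = 300.0,
                  range_db: float = 60.0) -> float:
    """Idealized audio taper: gain in dB rises linearly with rotation angle."""
    fraction = min(max(angle_deg / travel_deg, 0.0), 1.0)
    return -range_db * (1.0 - fraction)


def linear_gain_db(angle_deg: float, travel_deg: float = 300.0) -> float:
    """Physically linear control: amplitude is proportional to rotation angle."""
    fraction = min(max(angle_deg / travel_deg, 1e-6), 1.0)
    return 20.0 * math.log10(fraction)


# A 10-degree turn near the bottom and near the top of the range.
low_step = taper_gain_db(20) - taper_gain_db(10)
high_step = taper_gain_db(290) - taper_gain_db(280)
linear_low_step = linear_gain_db(20) - linear_gain_db(10)
linear_high_step = linear_gain_db(290) - linear_gain_db(280)

print(f"taper:  {low_step:.2f} dB low, {high_step:.2f} dB high")
print(f"linear: {linear_low_step:.2f} dB low, {linear_high_step:.2f} dB high")
```

Under these assumptions the tapered control gives the same decibel step at both ends of its travel, while the linear control changes by about 6 dB at the bottom and only a fraction of a decibel at the top.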
Excerpted from DIGITAL VIDEO AND HD by CHARLES POYNTON. Copyright © 2012 by ELSEVIER INC. Excerpted by permission of Morgan Kaufmann. All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.