Capture and Representation in Image Formation

Key Takeaway:

Capture and representation are the foundational stages that transform a real-world scene into a digital image suitable for subsequent processing and analysis.

1. Image Capture

1.1 Physical Interaction and Illumination

  • A scene comprises objects illuminated by natural or artificial light sources.

  • Light reflected from or transmitted through the scene carries information about shape, color, and texture.

1.2 Optical System

  • A lens focuses incoming light rays onto the image sensor.

  • The aperture controls how much light reaches the sensor: a wider opening admits more light but gives a shallower depth of field.

  • The shutter controls exposure time, influencing motion blur and noise levels.
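The aperture and shutter trade-offs can be made concrete with the usual photographic approximation that the light reaching the sensor scales with exposure time $t$ divided by the square of the f-number $N$ (focal length over aperture diameter). The f-number is not mentioned above and is introduced here as an assumption for illustration. The sketch also ignores lens transmission and vignetting, and the helper names are hypothetical.

```python
import math

def relative_exposure(t_seconds: float, f_number: float) -> float:
    """Light collected per unit sensor area, up to a constant factor.

    Assumes exposure scales as t / N^2, where N = focal length / aperture diameter.
    A larger N (smaller aperture) gives more depth of field but less light.
    """
    if t_seconds <= 0 or f_number <= 0:
        raise ValueError("exposure time and f-number must be positive")
    return t_seconds / f_number**2

def stops_difference(a: float, b: float) -> float:
    """Difference between two exposures in photographic stops (factors of 2)."""
    return math.log2(a / b)

# Opening up from f/4 to f/2.8 roughly doubles the light, so the shutter time
# can be halved (less motion blur) at about the same image brightness.
e1 = relative_exposure(1 / 50, 4.0)
e2 = relative_exposure(1 / 100, 2.8)
print(f"{stops_difference(e2, e1):+.2f} stops")
```

The two settings differ by only a few hundredths of a stop. The shorter exposure freezes motion better, but the wider aperture pays for it with a shallower depth of field.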

1.3 Sensor Technologies

  • CCD (Charge-Coupled Device) and CMOS (Complementary Metal-Oxide-Semiconductor) sensors consist of arrays of millions of photodiodes that convert incident photons into electric charge, which is then read out as analog voltages [1].

  • Line-scan sensors capture one row at a time (common in scanners); area-scan sensors capture a full 2D frame in each exposure.
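The photon-to-voltage chain can be sketched as a simple per-pixel model. Incident photons produce electrons with some probability (the quantum efficiency). Photon arrival is Poisson-distributed, which gives shot noise. Each photodiode saturates at a full-well capacity, and a conversion gain turns the collected charge into a voltage. The parameter names and values (`qe`, `full_well`, `gain_uv_per_e`) are illustrative assumptions, not properties of any particular CCD or CMOS sensor. Read noise and dark current are omitted.

```python
from typing import Optional

import numpy as np

def expose(photon_flux: np.ndarray, exposure_s: float, qe: float = 0.6,
           full_well: int = 20_000, gain_uv_per_e: float = 50.0,
           rng: Optional[np.random.Generator] = None) -> np.ndarray:
    """Simulate one exposure; returns analog voltages in microvolts.

    photon_flux: expected photons per second at each photodiode (2D array).
    All sensor parameters are hypothetical illustration values.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean_electrons = photon_flux * exposure_s * qe   # expected photoelectrons
    electrons = rng.poisson(mean_electrons)           # shot noise
    electrons = np.minimum(electrons, full_well)      # saturation (clipping)
    return electrons * gain_uv_per_e                  # charge -> voltage

rng = np.random.default_rng(0)
flux = np.full((4, 6), 1e5)                 # uniform illumination, photons/s

# Area scan: the whole 4 x 6 frame is exposed together.
v = expose(flux, exposure_s=0.01, rng=rng)

# Line scan: a single row is exposed repeatedly (e.g. as the page moves
# past the sensor), and stacking successive rows builds the 2D image.
rows = [expose(flux[i:i + 1, :], 0.01, rng=rng) for i in range(flux.shape[0])]
line_scan_image = np.vstack(rows)

# A very bright scene saturates every photodiode at the full-well limit.
bright = expose(np.full((2, 2), 1e9), 1.0, rng=rng)
print(v.shape, line_scan_image.shape, v.mean())
```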

1.4 Digitization: Sampling and Quantization

  • Sampling discretizes the continuous spatial domain into an $M \times N$ grid of pixels (spatial resolution).

  • Quantization maps each pixel’s analog voltage into a finite set of levels (gray-level or color resolution).

  • E.g., an 8-bit grayscale image uses values 0–255 per pixel [2].

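  • Higher spatial or gray-level resolution improves fidelity but increases data size: an uncompressed single-channel image occupies $M \times N \times B$ bits for $B$ bits per pixel. On a sensor of fixed size, more pixels also means smaller photodiodes that each collect less light, so individual pixels are noisier.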
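The two digitization steps can be sketched with NumPy. A continuous scene, here an assumed analytic intensity function on the unit square, is sampled at the pixel centers of an $M \times N$ grid and then uniformly quantized to $B$ bits. The helper names (`sample`, `quantize`, `raw_size_bytes`) and the test scene are hypothetical. A real camera quantizes sensor voltages with an analog-to-digital converter rather than evaluating a formula.

```python
import numpy as np

def scene(x, y):
    """Hypothetical continuous intensity pattern with values in [0, 1]."""
    return 0.5 + 0.5 * np.sin(2 * np.pi * 3 * x) * np.cos(2 * np.pi * 2 * y)

def sample(f, M: int, N: int) -> np.ndarray:
    """Spatial sampling: evaluate f at the pixel centers of an M x N grid."""
    u = (np.arange(M) + 0.5) / M            # first index u in 0..M-1
    v = (np.arange(N) + 0.5) / N            # second index v in 0..N-1
    uu, vv = np.meshgrid(u, v, indexing="ij")
    return f(uu, vv)

def quantize(values: np.ndarray, bits: int) -> np.ndarray:
    """Uniform quantization of [0, 1] intensities to integers 0 .. 2**bits - 1."""
    if not 1 <= bits <= 16:
        raise ValueError("bits must be between 1 and 16")
    levels = 2**bits - 1
    q = np.rint(np.clip(values, 0.0, 1.0) * levels)
    return q.astype(np.uint8 if bits <= 8 else np.uint16)

def raw_size_bytes(M: int, N: int, bits: int, channels: int = 1) -> int:
    """Uncompressed storage: M * N * channels * bits / 8 bytes."""
    return M * N * channels * bits // 8

samples = sample(scene, 480, 640)
img8 = quantize(samples, bits=8)            # 256 gray levels
img2 = quantize(samples, bits=2)            # only 4 gray levels: visible banding
print(img8.shape, raw_size_bytes(480, 640, 8), raw_size_bytes(480, 640, 8, channels=3))
```

For a 640×480 image at 8 bits per pixel, the raw size is 307,200 bytes, and three color channels triple it.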

2. Image Representation

2.1 Pixel-Based Models

  • A digital image is a 2D function $I(u,v)$, where $(u,v) \in \{0,\dots,M-1\} \times \{0,\dots,N-1\}$ and

$$I(u,v) \in \{0, 1, \dots, 2^B - 1\},$$

with $B$ the number of bits per pixel [3].

  • Grayscale: Single channel of intensities.

  • RGB Color: Three channels—Red, Green, Blue—each quantized separately.
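In array terms, the grayscale model is an $M \times N$ array and the RGB model is an $M \times N \times 3$ array, with each entry an integer in $\{0, \dots, 2^B - 1\}$. The sketch assumes NumPy, $B = 8$ bits per channel, and a channel-last layout. The grayscale conversion uses the ITU-R BT.601 luma weights as one common convention; the model above does not prescribe it, and `rgb_to_gray` is a hypothetical helper.

```python
import numpy as np

B = 8                                        # bits per channel (assumed)
M, N = 2, 3

gray = np.zeros((M, N), dtype=np.uint8)      # one channel: I(u, v)
rgb = np.zeros((M, N, 3), dtype=np.uint8)    # three channels: R, G, B at each (u, v)

rgb[0, 0] = (255, 0, 0)                      # pure red pixel at (u, v) = (0, 0)
rgb[1, 2] = (255, 255, 255)                  # white pixel at (u, v) = (1, 2)
assert rgb.max() <= 2**B - 1                 # every value lies in {0, ..., 2^B - 1}

def rgb_to_gray(img: np.ndarray) -> np.ndarray:
    """Weighted sum of R, G, B using BT.601 luma weights (one common choice)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(img.astype(np.float64) @ weights).astype(np.uint8)

g = rgb_to_gray(rgb)                         # shape (M, N): back to one channel
print(g)
```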

2.2 Data Structures and File Formats

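  • Raster (bitmap) formats store an array of pixel values (e.g., BMP, TIFF, PNG); BMP and TIFF can store the array uncompressed, whereas PNG stores it in a losslessly compressed (DEFLATE) stream.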

  • Compressed formats exploit redundancy via lossless (PNG) or lossy (JPEG) schemes.

  • Vector graphics (e.g., SVG) describe shapes mathematically as paths and geometric primitives, which suits diagrams better than natural images [4].
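The claim that lossless compression exploits redundancy can be checked with Python's standard `zlib` module. This module implements DEFLATE, the lossless method used inside PNG. This is a sketch on raw 8-bit pixel bytes, not a PNG encoder: PNG also applies per-row prediction filters before DEFLATE, and those filters are omitted here.

```python
import os
import zlib

W, H = 256, 256

# A smooth horizontal gradient: neighbouring pixels and rows are highly redundant.
smooth = bytes(x % 256 for y in range(H) for x in range(W))
# Uniform random bytes: essentially no redundancy to exploit.
noise = os.urandom(W * H)

sizes = {}
for name, pixels in [("gradient", smooth), ("noise", noise)]:
    packed = zlib.compress(pixels, 9)
    assert zlib.decompress(packed) == pixels   # lossless: exact reconstruction
    sizes[name] = len(packed)
    print(f"{name}: {len(pixels)} -> {len(packed)} bytes")
```

The gradient shrinks to a small fraction of its raw size, while the noise image does not shrink meaningfully. Natural images fall in between, and this is why lossy schemes such as JPEG discard detail to reach higher compression ratios.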

2.3 Mathematical Models

  • Pinhole Camera Model:

    $$x = f\,\frac{X}{Z}, \quad y = f\,\frac{Y}{Z},$$

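    where $(X,Y,Z)$ are the coordinates of a scene point in the camera frame (origin at the center of projection, $Z$ along the optical axis), $(x,y)$ are image-plane coordinates, and $f$ is the focal length.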

  • Lens Distortion: Real lenses introduce radial and tangential distortion functions that must be calibrated and corrected.
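The pinhole projection and a lens-distortion step can be combined in a few lines. The distortion terms below follow the widely used Brown-Conrady radial-tangential form, applied to normalized coordinates $X/Z$ and $Y/Z$. It uses radial coefficients $k_1, k_2, k_3$ and tangential coefficients $p_1, p_2$, the same parameterization as OpenCV's camera calibration. The text above does not fix a particular distortion model, so treat this as an assumed choice. The coefficient values are hypothetical, and the principal-point offset is omitted to match the equation above. Correction inverts the distortion by fixed-point iteration. This converges for the mild distortion typical of calibrated lenses but is not guaranteed for strong distortion.

```python
from typing import Tuple

def project(X: float, Y: float, Z: float, f: float) -> Tuple[float, float]:
    """Ideal pinhole projection x = f X / Z, y = f Y / Z (camera frame, Z > 0)."""
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return f * X / Z, f * Y / Z

def distort(xn, yn, k1, k2, k3=0.0, p1=0.0, p2=0.0):
    """Apply radial + tangential distortion to normalized coordinates (xn, yn)."""
    r2 = xn * xn + yn * yn
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = xn * radial + 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
    yd = yn * radial + p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
    return xd, yd

def undistort(xd, yd, k1, k2, k3=0.0, p1=0.0, p2=0.0, iterations=20):
    """Invert `distort` by fixed-point iteration (adequate for mild distortion)."""
    xn, yn = xd, yd                          # initial guess: no distortion
    for _ in range(iterations):
        r2 = xn * xn + yn * yn
        radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
        dx = 2.0 * p1 * xn * yn + p2 * (r2 + 2.0 * xn * xn)
        dy = p1 * (r2 + 2.0 * yn * yn) + 2.0 * p2 * xn * yn
        xn = (xd - dx) / radial
        yn = (yd - dy) / radial
    return xn, yn

# Hypothetical calibration values, for illustration only.
f = 800.0                                    # focal length in pixel units
coeffs = dict(k1=-0.2, k2=0.05, p1=0.001, p2=-0.0005)

X, Y, Z = 0.5, -0.3, 2.0
xn, yn = project(X, Y, Z, 1.0)               # normalized coordinates (f = 1)
xd, yd = distort(xn, yn, **coeffs)           # where the real lens images the point
x_img, y_img = f * xd, f * yd                # scale to the image plane
xu, yu = undistort(xd, yd, **coeffs)         # correction recovers (xn, yn)
print(project(X, Y, Z, f), (x_img, y_img))
```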

Understanding capture and representation is critical: it determines image fidelity, influences noise characteristics, and underpins all higher-level vision algorithms.