Digital Video Compression
This page describes the compression formats used for digital video from a user's point of view (speed, quality, and similar issues) rather than the technical issues (e.g. how compression works). It also has a bit about the various digital video formats. If you're interested in analog formats too, see this page.
This is mainly intended to provide guidelines for anyone wishing to convert digital home movies into QuickTime format for sending to others, whether by email, through a web site, or on a CD-R or similar medium.
The Tough Decisions
When compressing video, you have to select values for things like frame rate, frame size, "quality" values and other options that are not particularly easy to guess.
Two Different Goals
Your choices will almost always be based on trying to meet one of the following two goals:
- Best possible quality for a given fixed data rate.
- Best possible quality for a given fixed total data size.
Usually you are limited by only one of these. For example, if you are making a video that people can play over a modem connection, you are concerned with the data rate of the movie (bytes per second), rather than its total size. On the other hand, if you're making the movie to go on a CD-ROM, you are concerned about how much space the movie file takes up, rather than the actual number of bytes per second.
In general, your constraint depends on the distribution method of your digital video:
|If you are distributing the video via||You have a fixed||Typical example|
|email||total data size||keep it below 1 megabyte|
|"streaming" (playing while transmitting) via modem||data rate||keep it below 4 K bytes per second|
|"streaming" (playing while transmitting) via broadband||data rate||keep it below 30 K bytes per second|
|download from web site then play from local hard disk||total data size||keep it below 10 MB (available space on web server)|
|movie file on a CD-ROM||total data size||keep it below 600 MB|
Note that the total data size limit is really a data rate limit in disguise — if you have a total size limit, you can divide by the length of the video to get a data rate limit:
data rate = total size / length in seconds
For example, if the total size is limited to 2 MB and the length of the video is 45 seconds, the data rate limit will be about 44 K bytes per second (because 2,000,000 divided by 45 is 44,444).
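The same calculation as a small Python helper. This is only a sketch, and it assumes (as the example does) decimal units, with 1 MB = 1,000,000 bytes and 1 K = 1,000 bytes:

```python
def data_rate_limit(total_bytes: float, length_seconds: float) -> float:
    """Maximum average data rate (bytes per second) that keeps a movie of
    the given length within a fixed total size."""
    if length_seconds <= 0:
        raise ValueError("video length must be positive")
    return total_bytes / length_seconds


# The example above: a 2 MB limit on a 45-second video.
rate = data_rate_limit(2_000_000, 45)
print(f"about {rate / 1000:.0f} K bytes per second")  # about 44 K bytes per second
```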
In early 2002, there were 9 codecs suitable for the task of encoding (compressing) general-purpose video at resolutions of 320x240 or greater and at internet or CD-ROM data rates. The codecs are (in alphabetical order):
- 3ivX Delta 3.5, available from www.3ivx.com
- DIVX codec a.k.a. "OpenDivX.component" by Adrian Bourke (I forget where I found this; if you really want it, search for "OpenDivXInstall.sit", but as you'll see, it wasn't very good)
- H.263 (included in QuickTime) (Note: Apple VC H.263 is a modified version used for iChat AV, which doesn't seem to work for creating movie files)
- MPEG-4 (if you have MacOS 10.2 "Jaguar" and QuickTime Pro 6)
- On2 VP3 Video 3.2, available from www.vp3.com
- Pixlet (if you have MacOS 10.3 "Panther")
- Sorenson Video version 2.20.304 (included in QuickTime)
- Sorenson Video version 3.10.101 (Standard) (included in QuickTime)
- ZyGoVideo Basic, available from www.zygovideo.com
I tested each of these codecs under MacOS X on a G4/800 machine (new iMac and iBook G4), and all but the most recent 2 on a G3/300 (Blue and White G3 desktop). All tests used the same source material, a 6-minute iMovie project consisting of a wide variety of material, listed here in order of increasing "difficulty":
- Traditional fixed-camera theatre (Monty Python), 41 seconds
- Traditional animation (the Simpsons), 83 seconds
- Photo collage animation (Monty Python), 26 seconds
- Fixed-camera with overlaid fine-print text, 18 seconds
- Basketball game, 29 seconds (includes a few "flashbulb frames" which cause big problems with certain codecs)
- 3 commercial spots with a large amount of subject motion (AT&T mlife "speedskaters", Coca-Cola "2002 Olympics day 17", Spider-Man movie "trailer #2 30-second version"), 90 seconds
- First-person shots with continuous fast camera motion (CBS "Amazing Race 2" and my own trip to Alaska), 71 seconds
For each codec, I adjusted the settings and compressed over and over again until I found settings that created output at 160 K bytes per second, on a 320x240 image size. (This data rate was chosen based on the goal of getting 1 hour of video on a CD-ROM, while leaving a safety margin.) For each compressed movie the audio settings were IMA 4:1, Stereo, 22050 Hz.
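As a rough check of that safety margin, here is a sketch that assumes a 600 MB disc (the figure from the distribution table above) and decimal units:

```python
CD_ROM_BYTES = 600_000_000   # assumed usable capacity: 600 MB
TARGET_RATE = 160_000        # the 160 K bytes per second test rate


def minutes_that_fit(capacity_bytes: float, rate_bytes_per_sec: float) -> float:
    """Running time, in minutes, that fits in capacity_bytes at a constant rate."""
    return capacity_bytes / rate_bytes_per_sec / 60


ceiling = CD_ROM_BYTES / 3600  # rate that would fill the disc in exactly one hour
print(f"one-hour ceiling: {ceiling / 1000:.1f} K bytes per second")       # 166.7
print(f"at 160 K: {minutes_that_fit(CD_ROM_BYTES, TARGET_RATE)} minutes")  # 62.5 minutes
```

So 160 K bytes per second leaves about 2.5 minutes of slack per hour of video, a margin of roughly 4%.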
I viewed the resulting compressed movies, looking for any particularly notable flaws. The flaws were used to generate a list of things to look for to fairly grade the codecs — for example, if codec A failed to show the wood floor on the basketball court clearly, then a rule was made for assigning a grade for each codec's handling of the court floor. After identifying all such "critical points" on the 6-minute video, I graded the codecs on these points.
I also measured the CPU time needed to encode (compress) and decode (play) on the G3/300 and G4/800 systems. For encoding, the total time taken to compress was measured, with iMovie in the foreground and nothing else (aside from the Finder) running. For playback, a perl script in a Terminal window measured the CPU usage of the QuickTime Player while it played the movie twice through. On the G3/300 the movie was played at Normal size; on the G4/800 the movie was played at Double size. For each, the average and peak CPU usage are given.
|actual data rate||173.5||151.2||159.8||164.5||177.8||159.7||159.5||154.6|
|key frame every||n/a||60||60||NO||n/a||60||60||60|
|limit data rate to||n/a||160||140||298||n/a||140||137||200|
encode time (G4/800)
|playback CPU (G4/800)|
|playback peak CPU (G4/800)|
Overall quality (detail)
|NBA court floor 0:07||B+||C||B||D||D-||D||D||B|
|NBA flash-photo frames 0:11||A||A||C+||C-||E||D-||C+||B-|
1 Sometimes and for no apparent reason, parts of the frame become blocky as if a forced keyframe were being generated.
2 Occasionally, a new shot takes several frames to become well-resolved.
3 This is the only codec that dropped frames when playing a 320x240x30 movie at double-size on the 800-MHz G4 iMac.
4 See below (applies to old G3/300 tests only)
5 This codec does not smooth the image when playing at double size.
6 This codec actually failed with a fatal error (causing iMovie to stop exporting part-way through the movie, but not causing iMovie to "crash") on the NBA flash-frames. It fails only if the "limit data rate" is set lower than about 150K bytes per second. I was able to get around this problem by moving the NBA clips to the beginning of the iMovie.
7 See below (applies to old G3/300 tests only)
8 This codec allows only very coarse control of the quality and data rate: a quality setting of 83 produced a data rate of 144.3 K bytes per second. For the comparison, quality 91 was used because the codec doesn't allow any quality setting between 83 and 91, and doesn't support the "limit data rate" parameter.
9 This codec has no delta frames; every frame is a keyframe, resulting in its very poor quality scores (see below for a fuller explanation).
10 This codec also produced excellent results at a 640x480 frame size and the same data rate, but the lack of a deinterlace filter for iMovie makes it less desirable. If you're interested, the settings to use are framesize 640x480, quality 99, keyframe 90, datarate 140.
The 3ivX and H.263 codecs allow limited control of the quality setting — some values are available and others are not because the slide-bar control gets "snapped" to certain values. This sometimes makes it impossible to match a target data rate.
Key Frame Every
In the On2 codec you should leave this setting turned off, because On2 has its own keyframe generation rules (settable through the Options panel). For the other codecs that allow it, you should have keyframes turned on, because if you don't, the codec will make every frame a keyframe. This makes playback in reverse and random seeks ("scrubbing") work well, but causes a severe loss of quality, a severe increase in data rate, or both.
Limit Data Rate To
3ivX does not allow setting this value, and H.263 ignores it. Of the others, only ZyGo actually adheres to the value you give it. The others give either a higher rate or a lower rate, forcing you to adjust the value and try again until you get the result you want.
Encode Time
This is simply the time (in minutes) it took to export the iMovie to QuickTime. The movie was 6 minutes long, so you can see that even the G4/800 failed to encode in "real time"; the fastest codec (On2) took about twice the movie's running time (a 2:1 ratio). You can also see that some codecs have been optimized for the G4 (AltiVec) and others have not.
Playback CPU
This is the average CPU usage of the QuickTime Player while playing the movie two times through. On the G4/800 the movie was played at double size, and on the G3/300 it was played at Normal size. I measured this by writing a perl script that monitored the player via the ps command, and took an average of the CPU percentage over the entire 12-minute period.
Playback Peak CPU
This is an average of the three highest individual CPU percentage values measured by the perl script during the entire 12-minute playback period. The ratio between the average and peak CPU values shows how much variability the codec exhibits when playing different types of material. This can be a result of variable data rate (as with the 3ivX codec, which raises or lowers the data rate as much as it needs to in order to match the quality setting), or it can result from variations in the kind of drawing needed for different types of frames (fast motion with low detail versus low motion with high detail). Codecs with less variation can be preferable if you want your videos to always play back smoothly on slower computers.
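The measurements here used a perl script; as a rough stand-in, here is a Python sketch of the same idea, not the original script. It polls `ps -o %cpu= -p PID` and then reduces the samples to an average and a "peak" (the mean of the three highest samples, as described above). The one-second polling interval is an assumption. Also note that the meaning of ps's %CPU column differs between systems (on Linux it is CPU time divided by the process's elapsed run time rather than a recent average), so numbers are only comparable when measured the same way.

```python
import subprocess
import time
from statistics import mean


def sample_cpu(pid: int, duration_s: float, interval_s: float = 1.0) -> list[float]:
    """Poll ps for one process's %CPU every interval_s seconds."""
    samples = []
    end = time.monotonic() + duration_s
    while time.monotonic() < end:
        result = subprocess.run(["ps", "-o", "%cpu=", "-p", str(pid)],
                                capture_output=True, text=True)
        value = result.stdout.strip()
        if not value:  # the process has exited
            break
        samples.append(float(value))
        time.sleep(interval_s)
    return samples


def summarize_cpu(samples: list[float], top_n: int = 3) -> tuple[float, float]:
    """Return (average, peak), where peak is the mean of the top_n highest samples."""
    if len(samples) < top_n:
        raise ValueError(f"need at least {top_n} samples")
    average = mean(samples)
    peak = mean(sorted(samples, reverse=True)[:top_n])
    return average, peak
```

For example, `summarize_cpu([10, 50, 40, 30, 20])` returns `(30, 40)`; the gap between the two numbers is the variability measure discussed above.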
Discussion of the Grades
Dropped Frames (G3/300)
These grades are for the G3/300 playback at Normal size. I gave an "A" if no frames were dropped at any time during two repeated playbacks of the entire 6-minute video. A "B" indicates only one or two dropped frames, a "C" occasional periods of a second or two of dropped frames, and a "D" frames being dropped all the time.
Problems With Specific Codecs
Apple Pixlet performed very poorly because it is optimized for playing huge frame sizes quickly, not for compressing small frame sizes efficiently. Apple's goal with Pixlet was to make something that could play "studio-grade" frames (1920x1080 resolution at 48 bits per pixel) with no delta frames (allowing scrubbing without latency) using the processing power (soon to be) available on a desktop computer. (The best a current home computer can do with other codecs at full frame rate and without delta frames is something like 720x480 DV, which also is not compressed very much. For those who remember QuickTime 1.0, its early codecs like Apple Video had the same goals and similar drawbacks.) Pixlet compresses a lot better than DV, but not nearly as well as delta-frame based codecs; however, it can play at full frame rate in reverse with 960x540 frames, or bigger frames on faster processors (like a dual 2 GHz G5).
As with all codecs, including still-image compressors like JPEG, Pixlet's compression efficiency goes up as the frame size goes up. This is because typical images have fractal-like statistical properties: when you quadruple the number of pixels, you do not quadruple the amount of information in the pixels, because smooth areas in the image require no additional information. So, although you need a high data rate to get good results from Pixlet at 320x240, you don't need a much higher data rate to get good results from Pixlet at a higher resolution like 640x480.
The OpenDivX codec dropped the same number of frames on the easiest material as on the hardest, playing 25-28 frames per second throughout. This would normally earn a "C" grade, but I gave it a "D" because of the really bad artifacts that appear whenever it drops frames.
Only one codec (ZyGo) dropped frames when playing at Double size on the G4/800.
Overall Quality (Detail)
This grade is based on my overall impression of the movie quality while watching the movie play at double size on the G4/800. The grade mostly reflects the codec's handling of detail. An "A" was given if absolutely no "blocky edges" or missing (blurred-out) detail was noticed. A "C" was given if the video was still watchable, that is, if the blockiness did not interfere with general enjoyment and did not hide any essential details.
NBA Court Floor
Most of the codecs had some trouble with the texture of the wooden floor in the long shots of the basketball sequence. In these shots, taken from center-court about 20 rows up from the sidelines, the camera shows about 1/3 of the court length (e.g. from the center line to one of the free-throw lines) and the court fills one half of the (vertical) height of the frame. In such shots, the floorboards are clearly visible at 320x240 resolution but are of a subtle light to medium brown texture. This caused problems with almost all of the codecs — they would show parts of the court floor as a plain perfectly uniform light-brown color, instead of a textured wood grain. I assigned grades based on how much (percent) of the total pixel area of wood texture was preserved.
NBA Flash-Photo Frames
If you single-step through basketball footage (particularly instant replay shots), you will find occasional frames in which a flashbulb went off, causing everything in the picture to be abnormally bright and/or overexposed for just a single frame. My test video includes three of these, and two of them are just 2 frames apart, resulting in a sequence of 5 frames that goes "normal-flash-normal-flash-normal". These flash frames give the codecs trouble because the codec treats both the flash frame and the frame immediately after it as the beginning of a new shot, causing it to generate a keyframe, often at lower resolution. The codecs that got an "A" showed no degradation at all in these frames: the frames right after a flash and the flash frames themselves all appeared at full resolution, of equal quality to the frames before the flash. The older Sorenson codec handled these extremely poorly, taking 6 or 7 frames to recover from the double-flash sequence. Sorenson 3 handled it somewhat better in terms of quality, but failed with a fatal error when the basketball video was at its original place in the middle of the iMovie. The fatal error caused the iMovie export to abort, leaving a partial (but playable) QuickTime movie that ended just after the first flash frame.
Monty Python Live-Action
Traditional stage play live action (actors on a set or soundstage, slow pans). Footage from the "Biggles Dictates a Letter" sketch of Monty Python's Flying Circus, season 3 episode 7. Several codecs had difficulty dealing with the green venetian blind, which has a significant highlight from reflected stage lighting. In the original, the highlight is smooth and fades gradually as you move from left to right along an individual strip of the venetian blind. It also moves and changes as the venetian blind swings towards and away from the camera. The codecs that did poorly gave the highlight sharp edges.
The Simpsons
Several fairly challenging shots (simulating pans, truck shots and the like) from the Simpsons episode DABF09, "The Old Man and the Key".
Fine-Print Text
Three different types of on-screen fine-print text (two superimposed on moving backgrounds, one on a plain black screen).
Monty Python Animation
A sequence of the typical photo-collage animation from the same episode as above.
AT&T mlife "Speedskaters"
A commercial that appeared during the 2002 winter Olympics.
Alaska Strip Mall
Amateur DV-video footage, a long pan shot taken in daytime, at varying zoom settings.
Based on the grades it is clear that 3ivX outperforms all of the others at this data rate and resolution, and ZyGo comes in second. However, based on the drawbacks of ZyGo for double-size playback (notes 3 and 5) I did not consider it to be a viable option.
Since 3ivX performed so well at the 320x240 resolution, I proceeded to try some larger sizes to see how much quality I could get while sticking to my target data rate of 160 K bytes per second. I switched the audio encoding to QDesign Music 2 (at 48 k bits per second), which is somewhat poorer quality than IMA 4:1 but uses up only 6K bytes/second, 16K less than the 22K needed by IMA 4:1. This gives an extra 16K bytes per second to the video.
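The 22K and 6K figures follow from the formats' parameters. This sketch assumes IMA 4:1 stores 4 bits per sample per channel and ignores the small per-packet header overhead of the real format, so the true IMA figure is slightly higher:

```python
def ima4_bytes_per_sec(sample_rate_hz: int, channels: int) -> float:
    """IMA 4:1 ADPCM: 4 bits per sample per channel (packet headers ignored)."""
    return sample_rate_hz * channels * 4 / 8


def kbits_to_bytes_per_sec(kbits_per_sec: float) -> float:
    """Convert a bit rate in k bits per second to bytes per second."""
    return kbits_per_sec * 1000 / 8


ima = ima4_bytes_per_sec(22050, 2)      # stereo, 22050 Hz
qdesign = kbits_to_bytes_per_sec(48)    # QDesign Music 2 at 48 k bits per second
print(ima, qdesign, ima - qdesign)      # 22050.0 6000.0 16050.0
```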
The next logical size above 320x240 is 480x360. However, this size does not work well with 3ivX because 3ivX is highly optimized for frame sizes whose width and height are both multiples of 16 pixels (360 is not). The 480x360 size was effectively unplayable even at Normal size on the G4/800.
In order to have an exact ratio of 4:3 with both dimensions a multiple of 16, the frame must be a multiple of 64x48. The closest matches to 480x360 are 448x336 and 512x384. I only tried the smaller of these because it can be played double-size on a 1024x768 display with the normal QuickTime movie player. This is important to me because I sometimes want my videos to be playable on Windows computers or in web browsers, and in those situations you cannot necessarily count on being able to use a full-screen player. However, if you do have a full-screen player, the 512x384 size is a natural choice because it is exactly half of the 1024x768 (XGA) monitor resolution.
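The multiple-of-64x48 rule can be checked by brute force. This sketch walks widths in steps of 16 and keeps those whose exact 4:3 height is also a multiple of 16:

```python
def exact_4x3_sizes(max_width: int) -> list[tuple[int, int]]:
    """Exact 4:3 frame sizes whose width and height are both multiples of 16."""
    sizes = []
    for width in range(16, max_width + 1, 16):
        height = width * 3 // 4          # exact, since width is a multiple of 4
        if height % 16 == 0:
            sizes.append((width, height))
    return sizes


sizes = exact_4x3_sizes(640)
print(sizes[-4:])  # [(448, 336), (512, 384), (576, 432), (640, 480)]
```

Every width that survives is a multiple of 64, which is where the 64x48 step comes from.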
I also tried a few sizes that are closer to 480x360, but are not exact 4:3 ratios. The best of these is 464x352. It is not exactly 4:3, but the difference (1.3182 versus 1.3333) is small enough that it should be very hard to notice. This size allows me to produce movies with 3ivX that play smoothly at double size, coming as near as possible to filling the screen without requiring a special player.
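One way to compare candidate sizes is to check the three properties discussed above: both dimensions a multiple of 16, closeness to 4:3, and whether the double-size frame fits a 1024x768 screen. This is a sketch with a hypothetical helper name; it compares raw pixel dimensions only and does not account for the player's window frame or the menu bar:

```python
def check_size(width: int, height: int, screen=(1024, 768)) -> dict:
    """Summarize how well a frame size suits 3ivX double-size playback."""
    ratio = width / height
    return {
        "multiple_of_16": width % 16 == 0 and height % 16 == 0,
        "ratio": round(ratio, 4),
        "off_4x3_percent": round(abs(ratio - 4 / 3) / (4 / 3) * 100, 2),
        "double_fits_screen": 2 * width <= screen[0] and 2 * height <= screen[1],
    }


for size in [(480, 360), (448, 336), (464, 352)]:
    print(size, check_size(*size))
```

480x360 fails the multiple-of-16 test, while 464x352 passes it at a cost of about 1.1% in aspect ratio.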
Data Rate Tables for 3ivX
The purpose of these tables is to help compensate for the lack of "true data rate limiting" in the basic (free) version of the 3ivX codec. Examples showing how to use the table are given below.
A: Note that these two videos have similar data rates at low quality settings, but quite different rates at high settings.
B: Notice that tv1 starts lower than tv7, but ends up higher than tv7
Descriptions of the video clips: all were sampled from a cable-TV source; the signal went from analog to digital (TiVo), back to analog (S-video), and back to digital (Hollywood Dazzle converter):
tv1: U2 "Beautiful Day" video: Lots of cuts, little camera motion, medium object motion
tv2: Chemical Brothers "Star Guitar" music video: Continuous fast motion (one single long shot out the window of a train moving fast)
tv3: "Moulin Rouge" trailer
tv4: "Lord of the Rings" trailer
tv5: "Gosford Park" trailer
tv6: "Amigos" clip
tv7: "Pac-Man World 2" commercial
tv8: "In the Bedroom" trailer
tv9: U2 "Stuck in a Moment" video
Practical use of this table typically involves this two-pass technique:
Situation: I have a 5-minute source video I want to compress, and I want the highest quality that won't exceed 160K bytes per second (about 576 MB per hour). I try quality level 60 first, and get a movie that has a data rate of 116K. Looking at the chart under the q60 column, I see that the closest match is 118.5 (in the tv1 row). This identifies my video as being similar in complexity to tv1. Moving to the right from 118.5, I see 159.5 in the q75 column. Therefore, I know I can match my intended data rate on this video clip by going up to quality 75. I do so, and get a movie that has a data rate of 155K. This is the closest I can get to my desired rate, and I only had to encode the video twice to achieve it.
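The lookup step can be written as a short function. The data rate table itself is not reproduced on this page, so the example uses only the two tv1 values quoted in the text (118.5 at q60 and 159.5 at q75); the function and argument names are hypothetical:

```python
def pick_quality(first_pass_rate: float, first_quality: int,
                 table: dict[str, dict[int, float]], target_rate: float):
    """Two-pass lookup: find the reference clip whose data rate at the
    first-pass quality is closest to ours, then return the highest quality
    whose tabulated rate for that clip stays within the target rate."""
    clip = min(table, key=lambda name: abs(table[name][first_quality] - first_pass_rate))
    fitting = [q for q, rate in table[clip].items() if rate <= target_rate]
    return clip, (max(fitting) if fitting else None)


# Rates in K bytes per second, from the worked example above.
table = {"tv1": {60: 118.5, 75: 159.5}}
print(pick_quality(116, 60, table, 160))  # ('tv1', 75)
```

The second encode at the chosen quality is still needed, because the real video only resembles the reference clip (in the example it came out at 155K rather than 159.5K).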
This two-pass technique is the same as that used in professional compression programs (like Cleaner, Squeeze, etc.), except that they apply it to smaller parts of the video (e.g. a few frames at a time), varying the quality level to achieve a constant bitrate, and then stitch the resulting pieces of compressed video together.
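The per-segment idea can be sketched as follows. This is a simplified illustration, not how Cleaner or Squeeze actually work: `encode` is a hypothetical stand-in for a real encoder, the search simply tries quality levels from highest to lowest, and it assumes each segment can be encoded independently and concatenated:

```python
from typing import Callable, Sequence


def encode_constant_rate(segments: Sequence, budget_bytes: int,
                         qualities: Sequence[int],
                         encode: Callable[[object, int], bytes]) -> bytes:
    """For each segment, keep the highest quality whose output fits the
    per-segment budget, then join the pieces together."""
    pieces = []
    for segment in segments:
        chosen = None
        for quality in sorted(qualities, reverse=True):
            data = encode(segment, quality)
            if len(data) <= budget_bytes:
                chosen = data
                break
        if chosen is None:  # nothing fits: fall back to the lowest quality
            chosen = encode(segment, min(qualities))
        pieces.append(chosen)
    return b"".join(pieces)


def fake_encode(complexity: int, quality: int) -> bytes:
    """Hypothetical toy encoder: output size grows with complexity and quality."""
    return bytes(complexity * quality // 10)


print(len(encode_constant_rate([4, 2], 20, [25, 50, 100], fake_encode)))  # 40
```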
Here is the old table, compiled with the codecs available under MacOS 10.1 and using the G3/300 and iMac G4/800 for the time benchmarks:
|actual data rate||173.5||165.8||151.2||165.3||167.1||166.3||166.2|
|key frame every||n/a||n/a||60||NO||60||60||60|
|limit data rate to||n/a||176||(160)||202||98||95||160|
encode time (G3/300)
|encode time (G4/800)||13||34||19||12||29||15||17|
|playback CPU (G4/800)||53.0||54.4||51.1||44.8||35.3||46.2||60.1|
|playback peak CPU (G4/800)||81.1||66.3||78.8||67.1||71.3||81.4||71.7|
|playback CPU (G3/300)||80.5||76.8||76.2||78.9||61.2||65.2||80.2|
|playback peak CPU (G3/300)||99.9||88.8||86.8||88.0||75.0||81.2||96.3|
Dropped Frames (G3/300)
|Overall quality (detail)||A-||B-||C+||D||B-||B||A-|
|NBA court floor 0:07||B+||C||C||D||D||D||B|
|NBA flash-photo frames 0:11||A||B||A||C-||D-||C+||B-|
1 through 3: same as for the first table, above.
4 This is the only codec tested that could not handle jog/scrub or backwards playback. Both problems result from the fact that the decoder does not fully reconstruct the frame on a random seek the way all other codecs do; instead it just applies that frame's deltas to whatever was in the frame buffer before the seek! This problem also shows up whenever the codec drops frames during normal playback (as it did when playing at Normal size on the G3/300).
5 and 6: same as for the first table, above.
7 On the G3/300, when it cannot keep up this codec just plays every frame at a slower rate until it's time for a keyframe and then skips ahead, causing a "jerky slow motion" effect.
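Note 4 contrasts two ways of handling a random seek. This toy model is a simplification of how real codecs store frames: each frame is a short list of numbers, a keyframe holds full values, and a delta frame holds differences. It shows why applying only the target frame's delta to a stale buffer gives the wrong picture, while decoding forward from the preceding keyframe does not:

```python
Frame = tuple[str, list[int]]  # ("key", pixel values) or ("delta", differences)


def seek_correct(frames: list[Frame], target: int) -> list[int]:
    """Rebuild frame `target` by decoding forward from the nearest earlier keyframe."""
    start = max(i for i in range(target + 1) if frames[i][0] == "key")
    picture = list(frames[start][1])
    for _, diffs in frames[start + 1 : target + 1]:
        picture = [p + d for p, d in zip(picture, diffs)]
    return picture


def seek_as_in_note_4(frames: list[Frame], target: int, buffer: list[int]) -> list[int]:
    """Apply only the target frame's data to whatever the buffer already holds."""
    kind, data = frames[target]
    if kind == "key":
        return list(data)
    return [p + d for p, d in zip(buffer, data)]


frames = [("key", [10, 10]), ("delta", [1, 0]), ("delta", [0, 2]),
          ("key", [50, 50]), ("delta", [5, 5])]
# Jump back from frame 3 (buffer holds [50, 50]) to frame 2.
print(seek_correct(frames, 2))                  # [11, 12]
print(seek_as_in_note_4(frames, 2, [50, 50]))   # [50, 52]
```

The cost of the correct approach is decoding every delta frame since the last keyframe, which is one reason the "Key Frame Every" setting affects how well scrubbing works.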
This page was written in the "embarrassingly readable" markup language RHTF, and was last updated on 2004 Jan 31.