Author: Roupen Mouradian, Chief Technical Officer
With such a customizable and scalable compression algorithm, it is no wonder that MPEG-2 became the standard for DVD Video.
MPEG-2 has so many options and customizations that you can find a solution for almost any media service with it.
In the Apple development environment, MPEG-2 has been integrated with every application: Final Cut Pro, DVD Studio Pro, Compressor, Motion, After Effects, and the list goes on.
But with the flexibility of MPEG-2 comes a slew of options and configuration parameters to tune before you get exactly what you want. This document attempts to explain many of them and to clarify the terms you will run into when you build an MPEG-2 video.
One of the most crucial parts of your MPEG-2 video is determining the bit rate. The type of media and the bandwidth available to your target audience will largely determine it. Most commonly you will be using MPEG-2 for DVD Video.
First, make sure that your desired bit rate does not exceed the capacity of your media. A single-layer DVD holds 4.7 gigabytes (about 4.37 gigabytes in binary terms). When encoding with MPEG-2, you specify the bandwidth to use per second rather than your desired final file size. Fortunately, Apple has a good suggestion for working backward from the disc size. For a single-layer DVD (4.7 GB), take the magic number 560 and divide it by the length of your video in minutes. The result is the bit rate, in megabits per second, at which to encode your video.
If you are using a dual layer DVD, simply double the number to 1120.
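The rule of thumb above is easy to put into a few lines of Python. This is only a sketch of the arithmetic described in the text; the function name and the "single"/"dual" labels are made up for illustration and do not correspond to any setting in Compressor or DVD Studio Pro.

```python
# Magic numbers from the text: 560 for a single-layer DVD, 1120 for dual-layer.
MAGIC_NUMBER = {"single": 560, "dual": 1120}


def target_bitrate_mbps(minutes, layers="single"):
    """Return the bit rate (in Mbps) that roughly fills the disc
    for a video of the given length in minutes."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return MAGIC_NUMBER[layers] / minutes


# A two-hour movie on each kind of disc.
print(round(target_bitrate_mbps(120), 2))          # 4.67
print(round(target_bitrate_mbps(120, "dual"), 2))  # 9.33
```

Note that the result is only a starting point: for short videos the formula returns numbers far higher than a DVD player can handle.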
However, your task isn't complete yet. DVD players are only required to support a maximum combined bandwidth of 10.08 Mbps for all video streams, audio streams, and subtitles. This means that if the total bit rate of all these elements exceeds that maximum at any given time, you might experience unexpected results.
So, with the original formula, if you have 30 minutes of video on a single layer DVD, this doesn't mean you should encode it at 18.7 megabits per second. This will not be playable on normal DVD players and, on top of this, DVD Studio Pro will not build your DVD and instead give you a warning stating the DVD will not play.
The actual maximum bit rate you can use for video depends on how many video streams and subtitle streams you have and on how your audio is compressed. In general, however, for a normal DVD with just one video stream and compressed AC-3 audio, you can get away with about 9.6 Mbps.
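Putting the formula and the player limit together, a minimal sketch might cap the computed bit rate and check the combined budget. The 10.08 and 9.6 figures come from the text; the function names and the 0.448 Mbps audio figure in the usage lines are illustrative assumptions, not values taken from any Apple tool.

```python
DVD_MAX_TOTAL_MBPS = 10.08  # combined limit for video, audio, and subtitles (from the text)
SAFE_VIDEO_MBPS = 9.6       # the text's rule of thumb for video with AC-3 audio


def safe_video_bitrate(minutes, magic_number=560, ceiling=SAFE_VIDEO_MBPS):
    """Bit rate from the magic-number formula, capped at a safe ceiling."""
    return min(magic_number / minutes, ceiling)


def fits_dvd_budget(video_mbps, audio_mbps=0.0, subtitle_mbps=0.0):
    """True if the combined streams stay within the player limit."""
    return video_mbps + audio_mbps + subtitle_mbps <= DVD_MAX_TOTAL_MBPS


# 30 minutes on a single-layer disc: the formula alone would suggest about 18.7 Mbps.
print(safe_video_bitrate(30))        # 9.6
# Assumed audio rate of 0.448 Mbps, used here purely as an example.
print(fits_dvd_budget(9.6, 0.448))   # True
print(fits_dvd_budget(18.7, 0.448))  # False
```

If you add more audio or subtitle streams, lower the video ceiling accordingly so the total stays under the limit.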
CBR? VBR? 1-pass? 2-pass? 9-PASS?
CBR - Constant Bit Rate, VBR - Variable Bit Rate. Now that you know what the abbreviations stand for, we'll get into the details of what each one means for your video. Choosing among them mostly comes down to how powerful your computers are and how quickly you need your final video.
CBR is a one-pass algorithm. It goes through your movie frame by frame and encodes at the same bit rate throughout, regardless of content. This means a predictable file size and fast encoding. Because CBR doesn't bother to be "smart" and adjust the bit rate to the content of your video, it is the most basic form of MPEG-2 encoding and gives you the quickest results.
VBR is a one-pass or multi-pass algorithm. What VBR does differently from CBR is that it tries to understand your movie and, using its pseudo-intelligence, tries to allocate higher bit rates to areas with a lot of movement and action, and lower bit rates to areas with little action and movement.
For instance, assume you have some video of an individual giving a lecture and showing supporting slides. In areas where the lecturer is walking around the stage and the camera is moving to follow him, there is a lot of motion. At other times, the lecturer is explaining a slide and the camera is fixed on that one slide, so no movement is taking place.
You can see how in CBR this would be a waste because the same bit rate would be used for a high movement situation such as the lecturer walking as would be used for a low movement situation such as viewing the slide.
With variable bit rate calculations, the MPEG-2 compressor will adjust bit rates and use a lower one for the parts where the slide is in play and higher ones for the parts where the lecturer is moving around.
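To make the lecture example concrete, here is a toy calculation. It is not how any real MPEG-2 encoder works internally; the motion weights, scene lengths, and function name are invented for illustration. It simply spreads the same total number of bits that CBR would use across the scenes in proportion to how much motion each one has.

```python
def allocate_vbr(scenes, average_mbps):
    """scenes is a list of (name, seconds, motion_weight) tuples.
    Returns {name: bit rate in Mbps} using the same total bits as CBR
    at average_mbps, shared out in proportion to the motion weight."""
    total_seconds = sum(seconds for _, seconds, _ in scenes)
    total_megabits = average_mbps * total_seconds
    weighted_seconds = sum(seconds * weight for _, seconds, weight in scenes)
    return {
        name: total_megabits * weight / weighted_seconds
        for name, _, weight in scenes
    }


# Hypothetical lecture: one minute of walking, two minutes on a static slide.
lecture = [
    ("lecturer walking", 60, 3.0),  # assumed weight: lots of motion
    ("static slide", 120, 1.0),     # assumed weight: almost no motion
]

print(allocate_vbr(lecture, average_mbps=5))
# {'lecturer walking': 9.0, 'static slide': 3.0}
```

CBR would spend 5 Mbps on both scenes. In this sketch VBR produces the same overall size (9 x 60 + 3 x 120 = 5 x 180 megabits) but gives the moving footage three times the bit rate of the slide.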
However, how well VBR encoding works depends on how many passes you allow it to take through the video. With every pass it builds a more detailed internal report of motion estimates over the course of the video.
With one pass, it simply tries to guess and learn as it goes along the video and adjust as well as it can. This leads to better results than CBR, but it can also lead to unpredictable file sizes because it doesn't know enough about the video to accurately allocate bit rates across the whole movie. These file size differences will very rarely affect your workflow, however.
With multiple passes, the motion estimation gets better each time. Apple Compressor offers up to 2-pass VBR. On the first pass, no actual video is written; information is only gathered for the second pass. With this more comprehensive motion information, the encoder can achieve more optimal bit rates and more reliable final file sizes. Though this is the best approach for your MPEG-2 encoding, it is also the most time-consuming.
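The difference between guessing as you go and looking at the whole movie first can be sketched with a toy model. This is an illustration only and makes no claim about how Compressor is implemented: each number in the list is an invented "complexity" score for an equal-length segment, and both functions are hypothetical.

```python
def one_pass_vbr(complexities, average_mbps):
    """Allocate on the fly, knowing only the complexity seen so far."""
    rates, seen = [], 0.0
    for count, complexity in enumerate(complexities, start=1):
        seen += complexity
        running_mean = seen / count  # best guess at the movie's average
        rates.append(average_mbps * complexity / running_mean)
    return rates


def two_pass_vbr(complexities, average_mbps):
    """Pass 1 only gathers statistics; pass 2 allocates with full knowledge."""
    true_mean = sum(complexities) / len(complexities)  # pass 1: nothing is written
    return [average_mbps * c / true_mean for c in complexities]  # pass 2


def mean(values):
    return sum(values) / len(values)


# Quiet opening followed by a busy second half (made-up scores).
movie = [1, 1, 4, 4]

print(round(mean(one_pass_vbr(movie, 4)), 2))  # 5.6
print(round(mean(two_pass_vbr(movie, 4)), 2))  # 4.0
```

In this model the one-pass encoder overspends once the busy segments arrive, so the overall average drifts away from the 4 Mbps target, while the two-pass version lands on the target because it knew the whole movie in advance.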
2-pass VBR tends to take about 1.75x as long as one-pass VBR and about 2x as long as CBR.
Although GOP configuration is crucial to the MPEG process, you will rarely encounter it while setting up your MPEG-2 encoding or transcoding. However, it is a good idea to be familiar with what a GOP is, because you will run into it in other parts of your DVD authoring process, such as setting up chapter markers.
GOP is short for "Group of Pictures".
Essentially, the way your MPEG video optimizes itself is by attempting to carry over as much still content from one frame to the next as possible.
In general, unless the camera itself is moving, there will only be specific regions of your video which will be changing such as a person's mouth when they are talking.
It seems almost pointless to save the whole information about a frame when basically you just want to say "keep everything the same as before, just change the area around the person's mouth where they are talking".
A group of pictures is built from three types of frames:
- I-Frames - Short for Intra-Frame. These are the key frames of the GOP. They are independent of previous and future frames and are compressed in their entirety. You can put chapter markers only on I-Frames, and only one I-Frame can appear per GOP pattern.
- P-Frames - Short for Predicted Frame. These frames refer to previous frames to reuse frame segments. Because they do not store the entire frame, a P-Frame holds less data than an I-Frame.
- B-Frames - Short for Bi-Directional Frame. These frames refer to both previous and future frames for motion changes. They are the most efficient type of frame; however, because of how they work, they do not handle abrupt changes, such as scene switches or shaky camera movement, very well.
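Since chapter markers can only sit on I-Frames, an authoring tool has to move a requested marker to a nearby I-Frame. The sketch below shows the idea under loud assumptions: the 15-frame pattern is just one example, every GOP is assumed to be identical and to start with its single I-Frame, and the function names are hypothetical.

```python
# Assumed repeating GOP pattern: one I-Frame followed by P- and B-Frames.
GOP_PATTERN = "IBBPBBPBBPBBPBB"


def frame_type(frame_index, pattern=GOP_PATTERN):
    """Return 'I', 'P', or 'B' for a frame, assuming the pattern repeats."""
    return pattern[frame_index % len(pattern)]


def snap_to_i_frame(frame_index, pattern=GOP_PATTERN):
    """Return the index of the nearest I-Frame, assuming each GOP
    starts with its only I-Frame."""
    gop_length = len(pattern)
    previous_i = (frame_index // gop_length) * gop_length
    next_i = previous_i + gop_length
    if frame_index - previous_i <= next_i - frame_index:
        return previous_i
    return next_i


print(frame_type(100))       # B
print(snap_to_i_frame(100))  # 105
print(snap_to_i_frame(92))   # 90
```

With this pattern a marker can be placed only every 15 frames, which is why a chapter point may land slightly before or after the frame you asked for.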