Understanding Frames Per Second (FPS) (269068)

The information in this article applies to:

  • Microsoft Windows Media Tools 4.0
  • Microsoft Windows Media Tools 4.1
  • Microsoft Windows Media Encoder 7
  • Microsoft Windows Media Encoder 7.1

This article was previously published under Q269068

SUMMARY

Frames per second (FPS) measures how many frames, or still images, are displayed each second to create the appearance of motion video. The term applies equally to film and digital video.

MORE INFORMATION

Background

Motion pictures create the appearance of movement from a rapid succession of still images. To perceive motion, the brain automatically fills in the missing information between those images. It does this first through persistence of vision: a visual stimulus continues to be registered by the brain for a very short time after the stimulus ends. Second, it takes advantage of the phi phenomenon: if two adjacent lights alternately flash on and off, we see a single light shifting back and forth, because we tend to fill in the gaps between closely spaced visual stimuli. Motion pictures exploit both effects by displaying rapid successions of still frames in which the "moving" objects are displaced only a very short distance from one frame to the next.

Because of these phenomena, the higher the FPS, the smoother the motion appears. In general, about 30 FPS is the minimum needed to avoid visibly jerky motion. For high-motion content, encoding at around 60 FPS may produce better results.

When dealing with FPS, it is important to also understand other terms that are used throughout the Windows Media Encoder:

    • NTSC: National Television System Committee. The NTSC is responsible for setting television and video standards in the United States. The NTSC standard for television defines a composite video signal with a refresh rate of 29.97 FPS. The NTSC standard also requires that these frames be interlaced.
    • PAL: Phase Alternating Line. The dominant television standard in Europe. The PAL standard delivers 25 FPS.
    • Interlaced: Each NTSC or PAL video frame consists of two "fields," each containing every other scan line of the image. When displaying video, an NTSC television draws one field every 1/60th of a second, and a PAL television draws one field every 1/50th of a second. Interlacing is the process of splitting a frame into these two alternating fields; reconstructing a single progressive frame from the two fields is called deinterlacing.
    • Telecine: Most film content is created at 24 FPS. To meet the NTSC standard, extra frames must be added to reach approximately 30 FPS. This is commonly done through a process known as 3:2 pulldown, which repeats fields in a regular pattern so that every four film frames yield five video frames. The process that removes the frames added when 24 FPS film was converted to 30 FPS video is known as inverse telecine.
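One common telecine scheme, 3:2 pulldown, can be sketched in a few lines. The following is an illustrative model only: the real process operates on interlaced fields of image data, whereas here each film frame is just a label, and the function names are hypothetical. It shows how every four film frames become five video frames, and how inverse telecine recovers the originals by discarding the duplicated fields:

```python
# Toy sketch of 3:2 pulldown (telecine) and its inverse.
# Film frames are represented by labels, not actual image data.

def telecine(film_frames):
    """Expand 24 FPS film frames into fields using the 3:2 pulldown pattern."""
    fields = []
    for i, frame in enumerate(film_frames):
        copies = 2 if i % 2 == 0 else 3  # alternate: 2 fields, then 3 fields
        fields.extend([frame] * copies)
    # Pair consecutive fields into interlaced video frames.
    return [tuple(fields[i:i + 2]) for i in range(0, len(fields), 2)]

def inverse_telecine(video_frames):
    """Recover the original film frames by dropping repeated fields."""
    seen, film = set(), []
    for top, bottom in video_frames:
        for field in (top, bottom):
            if field not in seen:
                seen.add(field)
                film.append(field)
    return film

video = telecine(["A", "B", "C", "D"])     # 4 film frames -> 5 video frames
assert len(video) == 5
assert inverse_telecine(video) == ["A", "B", "C", "D"]
```

Note the 4:5 frame ratio, which is exactly the 24-to-30 FPS conversion the NTSC standard requires.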

Understanding Key Frames

When content is streamed, it is very costly (in terms of CPU and network bandwidth) to send the complete video data for every single frame. To solve this problem, the Encoder provides a key frame frequency setting. A key frame (also known as an I-frame) is a frame that contains all of the video data for the image. The frames in between send only the changes, or deltas, from the key frame.

Take, for example, a speaker at a podium. It makes more sense to send only the areas of the screen that are changing (the speaker's mouth, and perhaps hands) than to resend the unchanging background as well. For content like this, a longer key frame interval can be selected without harming video quality.

If the content contains more motion, it is more beneficial to set the key frame interval lower so that key frames are sent more often. Examples of high-motion video are action movies and sporting events. In these cases the entire frame changes very quickly, so much more data must be sent, and a higher-end computer is required to achieve the desired results.

Even when a key frame interval is set, a key frame can be sent before the interval elapses: if the delta grows large enough, the Encoder automatically inserts a new key frame.
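The key frame logic described above can be sketched as a simple decision rule. This is an illustrative model, not the Encoder's actual algorithm; the numeric frame values stand in for image content, and the parameter names (`key_frame_interval`, `delta_threshold`) are hypothetical:

```python
# Toy model: choose a key frame ('K') or delta frame ('D') for each frame.
# A frame's absolute difference from the last key frame stands in for its delta.

def classify_frames(frames, key_frame_interval=8, delta_threshold=0.5):
    """Return 'K' (full key frame) or 'D' (delta frame) for each input frame."""
    kinds, last_key, since_key = [], None, 0
    for value in frames:
        delta = None if last_key is None else abs(value - last_key)
        if last_key is None or since_key >= key_frame_interval or delta > delta_threshold:
            kinds.append("K")              # send the full frame
            last_key, since_key = value, 0
        else:
            kinds.append("D")              # send only the change from the key frame
        since_key += 1
    return kinds

# Slowly changing content stays on cheap delta frames; a sudden large change
# forces a new key frame before the interval elapses.
print(classify_frames([0.0, 0.1, 0.2, 5.0, 5.1]))  # → ['K', 'D', 'D', 'K', 'D']
```

The third-to-last frame (the jump from 0.2 to 5.0) triggers an early key frame, mirroring the Encoder behavior described above.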

Factors That Affect FPS

There are primarily three factors that affect the frame rate, and they are all interrelated:
  • CPU: As a general rule of thumb, the higher the frame rate, the higher the CPU requirement. A fast CPU is required for the Windows Media Encoder to keep up with the data being sent to it from the capture card. To stay synchronized with the audio stream, the Encoder starts dropping video frames when it falls behind the incoming data. When frames are dropped, the Current FPS statistic on the monitor panel is lower than the Expected FPS statistic. Note that some capture cards can only capture at 15 FPS, while others can capture at the full 30 FPS. Consult your card's manufacturer for specific details about your card.
  • Content Type: As stated previously, high-motion content requires more resources than low-motion content, so it is especially important to select a CPU that can handle the task and to choose an appropriate profile and key frame interval. If the encoding process cannot keep up with the content, it starts dropping frames. One exception is static content: if the content is a stationary object (such as a wall or an empty room), the frame rate may drop because the Encoder has no deltas to capture.
  • Selected Profile: Each profile specifies a frame size and frame rate. If the Encoder cannot keep up with the profile you selected, choose a lower-quality profile. Multiple-bit-rate profiles can also affect Encoder performance; if the Encoder starts dropping frames, you may need to remove a stream from the profile or reduce the frame size.
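The interplay of CPU speed, expected frame rate, and dropped frames can be illustrated with a toy model. It assumes, purely for illustration, a fixed per-frame encoding cost and a real-time budget of one second; the function name and numbers are invented and do not come from the Encoder:

```python
# Toy model of frame dropping under CPU pressure. The encoder has a fixed
# time budget per second; frames that do not fit in the budget are dropped
# so the video stream stays synchronized with the audio.

def simulate_encode(expected_fps, encode_ms_per_frame, duration_ms=1000):
    """Return (encoded, dropped) frame counts over the given window."""
    budget = duration_ms
    encoded = 0
    total = expected_fps * duration_ms // 1000
    for _ in range(total):
        if budget >= encode_ms_per_frame:
            budget -= encode_ms_per_frame
            encoded += 1
        # otherwise the frame is dropped to keep pace with real time
    return encoded, total - encoded

# A CPU needing 40 ms per frame sustains only 25 of an expected 30 FPS:
print(simulate_encode(30, 40))  # → (25, 5)
```

In the monitor panel this would appear as a Current FPS of 25 against an Expected FPS of 30, the symptom described in the CPU bullet above.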
For more information on how to configure and use the Encoder, refer to the Windows Media Encoder documentation.

Modification Type: Major
Last Reviewed: 11/26/2003
Keywords:kbinfo KB269068