Apple EU patent: Video encoding method and scene cut detection method
Posted by Dennis Sellers
Jun 12, 2006 at 2:18pm
The European Patent Office has revealed Apple patent EP1665801 published on June 7 titled “Video encoding method and scene cut detection method.” The patent addresses methods for determining the number of unidirectional and bidirectional frames to be encoded for a video sequence and a method for detecting scene cuts in the video sequence.
Patent FIG. 4 is a graphical illustration of a set of successive frames that are processed in two passes with partial re-use of motion vectors computed in the first pass. Patent FIG. 9 is a graphical illustration of a set of successive frames that are processed in two passes with full re-use of motion vectors computed in the first pass.
Background of the invention
Video is currently being transitioned from an analog medium to a digital medium. For example, the old analog NTSC television broadcasting standard is slowly being replaced by the digital ATSC television broadcasting standard. Similarly, analog video cassette tapes are increasingly being replaced by digital versatile discs (DVDs). Thus, it is important to identify efficient methods of digitally encoding video information. An ideal digital video encoding system will provide a very high picture quality with the minimum number of bits.
Many video encoding algorithms partition each video frame in a sequence of video frames into sets of pixels called pixelblocks. Each pixelblock is then coded using a predictive encoding method such as motion compensation.
Some ISO MPEG or ITU video coding standards, e.g., H.264, use different types of predicted pixelblocks in their encoding. Traditionally, a pixelblock may be one of three types:
1) An intra (I) pixelblock that uses no information from other frames in its encoding,
2) A unidirectionally predicted (P) pixelblock that uses information from one preceding (past) frame, or
3) A bidirectionally predicted (B) pixelblock that uses information from one preceding (past) frame and one future frame.
Generally, a frame that contains any B-pixelblocks is referred to as a B-frame, a frame that contains some P-pixelblocks and no B-pixelblocks is referred to as a P-frame, and a frame that contains only I-pixelblocks is referred to as an I-frame. The selection of the number of bidirectional motion compensated (B) frames to be coded between intra (I) or unidirectional motion compensated (P) frames is an encoder decision that significantly affects the bit rate of the subsequently compressed video bitstream. A video encoder must decide which is the best way amongst all of the possible methods (or modes) to encode each pixelblock and how many B-frames, if any, are to be coded between each pair of I-frames or P-frames. Thus, efficient and effective methods of selecting the number of B-frames to be coded between I-frames or P-frames of a video sequence are needed.
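The frame naming rule above can be written as a short Python sketch. The function name and the use of the letters 'I', 'P' and 'B' as pixelblock labels are illustrative assumptions, not something taken from the patent.

```python
def frame_type(block_types):
    """Classify a frame from the types of its pixelblocks.

    block_types is a sequence of 'I', 'P' or 'B' labels, one per pixelblock
    (a hypothetical representation chosen for this sketch).
    """
    kinds = set(block_types)
    if not kinds or not kinds <= {"I", "P", "B"}:
        raise ValueError("expected a non-empty sequence of 'I', 'P' or 'B'")
    if "B" in kinds:   # any B-pixelblock makes it a B-frame
        return "B"
    if "P" in kinds:   # some P-pixelblocks and no B-pixelblocks
        return "P"
    return "I"         # only I-pixelblocks
```

For example, a frame whose pixelblocks are labeled I, I, P would be classified as a P-frame under this rule.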
Summary of the invention
The present invention provides methods for encoding frames of a video sequence where the sequence is processed in two passes. During the first pass, motion vectors are computed for pixelblocks of each frame in a set of successive frames with reference to another specific frame or frames. In some embodiments, motion compensation errors (MCEs) for pixelblocks of each frame are also computed. A motion cost value for each frame is then determined, the motion cost value being related to the number of bits required to encode the motion vectors and/or the value of the MCEs of the pixelblocks of the frame. A derived cost value is then computed based on the motion cost value of at least one frame (e.g., the derived cost value can be the motion cost value of one frame, the average motion cost value of two or more frames, or the ratio of the motion cost value of a first frame and the motion cost value of a second frame).
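As a rough illustration of the cost values described above, here is a minimal Python sketch. The summary only says that the motion cost is "related to" motion vector bits and/or MCEs, so the weighted sum in motion_cost, the weight parameter and all function names are assumptions made for this example, not the patent's formula.

```python
def motion_cost(mv_bits, mces, weight=1.0):
    """Hypothetical per-frame motion cost.

    mv_bits: bits needed to code each pixelblock's motion vector.
    mces:    motion compensation error of each pixelblock.
    The weighted sum is an assumption; the patent summary does not give a formula.
    """
    return sum(mv_bits) + weight * sum(mces)


def derived_cost(costs, kind="single"):
    """Derive one value from the motion costs of successive frames.

    costs holds per-frame motion costs in display order; the last entry is
    the frame currently being examined.
    """
    if kind == "single":    # motion cost of one frame
        return costs[-1]
    if kind == "average":   # average motion cost of two or more frames
        return sum(costs) / len(costs)
    if kind == "ratio":     # current frame versus the frame just before it
        previous = costs[-2]
        return float("inf") if previous == 0 else costs[-1] / previous
    raise ValueError("kind must be 'single', 'average' or 'ratio'")
```

The three kinds correspond to the three examples of a derived cost value listed in the summary.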
In addition, in the first pass, the derived cost value is used to determine the number (NB) of B-frames to be encoded in the set of successive frames. The number (NB) of B-frames to be encoded increases as long as the derived cost value is below a predetermined threshold value. In the second pass, frame NB+1 in the set of successive frames is encoded as a P-frame and frames 1 through NB are encoded as B-frames where some or all motion vectors computed in the first pass are re-used in the encoding process of the second pass.
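The first-pass decision can be sketched as follows. This is a simplified reading of the summary, assuming the derived cost value is the running average motion cost of frames 1 through m and that the last available frame is always kept for the closing P-frame; the function names and that boundary rule are assumptions for illustration.

```python
def count_b_frames(frame_costs, threshold):
    """Return NB, the number of B-frames to encode.

    frame_costs[i] is the motion cost of frame i + 1 in the set of
    successive frames (frame 0 is the already-coded reference).
    NB grows as long as the running average cost stays below threshold.
    """
    n_b = 0
    running_total = 0.0
    # Assumption: reserve the last frame so a P-frame can always follow.
    for m, cost in enumerate(frame_costs[:-1], start=1):
        running_total += cost
        if running_total / m >= threshold:
            break
        n_b = m
    return n_b


def assign_types(n_b):
    """Frames 1..NB become B-frames and frame NB+1 becomes the P-frame."""
    return ["B"] * n_b + ["P"]
```

With motion costs of 1, 1, 1, 10 and 1 for frames 1 through 5 and a threshold of 3, the running average first reaches the threshold at frame 4, so this sketch picks three B-frames followed by a P-frame.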
In some embodiments, during the first pass, motion vectors are computed for each pixelblock and for each frame in a set of successive frames with reference to an immediately preceding frame. In these embodiments, some of the motion vectors computed in the first pass are re-used in the encoding process of the second pass. In further embodiments, during the first pass, motion vectors are computed for each frame in a set of successive frames with reference to a same preceding frame (frame 0 in the set of successive frames). In these embodiments, all of the motion vectors computed in the first pass are re-used in the encoding process of the second pass.
In some embodiments, the derived cost value is the average motion cost of a series of successive frames. In other embodiments, the derived cost value is the motion cost of a single frame. In further embodiments, the derived cost value is a ratio between the motion cost of a first frame and the motion cost of a second frame that immediately precedes the first frame. In these further embodiments, the ratio of motion costs is used to detect an impulse-like increase in the motion costs between two successive frames that typically indicates a scene cut between the two successive frames. As such, these further embodiments provide a scene cut detection method that is used in conjunction with the two pass encoding method of the present invention. In additional embodiments, the scene cut detection method is used independent from the two pass encoding method.
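The ratio-based scene cut test lends itself to a small Python sketch. The function name, the ratio_threshold parameter and the choice to skip frames whose predecessor has zero motion cost are assumptions for this example; the summary does not say how the threshold is chosen.

```python
def detect_scene_cuts(frame_costs, ratio_threshold):
    """Return indices i where a cut is suspected between frame i - 1 and frame i.

    A cut is flagged when the motion cost of frame i divided by the motion
    cost of the frame immediately before it exceeds ratio_threshold,
    that is, when the cost shows an impulse-like increase.
    """
    cuts = []
    for i in range(1, len(frame_costs)):
        previous, current = frame_costs[i - 1], frame_costs[i]
        # Assumption: a zero previous cost gives no usable ratio, so skip it.
        if previous > 0 and current / previous > ratio_threshold:
            cuts.append(i)
    return cuts
```

Note that the drop in cost after the spike is not flagged, since only increases from one frame to the next are tested.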
Random patent points
Some embodiments described above relate to video frames in YUV format. One of ordinary skill in the art, however, will realize that these embodiments may also relate to a variety of formats other than YUV. In addition, other video frame formats (such as RGB) can easily be transformed into the YUV format. Furthermore, embodiments of the present invention may relate to various video encoding applications (e.g., DVD, digital storage media, television broadcasting, internet streaming, communication, teleconferencing, etc.) in real-time or post-time. Embodiments of the present invention may also be used with video sequences having different encoding standards such as H.263 and H.264 (also known as MPEG-4/Part 10).
In some cases, the interpolated motion vector is good enough to be used without any correction, in which case no differential motion vector data need be transmitted for these B-pixelblocks. This is referred to as Direct Mode in H.263 and H.264, and it works particularly well when the camera is slowly panning across a stationary background.
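The excerpt does not spell out how the interpolation is done. A common approach, assumed here, is to scale the motion vector of the following P-frame linearly by temporal distance. The sketch below shows that idea with hypothetical names; it is not claimed to be the patent's exact procedure.

```python
def interpolate_direct_mv(p_mv, b_index, p_index):
    """Interpolate B-pixelblock motion vectors from a P-frame motion vector.

    p_mv:    (dx, dy) of the P-frame pixelblock, relative to frame 0.
    b_index: position of the B-frame, with 0 < b_index < p_index.
    p_index: position of the P-frame (NB + 1 in the article's notation).
    Returns (forward_mv, backward_mv), assuming linear motion over time.
    """
    if not 0 < b_index < p_index:
        raise ValueError("B-frame must lie between frame 0 and the P-frame")
    dx, dy = p_mv
    scale = b_index / p_index
    forward = (dx * scale, dy * scale)                   # points back to frame 0
    backward = (dx * (scale - 1.0), dy * (scale - 1.0))  # points ahead to the P-frame
    return forward, backward
```

If the interpolated vectors predict the pixelblock well enough, they can be used as they are and no correction needs to be sent, which is the case described above.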
The method then determines (at 535) if the AFMCm value (the average motion cost of frames 1 through m) is less than a predetermined threshold T. In some embodiments, the value of threshold T is determined experimentally. In some embodiments, the value of threshold T is different for different types of video sequences, e.g., a different threshold value may be used for video sequences of sports, soap operas, news reports, old movies, videoconference sequences, etc.
In an alternative embodiment, the two pass partial motion vector re-use method of FIG. 5 is optimized for encoding a video sequence with a relatively high number of scene cuts (i.e., a relatively high number of frames with discontinuous content). Such video sequences may be found, for example, in music videos, sports broadcasts, etc. In the alternative embodiment, a scene cut detection method is used where an impulse-like variation in the motion costs of two successive frames (which typically indicates a scene change) is monitored.
Full re-use of motion vectors determined in a first pass
In most conventional video encoders, the process of computing motion vectors consumes a significant amount of computational resources. Thus, in encoding frames of a video sequence, it is advantageous to minimize the number of motion vector computations as much as possible. In the two pass encoding method of the present invention, therefore, it is advantageous to re-use in the second pass as many of the motion vectors computed in the first pass as possible so that fewer new motion vectors need to be computed in the second pass.
In an alternative embodiment, all motion vectors computed in the first pass are re-used in the second pass to provide full re-use of motion vectors. The full re-use method is similar to the partial re-use method described above in Section I except that, in the first pass, motion vectors for each frame are computed by using information from a same reference frame (frame 0) instead of a preceding frame (as in the partial re-use method). Then during the second pass, each motion vector determined in the first pass is re-used in the second pass to encode B-frames and P-frames in a set of successive frames.
Notice
Macsimum News presents only a brief summary of patents with associated graphic(s) for journalistic news purposes as each such patent application and/or grant is revealed by the U.S. Patent & Trademark Office. Readers are cautioned that the full text of any patent application and/or grant should be read in its entirety for further details.
Inventors: Barin Geoffry Haskell, Adriana Dumitras and Atul Puri.
Contributor
Dennis Sellers
Dennis has been a newspaper editor/reporter (seven years) and teacher (seven years). He has over 4,000 magazine, newspaper and online articles to his credit. He has also covered the Mac and tech industries for over a decade for such online publications as MacCentral, MacMinute and now MacsimumNews.






