System and method for scalable portrait video
The Patent Description & Claims data below is from USPTO Patent Application 20090110065, System and method for scalable portrait video.

This is a divisional of a prior application Ser. No. 11/067,554 entitled “SYSTEM AND METHOD FOR SCALABLE PORTRAIT VIDEO” filed on Feb. 25, 2005, which was a continuation of a prior application entitled “SYSTEM AND METHOD FOR SCALABLE PORTRAIT VIDEO” which was assigned Ser. No. 10/302,653 and was filed Nov. 22, 2002.

1. Technical Field

This invention is directed toward a system and method for video compression/decompression. More specifically, this invention is directed towards the generation, coding and transmission of a video form.

2. Background Art

Wireless networks have been deployed rapidly in recent years. GPRS (General Packet Radio Service) and CDMA 1X (Code Division Multiple Access), as 2.5 G solutions to wide area wireless networks, are available in an increasing number of regions in Europe, North America and Southeast Asia. Wireless LAN 802.11 and Bluetooth also compete strongly for local area wireless networks. The fast expansion of wireless networks calls for rich content and services for consumers. However, due to the limited channel bandwidths in these wireless networks and the weak processing power of mobile devices, conventional media content is difficult to distribute. Bi-level video [1] is an effective solution for low bandwidth mobile video conferencing, for which no suitable video coding technology previously existed under current wireless network and mobile device conditions.
It was observed that although conventional video processing and coding technologies such as MPEG1/2/4 [2] and H.261/263 [3, 4] could also code video at low bit rates, the resultant images usually looked like a collection of color blocks and the motion in the scene became discontinuous. The block artifacts of these methods originate from the common architecture of MPEG1/2/4 and H.261/263, i.e. discrete cosine transform (DCT)-based coding. In DCT-based coding, low spatial frequency values that represent the “basic colors” of the blocks possess high priority. However, in video communications, facial expressions that are represented by the motions of the outlines of the face, eyes, eyebrows and mouth deliver more information than the basic colors of the face. Bi-level video uses bi-level images to represent these facial expressions, which results in very high compression ratios. Experiments show that at low bandwidths, bi-level video provides clearer shape, smoother motion, shorter initial latency and much lower computational cost than do DCT-based technologies. Bi-level video is especially suitable for small mobile devices such as handheld personal computers (PCs), palm-size PCs and mobile phones that possess small display screens and light computational power, and that work in wireless networks with limited bandwidths.

In bi-level video, scenes are always represented by two colors, usually black and white. Although black and white are sufficient to describe the outlines of a scene, the visual quality is not very satisfactory. Given that many mobile devices are now able to display at least four levels of grayscale, users of a research prototype for mobile video conferencing have expressed a desire for improved video that contains more gray values and has better visual quality. With this improved video, bit rates must also be kept low.
After reviewing existing video technologies that cover different bandwidth ranges, it was found that MPEG/H.26x performs well in the bandwidth range greater than about 40 Kbps and bi-level video works well in the range of 10-20 Kbps for quarter common intermediate format (QCIF) size (e.g., 144 lines and 176 pixels per line). However, the visual quality of bi-level video can no longer be improved even if greater bandwidth is assigned to it. The task is then how to improve the visual quality of bi-level video in the bandwidth range of 20-40 Kbps or how to design a new video form that can work well in this range. It is very important to develop a video form to fit into the 20-40 Kbps bandwidth range because this is the range that 2.5 G wireless networks such as GPRS and CDMA 1X can stably provide, although the theoretical bandwidths of GPRS and CDMA 1X are 115 Kbps and 153.6 Kbps, respectively.

It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting, “reference [1]” or simply “[1]”. A listing of the publications corresponding to each designator can be found at the end of the Detailed Description section.

This invention relates to the generation, coding and transmission of an effective video form referred to as scalable portrait video. This form of video is especially useful for mobile video conferencing. As an expansion to bi-level video, portrait video is composed of more gray levels, and therefore possesses higher visual quality while it maintains a low bit rate and low computational costs. Portrait video is a scalable video in that each video with a higher level always contains all the information of the video with a lower level.
The bandwidths of 2-4 level portrait videos fit into the bandwidth range of 20-40 Kbps that GPRS and CDMA 1X can stably provide. Therefore, portrait video is very promising for video broadcast and communication on 2.5 G wireless networks. With portrait video technology, this system and method is the first to enable two-way video conferencing on Pocket PCs and Handheld PCs.

In the four level embodiment, to generate a portrait video frame, the scalable portrait video system and method obtains a frame of a video in grayscale format. This frame can either be input in grayscale format or can be converted to grayscale from an RGB or other color format using conventional methods. A first threshold T1 is applied to the grayscale frame to generate two partial grayscale images, a first of which, S1, has pixels which have values greater than (or equal to in one embodiment) the first threshold T1, and a second of which, S2, has pixels which have values less than said first threshold T1. A first bi-level image is also generated comprising pixels assigned a first binary value if the value of the correspondingly located pixel in the grayscale frame exceeds (or equals in one embodiment) T1 and pixels assigned a second binary value if the value of the correspondingly located pixel in the grayscale frame is less than (or equal to in one embodiment) T1. A second threshold, T21, is applied to the first partial grayscale image, S1, to generate a second bi-level image which has pixels assigned a first binary value if the value of the correspondingly located pixel in the first partial grayscale image exceeds (or equals in one embodiment) T21 and pixels assigned a second binary value if the value of the correspondingly located pixel in the first partial grayscale image is less than (or equal to in one embodiment) T21.
Likewise, a third threshold, T22, is applied to the second partial grayscale image, S2, to generate a third bi-level image comprising pixels assigned a first binary value if the value of the correspondingly located pixel in the second partial grayscale image exceeds (or equals in one embodiment) T22 and pixels assigned a second binary value if the value of the correspondingly located pixel in the second partial grayscale image is less than (or equal to in one embodiment) T22. It should be noted that T21>T1 and T22<T1.

The first, second and third bi-level images can then be combined to create a four level grayscale video frame representing a frame of the portrait video, or the images can be encoded and possibly transmitted first. In one embodiment, to combine the first, second and third bi-level images, an array of two bit elements is created where each element corresponds to a different pixel location of the bi-level images. In one embodiment the second and third bi-level images are combined into a single combined bi-level image prior to encoding. The elements of the array have a most significant bit taken from the associated pixel location of the first bi-level image and a least significant bit taken from the associated pixel location of the combined bi-level image. Different grayscale levels are assigned to each possible element value to create the four level grayscale frame.

To transmit the encoded bi-level images, if the available bandwidth is small, approximately 10-20 Kbps in one embodiment, then only the encoded first bi-level image is transmitted. However, if the available bandwidth is increased, in one embodiment to approximately 20-40 Kbps, then the encoded second and third bi-level images are also transmitted to display more grayscale levels than are available with said first bi-level image alone.
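As a rough illustration of the four level frame construction described above, the following Python sketch derives the first bi-level image, the combined second and third bi-level image, and the resulting four level frame from a grayscale frame. The function name `four_level_frame`, the use of nested lists for images, the "greater than or equal to" comparison, and the output gray values (0, 85, 170, 255) are assumptions made for illustration; the text only states that a different grayscale level is assigned to each two bit element value.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values


def four_level_frame(
    gray: Image,
    t1: int,
    t21: int,
    t22: int,
    levels: Sequence[int] = (0, 85, 170, 255),  # assumed display gray values
) -> Tuple[Image, Image, Image]:
    """Return (first bi-level image, combined bi-level image, four level frame)."""
    if not t22 < t1 < t21:
        raise ValueError("expected T22 < T1 < T21")
    first, combined, frame = [], [], []
    for row in gray:
        # First bi-level image: 1 where the pixel belongs to S1 (value >= T1).
        msb_row = [1 if v >= t1 else 0 for v in row]
        # Pixels in S1 are compared with T21, pixels in S2 with T22. Each pixel
        # lies in exactly one of S1 and S2, so this single bit per pixel equals
        # the sum of the second and third bi-level images at that location.
        lsb_row = [1 if v >= (t21 if m else t22) else 0
                   for v, m in zip(row, msb_row)]
        first.append(msb_row)
        combined.append(lsb_row)
        # Two bit element: most significant bit from the first image,
        # least significant bit from the combined image.
        frame.append([levels[(m << 1) | l] for m, l in zip(msb_row, lsb_row)])
    return first, combined, frame
```

Under these assumptions, a receiver that only has the first bi-level image can still display a two level frame, and the combined image refines it to four levels once it arrives.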
Conversely, if the available transmission bandwidth is initially large, then the encoded second and third bi-level images are also transmitted to display more grayscale levels than are available with said first bi-level image alone, and if this bandwidth is decreased, then only the encoded first bi-level image is transmitted. The encoded second and third bi-level video frames may be faded in and out of the first bi-level video transmission when the available bandwidth changes.

As indicated previously, the first, second and third bi-level images may all be encoded and potentially transmitted prior to being used to create a portrait video frame. This is done in one embodiment using bi-level video coding. In this bi-level encoding of each of the first, second and third bi-level images, for each pixel in raster order: (1) a context is computed and assigned a context number; (2) a probability table is indexed using the context number; (3) the indexed probability is used to drive an arithmetic coder; and (4) the arithmetic coder is terminated when the final pixel in each respective bi-level image has been processed.

Alternately, the first, second and third bi-level images can be encoded by first combining the second and third bi-level images by adding the pixel values of corresponding pixel locations of these images to create a combined bi-level image and then separately encoding the first bi-level image and the combined bi-level image using a bi-level encoding process.
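The per-pixel steps (1) to (3) listed above for bi-level coding can be sketched in Python as follows. This is only an illustration under stated assumptions: the four pixel causal template in `TEMPLATE` is hypothetical (the text does not specify the context shape), the probability table is built from adaptive counts rather than taken from the patent, and instead of a real arithmetic coder the sketch accumulates the ideal code length, -log2(p), that such a coder would approach.

```python
import math

# Hypothetical causal template as (dy, dx) offsets:
# left, upper-left, up, upper-right. Not taken from the source text.
TEMPLATE = [(0, -1), (-1, -1), (-1, 0), (-1, 1)]


def context_number(img, y, x):
    """Step (1): pack the template pixels into an integer context number.

    Pixels outside the frame are treated as 0.
    """
    h, w = len(img), len(img[0])
    ctx = 0
    for dy, dx in TEMPLATE:
        yy, xx = y + dy, x + dx
        bit = img[yy][xx] if 0 <= yy < h and 0 <= xx < w else 0
        ctx = (ctx << 1) | bit
    return ctx


def estimated_bits(img):
    """Steps (2) and (3), with the arithmetic coder replaced by a cost estimate.

    Each context indexes a pair of counts [zeros, ones] that serves as the
    probability table; the returned value is the sum of -log2(p) over all
    pixels in raster order.
    """
    counts = [[1, 1] for _ in range(1 << len(TEMPLATE))]  # smoothed start
    bits = 0.0
    for y in range(len(img)):
        for x in range(len(img[0])):
            table_entry = counts[context_number(img, y, x)]
            p = table_entry[img[y][x]] / (table_entry[0] + table_entry[1])
            bits += -math.log2(p)
            table_entry[img[y][x]] += 1  # adapt the table (assumption)
    return bits
```

A frame whose pixels are well predicted by their neighbors yields a total far below one bit per pixel, which is the effect the context-based scheme relies on.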
The encoding of the combined bi-level image can be done by, for each pixel in raster order: (1) determining whether the correspondingly located pixel in the first bi-level image has the first binary value or the second binary value; (2) whenever the correspondingly located pixel in the first bi-level image has the first binary value, computing a context and assigning a context number to this context, indexing a probability table using the context number, and using the indexed probability to drive an arithmetic coder; and (3) whenever the correspondingly located pixel in the first bi-level image has the second binary value, computing a context and assigning a context number to said context, indexing a probability table using the context number, and using the indexed probability to drive an arithmetic coder. When the correspondingly located pixel in the first bi-level image has the first binary value, a value is assigned to each context position of the context equal to the value of the corresponding pixel location of the combined bi-level image whenever that pixel location in the first bi-level image also has the first binary value; otherwise the second binary value is assigned. Likewise, when the correspondingly located pixel in the first bi-level image has the second binary value, a value is assigned to each context position of the context equal to the value of the corresponding pixel location of the combined bi-level image whenever that pixel location in the first bi-level image also has the second binary value; otherwise the first binary value is assigned.

In one embodiment of the scalable portrait system and method, the first threshold T1 can be set by a user. The second threshold T21 and the third threshold T22 are then automatically set to T1 plus and T1 minus a prescribed number, respectively. In a tested embodiment this prescribed number was 16. Alternately, these thresholds can be automatically calculated via conventional thresholding techniques. In one embodiment Otsu's single thresholding method was applied to the grayscale frame to obtain the optimal threshold values for T1, T21 and T22.
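The two ways of choosing thresholds described above can be sketched as follows. `derived_thresholds` applies the fixed offset of 16 from the tested embodiment; `otsu_threshold` is a plain implementation of Otsu's method for a single threshold over 8-bit values. Both function names are illustrative, and how Otsu's method is applied to obtain all three thresholds is not spelled out in the text, so applying it separately to the whole frame and then to each partial image, as in the usage note, is an assumption.

```python
def derived_thresholds(t1, delta=16):
    """Return (T21, T22) as T1 plus and T1 minus the prescribed number."""
    return t1 + delta, t1 - delta


def otsu_threshold(values):
    """Return the threshold t that maximizes the between-class variance.

    The two classes are values < t and values >= t; values are assumed
    to be integers in 0..255.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * n for i, n in enumerate(hist))
    best_t, best_score = 0, -1.0
    below = 0      # number of values below the candidate threshold
    sum_below = 0  # sum of those values
    for t in range(1, 256):
        below += hist[t - 1]
        sum_below += (t - 1) * hist[t - 1]
        above = total - below
        if below == 0 or above == 0:
            continue
        mean_below = sum_below / below
        mean_above = (sum_all - sum_below) / above
        # Proportional to the between-class variance.
        score = below * above * (mean_below - mean_above) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

One possible use, under the assumption stated above: compute T1 with `otsu_threshold` on all pixels of the frame, then compute T21 on the pixels with values of at least T1 and T22 on the remaining pixels.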
The coding of four-level video can be easily extended to the coding of multiple-level video. Each partial grayscale image of a lower level video can be divided into two smaller partial grayscale images using a threshold. The value of each pixel in one smaller partial grayscale image is always greater than or equal to the threshold and the value of each pixel in the other smaller partial grayscale image is always smaller than the threshold. These smaller partial grayscale images can be converted into bi-level images, then be combined and encoded using bi-level video coding and finally become the lowest bit plane of the higher level video, while all the bit planes of the lower level video are used as the higher bit planes of the higher level video.

The multi-level video according to the present invention is called portrait video because the coding is ordered from outlines to details, and the videos with lower levels look like a portrait. Unlike DCT-based coding methods, which put first priority on the average colors of a scene, portrait video puts first priority on the outline of a scene and then adds more details to it if more levels are involved. In some sense, portrait video always delivers the most important information of a scene for a given bandwidth. Portrait video is scalable because each video of a higher level always contains all the information of the video of a lower level and enhances the lower level videos.
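The multiple-level extension and its scalability property can be illustrated with a short Python sketch. It assumes the per-region thresholds are supplied as one sorted list of 2**n - 1 values, which is equivalent to repeatedly splitting each partial grayscale image with its own threshold; the function name `bit_planes` and this flat representation are illustrative choices, not the patent's notation.

```python
from bisect import bisect_right


def bit_planes(gray, thresholds):
    """Split a grayscale frame into n bit planes, most significant first.

    thresholds must be a sorted list of 2**n - 1 values. A pixel's level is
    the number of thresholds less than or equal to its value, and plane k
    holds bit k of that level counted from the most significant bit.
    """
    n = (len(thresholds) + 1).bit_length() - 1
    if len(thresholds) != (1 << n) - 1 or list(thresholds) != sorted(thresholds):
        raise ValueError("expected a sorted list of 2**n - 1 thresholds")
    planes = [[[0] * len(row) for row in gray] for _ in range(n)]
    for y, row in enumerate(gray):
        for x, v in enumerate(row):
            level = bisect_right(thresholds, v)  # 0 .. 2**n - 1
            for k in range(n):
                planes[k][y][x] = (level >> (n - 1 - k)) & 1
    return planes
```

With thresholds chosen so that each lower level video reuses a subset of the higher level's thresholds (for example `[128]`, then `[112, 128, 144]`), the bit planes of the lower level video reappear unchanged as the higher bit planes of the higher level video, and only the new lowest bit plane has to be added.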
Industry Class: Pulse or digital communications