Method and apparatus for adaptive data compression -> Monitor Keywords
Fresh Patents
Monitor Patents Patent Organizer How to File a Provisional Patent Browse Inventors Browse Industry Browse Agents Browse Locations
site info Site News  |  monitor Monitor Keywords  |  monitor archive Monitor Archive  |  organizer Organizer  |  account info Account Info  |  
04/06/06 - USPTO Class 341 |  163 views | #20060071822 | Prev - Next | About this Page  341 rss/xml feed  monitor keywords

Method and apparatus for adaptive data compression

USPTO Application #: 20060071822
Title: Method and apparatus for adaptive data compression
Abstract: We present a method and apparatus for performing adaptive data compression. An alphabet and vocabulary in the encoder and decoder is built adaptively and stored in a dictionary as symbols are to be encoded and decoded. Each time an unknown symbol is to be encoded by the encoder, the encoder adds the symbol to the dictionary and transmits it in plain in the encoded string. The code words transmitted by the encoder include symbols and indexes. The state of a prefix bit preceding the code word indicates whether the code word is a plain symbol or an index of a symbol or string of symbols stored in the dictionary. The decoder examines the prefix bit of each code word as it is received to determine if the code word stores a symbol in plain or in index. If the code word stores a symbol in plain, the decoder learns the symbol by adding a sequence of symbols resulting from the concatenation of previously decoded symbols and the first symbol of the currently decoded symbol and by adding the symbol to its dictionary. If the code word stores an index, the decoder decodes the code word by extracting the symbol or sequence of symbols stored in the dictionary at the respective index in the dictionary.
(end of abstract)
Agent: Hamilton, Brook, Smith & Reynolds, P.C. - Concord, MA, US
Inventor: Mourad Abdat
USPTO Applicaton #: 20060071822 - Class: 341050000 (USPTO)


The Patent Description & Claims data below is from USPTO Patent Application 20060071822.
Brief Patent Description - Full Patent Description - Patent Application Claims  monitor keywords



RELATED APPLICATIONS

[0001] This application is a continuation of U.S. application Ser. No. 10/788,003, filed Feb. 27, 2004, which is a continuation of U.S. application Ser. No. 10/420,018, filed Apr. 18, 2003, which was issued as U.S. Pat. No. 6,700,512 on Mar. 2, 2004, which is a continuation of U.S. application Ser. No. 09/782,614, filed Feb. 13, 2001, which was issued as U.S. Pat. No. 6,606,040 on Aug. 12, 2003. The entire teachings of the above applications are incorporated herein by reference.

BACKGROUND OF THE INVENTION

[0002] Data compression refers to the process of reducing the amount of data needed to represent a given information. The underlying basis of the reduction process is the removal of redundant or unnecessary data. Data compression techniques reduce the costs for information storage and transmission. Data compression techniques are used in many applications, ranging from simple file size reduction to speech and video encoding.

[0003] There are two different types of compression: lossless and lossy. In lossless compression, the source message at the encoder input is retrieved exactly at the output of the decoder. In lossy compression, the message is not retrieved exactly, but the information loss is tolerable for the type of application targeted by the compression schemes. Lossy compression is mainly used for speech, audio, image and video signals. The aim of the compression algorithm is to represent the signal with a minimum number of bits while maintaining the signal intelligibility and perceptual quality. All the information that cannot be perceived by human sensors can be removed.

[0004] Lossless compression techniques are used in applications where no information loss is tolerable such as compressing executable and source code files, satellite imaging and medical imaging. The techniques are also used as part of lossy compression schemes for better compression ratios.

[0005] One well-known technique for performing lossless compression is the LZW Lempel-Ziv-Welch ("LZW") algorithm. The LZW algorithm is a universal algorithm based on string parsing according to a fixed rule. It is based on the concept that often used sequences can be encoded in a lesser number of bits than would be required to spell out the entire sequence. The LZW algorithm requires the initialization of a table with the alphabet of the source. A symbol width is selected and the source alphabet is created for the symbols and stored in a coding table in the encoder and the decoder before the start of the encoding process. The LZW algorithm adds selected sequences of symbols (vocabulary) to a dictionary as it encodes received sequences of symbols. Sequences contained in the dictionary can be encoded with a lesser number of bits than that required to spell out the entire sequence with symbols. The size of the source alphabet is dependent on the width of the symbol. For example, a symbol width of 1 byte (8 bits) requires a source alphabet of 28 (256) entries and a symbol width of 2 bytes (16 bits) requires a source alphabet of 216 (64 K) entries. Typically, the LZW algorithm is implemented with a symbol width of one byte (8 bits). The LZW algorithm searches the coding table for the longest match in a received sequence of symbols and transmits the index of the longest match stored in the dictionary.

[0006] FIG. 1 illustrates a prior art LZW coding table 100 in an encoder and decoder for performing lossless data compression. The LZW coding table 100 can be a ternary Contents Addressable Memory ("CAM"). The input sequence of symbols 102 is translated to a sequence of indexes by the encoder using the source alphabet 106 and dictionary 108 stored in the LZW coding table 100. The coding table 100 in the encoder 110 and the decoding table 120 in decoder 112 include the source alphabet 106 and dictionary 108. The sequence of indexes 114 is transmitted by the encoder 110 and decoded by the decoder 112. The decoder 112 provides an output string 104 with the same symbols as the input sequence of symbols 102. The source alphabet 106 is stored in the LZW coding table 100 in the encoder 110 and the decoder 112 before the encoder 110 starts to encode the input sequence of symbols 102. The sequence of indexes 114 transmitted from the encoder 110 to the decoder 112 are indexes of plain text symbols stored in the source alphabet 106 or indexes of strings of symbols stored in the dictionary 108. The encoder 110 and the decoder 112 independently create entries in their respective dictionaries by learning new sequences of symbols dependent on the initial source alphabet. The encoder 110 adds a new sequence of symbols in the dictionary but transmits the index of the previously learned symbols or sequence of symbols to the decoder 112 in the sequence of indexes 114. The decoder also learns the new sequence of symbols and stores the new sequence of symbols at a new index in the LZW decoding table 120 in the dictionary 108.

[0007] FIG. 2 illustrates a prior art LZW compression of an input string in the encoder 110 shown in FIG. 1. The source alphabet 106 (FIG. 1) is stored in the LZW coding table 100 (FIG. 1) before the encoder 110 (FIG. 1) starts parsing the input sequence of symbols 102 or before the decoder starts decoding. The source alphabet 106 (FIG. 1) for an 8-bit symbol is stored at indexes 0-255 in the coding table 100 (FIG. 1) and the decoding table 120 (FIG. 1). The contents of five of the 256 locations in the source alphabet 106 (FIG. 1) are shown. Symbol `/` is stored at index 47, symbol `b` is stored at index 98, symbol `d` is stored at index 100, symbol `e` is stored at index 101, symbol `t` is stored at index 116 and symbol `w` is stored at index 119. An additional entry 256 at index 256 in the source alphabet 106 in the LZW coding table 100 stores End Of String ("EOS"), and entry 257 at index 257 in the dictionary 108 in the LZW coding table 100 stores a Flush code.

[0008] An input sequence of symbols 102 is received by the encoder 110 (FIG. 1). The encoder 110 (FIG. 1) parses the input sequence of symbols 102 and transmits the sequence of indexes 114 (FIG. 1). The input sequence of symbols 102 is encoded by the encoder 110 (FIG. 1) by parsing the input sequence of symbols 102 and searching the LZW coding table 100 for the longest match for the symbols and transmitted as a sequence of indexes (code words) for entries in the LZW coding table 100. An index can be a pointer to an entry in the source alphabet 106 or the dictionary 108.

[0009] As shown in the LZW coding table 100, the index for the entry in the source alphabet 106 storing the symbol `/` is 47. Initially, the coding table 100 stores only the source alphabet 106. As a sequence of symbols 102 is received by the encoder 110 (FIG. 1), the encoder 110 (FIG. 1) parses the sequence of symbols 102 dependent on the symbol width. The encoder 110 (FIG. 1) selects a symbol in the sequence of symbols 102 (FIG. 1) and searches the LZW coding table 100 for the symbol. The encoder learns vocabulary by concatenating known symbols and sequences of symbols. If the symbol is found, the symbol is concatenated with the next symbol, and the LZW coding table 100 is searched for a sequence of symbols formed by the two symbols. If the sequence of symbols is not stored in the LZW coding table 100, the index of the previously identified symbol or sequence of symbols is transmitted and the new sequence of symbols is added to the LZW coding table 100.

[0010] The operation of the encoder using the LZW algorithm is illustrated using the input sequence of symbols 102: /wed/we/wee/web/wet/ as shown in FIG. 2 and a symbol width of one character (8 bits). The coding table 100 stores the initial alphabet which includes an entry for each 8-bit symbol including `/`, `w`, `e`, `d`, `b` and `t`. The parsing of the input sequence of symbols 102 starts with symbol `/`. Symbol `/`is stored in the LZW coding table 100 at index 47, `/`is concatenated with the next symbol `w`, and the coding table is searched for the sequence of symbols `/w`; since `/w` is not then stored in the LZW coding table 100, `/w` is learned by storing `/w` at the next sequential index 258. The index for `/`; that is, 47 the previously identified symbol is transmitted in the sequence of indexes 104.

[0011] Parsing starts again at symbol `w` in the input sequence of symbols 102. The LZW coding table 100 is searched for symbol `w`. Symbol `w` is stored in the LZW coding table 100 at index 119, symbol `w` is concatenated with the next symbol `e` in the input sequence of symbols 102 and coding table is searched for the sequence of symbols `we`. Since `we` is not then stored in the coding table 100, `we` is learned by storing `we` in the coding table at the next sequential index 259. The index for `w`; that is, 119 the previously identified symbol is transmitted in the sequence of indexes 104.

[0012] Parsing starts again at symbol `e` in the input sequence of symbols 102. Symbol `e` is stored in the LZW coding table 100 at index 101. Symbol `e` is concatenated with the next symbol `d` in the input sequence of symbols and the LZW coding table 100 is searched for the sequence of symbols `ed`. Since, `ed` is not stored in the LZW coding table 100, `ed` is learned by storing `ed` in the next sequential entry in the coding table at index 260. The index for `e`, that is, 101, the previously identified symbol is transmitted in the sequence of indexes 104.

[0013] Parsing starts again at symbol `d` in the input sequence of symbols 102. Symbol `d` is stored in the coding table 100 at index 100. Symbol `d` is concatenated with the next symbol `/`in the input sequence of symbols 102 and the LZW coding table 100 is searched for the sequence of symbols `d/`. Since, `d/` is not stored in the LZW coding table 100, `d/` is learned by storing `d/` in the next sequential entry in the coding table at index 261. The index for the previously identified symbol `d`, 100, is transmitted in the sequence of indexes 104.

[0014] Parsing starts again from the symbol `/` in the input sequence of symbols 102. Symbol `/`is stored in the LZW coding table 100 at index 47. Symbol `/`is concatenated with the next symbol `w` in the input sequence of symbols 102 and the LZW coding table 100 is searched for the sequence of symbols `/w`. `/w` is stored in the coding table 100 at index 258, `/w` is concatenated with `e` in the input sequence of symbols 102 and the coding table is searched for the sequence of symbols `/we`. Since `/we` is not stored in the coding table 100, `/we` is learned by storing `/we` in the next sequential entry in the LZW coding table at index 262. The index for the previously identified sequence of symbols `/w`, 258, is transmitted in the sequence of indexes 104.

[0015] For example, for a symbol width of 8 bits, upon finding a match for the 24 bit (3 bytes.times.8 bits) per byte string of characters `/we`, a 9-bit index (the address of the string of symbols `/we` stored in the dictionary) is transmitted from the encoder to the decoder. This reduces the number of bits transmitted from 24 to 9. Upon receiving the 9-bit index the decoder regenerates the string of characters `/we` stored at the 9-bit index in its copy of the dictionary. If no corresponding sequence (prefix) had been found in the dictionary, indexes for the individual symbols `/`, `w` and `e` would be transmitted.

[0016] Transmission of the input sequence of twenty symbols 102 requires 160 bits (20 symbols.times.8 bits per symbol). The LZW algorithm reduces the number of bits transmitted to 126 bits (14 indices.times.9 bits). As the input sequence of symbols 102 is parsed, the vocabulary stored in the dictionary 108 in the coding table 100 increases and the lengths of the sequences of symbols stored in the coding table increase. For example, index 264 represents a sequence of four symbols `/wee`.

[0017] The longer the sequence of symbols stored in the coding table, the better the compression because the number of indexes transmitted is decreased. Compression can also be improved by increasing the symbol width. However, the initial source alphabet required by the LZW algorithm increases by two for each bit added to the symbol width and thus requires an impractical table size for encoding an initial source alphabet for symbol widths of several bytes.

SUMMARY OF THE INVENTION

[0018] The present invention does not require initialization of a source alphabet in the dictionary. Instead, both the alphabet and vocabulary are learned and stored in the dictionary during the encoding of the input string of symbols.

[0019] Furthermore, in the prior art LZW, a large alphabet requires large symbol width indexes. An alphabet of 256 one byte symbols requires the indexes to start with 9 bits, an alphabet of 16384 two byte symbols requires an initial index of 17 bits. The width of the index directly affects the compression ratio. In the present invention, the size of the alphabet has no direct effect on the index width. Furthermore, only symbols which are used by a source are learned to the dictionary. Thus, the invention is suitable for sparse sources.

[0020] The dictionary is searched for a symbol or sequence of symbols received in a string of symbols. Upon detecting that the symbol is not stored in the dictionary the symbol is learned by storing the symbol in the dictionary, and the plain symbol is transmitted in a code word.

[0021] Upon detecting that a symbol or sequence of symbols is stored in the dictionary, the index at which the symbol or sequence of symbols is stored in the dictionary is transmitted in the code word. A state of a prefix field in the code word may identify the contents of the code word as either plain symbol to be learned or an index. The dictionary index may be of variable width dependent on the number of symbols and sequences of symbols that have been learned.

Continue reading...
Full patent description for Method and apparatus for adaptive data compression

Brief Patent Description - Full Patent Description - Patent Application Claims
Click on the above for other options relating to this Method and apparatus for adaptive data compression patent application.
###
monitor keywords

How KEYWORD MONITOR works... a FREE service from FreshPatents
1. Sign up (takes 30 seconds). 2. Fill in the keywords to be monitored.
3. Each week you receive an email with patent applications related to your keywords.  
Start now! - Receive info on patent apps like Method and apparatus for adaptive data compression or other areas of interest.
###


Previous Patent Application:
High quality wide-range multi-layer image compression coding system
Next Patent Application:
Method and apparatus for scaling the average current supply to light-emitting elements
Industry Class:
Coded data generation or conversion

###

FreshPatents.com Support
Thank you for viewing the Method and apparatus for adaptive data compression patent info.
IP-related news and info


Results in 0.23536 seconds


Other interesting Feshpatents.com categories:
Canon USA , Celera Genomics , Cephalon, Inc. , Cingular Wireless , Clorox , Colgate-Palmolive , Corning , Cymer ,