CROSS-REFERENCE TO RELATED APPLICATION
This application is a continuation of and claims priority to co-pending U.S. patent application Ser. No. 13/099,882, filed May 3, 2011, which is a divisional of and claims priority to co-pending U.S. patent application Ser. No. 11/626,670, filed Jan. 24, 2007, which claims priority from U.S. Provisional Patent Application No. 60/761,610 filed Jan. 24, 2006, the contents of both of which are incorporated herein in their entireties by reference.
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention is related to a global printing system, and more particularly, to a printer that is capable of effectively and efficiently printing scripts on a label.
2. Description of Related Art
Printers are used in countries around the world, and most of these countries require printing in languages other than English. For example, Europe, the Middle East, India and Southeast Asia, and China, Japan, Korea, and Vietnam (commonly known as “CJKV”) utilize printers that produce labels in their native language or in several languages on a single label. Customers therefore value thermal printers for the labels they produce rather than for the printers themselves. These labels are made up of human readable text, graphics, and barcodes.
Languages have different ways of displaying the human readable text, each using different scripts. English, for example, uses the Latin script to produce human readable English text. A single script can be used for more than one language, as is the case with the Hanzi script being used to make human readable text for both Mandarin and Cantonese. A single language can also use more than one script. For instance, Japanese uses the Hiragana, Katakana, and Kanji scripts for written Japanese.
In order to print text, graphics, and barcodes, the data to be printed is encoded. Code points are utilized to represent characters, where characters are symbols that represent the smallest component of written language, such as letters and numbers. Glyphs are used to graphically represent the shapes of characters when they are displayed or rendered, while a font is a collection of glyphs. Each character does not necessarily correspond to a specific glyph, as there is not a one-to-one relationship between the two. The encoding is employed to convert code points into byte representation in storage memory. For example, some legacy encoding schemes include: ASCII in the United States; CP 850 in Latin-speaking regions; Shift-JIS in Japan; UHangul, Johab, and Wansung in Korea; and Big 5, GB 2312, and HZ in China/Taiwan.
However, because encoding schemes were insufficient to cover all languages, and many encoding schemes conflicted with one another, Unicode was developed. Unicode achieves uniformity between all languages and provides a set of coded characters that includes almost all characters used worldwide in an attempt to provide a universal standard. The Unicode Standard provides a number value (i.e., code point) and a name for each character, as well as various information such as mapping tables, character property tables, and mappings to character sets, to ensure legible and consistent implementation of data.
Unicode may be represented in UTF-8, UTF-16, and UTF-32 (UTF=Unicode Transformation Format) encodings but may also be represented by UCS-2 (UCS=Universal Character Set) and UCS-4. Each of the three UTF encoding schemes is capable of representing the full range of Unicode characters, and each has its own advantages and disadvantages. UTF-32 is the simplest form, in which each code point is represented by a single 32-bit code unit (i.e., a fixed width character encoding form). UTF-8 is a variable width encoding form that preserves ASCII transparency and uses 8-bit code units. UCS-2 is a two byte fixed width encoding scheme and therefore does not include support for “surrogate characters,” which are characters that require more than two bytes to represent.
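The size differences among the encoding forms described above can be illustrated with a minimal Python sketch. The function names `encoded_sizes` and `fits_in_ucs2` are hypothetical and introduced only for illustration; the sketch relies on Python's built-in codecs and is not part of any printer firmware.

```python
def encoded_sizes(ch):
    """Return the number of bytes one character occupies in each UTF form."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),  # the "-be" codec emits no BOM
        "utf-32": len(ch.encode("utf-32-be")),
    }


def fits_in_ucs2(ch):
    """UCS-2 has no surrogate pairs, so only U+0000..U+FFFF are representable."""
    return ord(ch) <= 0xFFFF


# One ASCII letter, one accented Latin letter, one CJK character in the
# basic plane, and one CJK character in a supplementary plane.
for ch in ["A", "\u00e0", "\u65e5", "\U00020000"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch), "UCS-2 ok:", fits_in_ucs2(ch))
```

Under these assumptions, only the supplementary-plane character needs a pair of 16-bit units in UTF-16 and is unavailable in UCS-2, while UTF-32 uses four bytes for every character.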
With respect to UTF-16, code points within a specified range are represented by a single 16-bit unit, while code points in a supplementary plane are represented by pairs of 16-bit units. UTF-16 is not an ASCII transparent encoding scheme. While it does map the characters included in the ASCII character set to the same code points as ASCII, the way it encodes these code points is different. UTF-16 encodes these code points using two bytes. The Most Significant Byte (MSB) is 0x00 while the Least Significant Byte (LSB) is the same as the ASCII value. Often, Unicode scripts will contain a Byte Order Mark (BOM) to denote what the endianness of the file is. However, the Unicode standard does not require that the file contain a BOM. Furthermore, the Unicode standard states that if a system detects Unicode data that is encoded using the UTF-16 encoding scheme and a BOM is not present, endianness is assumed to be big.
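The BOM handling described above may be sketched as follows. This is a minimal illustration in Python, assuming the input is known to be UTF-16; `decode_utf16` is a hypothetical helper name, and the BOM constants come from Python's standard `codecs` module.

```python
import codecs


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 bytes, honoring a BOM if present, else assuming big-endian."""
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return data[2:].decode("utf-16-le")
    # No BOM: fall back to big-endian, as described in the text above.
    return data.decode("utf-16-be")


# An ASCII letter in big-endian UTF-16: MSB is 0x00, LSB is the ASCII value.
print("A".encode("utf-16-be"))
print(decode_utf16(b"\x00A"))
```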
Current printers support a variety of ASCII transparent encoding schemes, including both single and multi-byte encodings, but encode the ASCII set of characters using one byte. This makes it possible for printers to simply look at printer control commands as if they were ASCII, which enables a printer control command to be embedded that specifies what the encoding scheme actually is prior to reaching the field data which could include multi-byte characters. However, when utilizing a Unicode encoding scheme, the printer is prevented from blindly looking for printer control commands because not all of the data is ASCII transparent.
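The difficulty can be illustrated with a short Python sketch that searches a byte stream for a command as if the stream were ASCII. The helper `contains_ascii_command` and the sample label string are assumptions made for illustration only; they do not reflect how any particular printer parses its input.

```python
def contains_ascii_command(raw: bytes, command: str = "^FD") -> bool:
    """Byte-wise search that treats the incoming stream as if it were ASCII."""
    return command.encode("ascii") in raw


# A field data command followed by two Japanese characters and an end format command.
label = "^FD\u65e5\u672c^XZ"

# ASCII transparent encodings keep the command bytes intact.
print(contains_ascii_command(label.encode("utf-8")))
print(contains_ascii_command(label.encode("shift_jis")))

# UTF-16 places a 0x00 byte before each ASCII value, so the naive search fails.
print(contains_ascii_command(label.encode("utf-16-be")))
```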
Unicode presents other issues with respect to grapheme clusters. In English, the concept of “a character” is very simple; it is a letter which is represented by a single code point. However, in other languages, defining “a character” is a more complex task, as a character is often made up of multiple code points. For example, the “à” (Small Latin Letter A with Grave) can be encoded either as U+00E0 or as the combining sequence U+0061 [Small Latin Letter A] U+0300 [Combining Grave Accent]. The Unicode Standard attempts to define what makes up “a character” through the creation of a grapheme cluster; however, this only handles the issue when the combining marks are of a non-spacing type. If printers use grapheme clusters to determine how to break apart a data string for vertical printing, a glyph could appear by itself on a line, which would render the text virtually unreadable.
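The limitation described above can be made concrete with a simplified Python sketch. It groups a base character only with the non-spacing (category Mn) or enclosing (category Me) marks that follow it; this is an illustrative simplification, not the full segmentation algorithm of the Unicode Standard, and `simple_clusters` is a hypothetical name.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the non-spacing/enclosing marks after it."""
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Me")
        if is_mark and clusters:
            clusters[-1] += ch      # attach the mark to the preceding base
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


# "a" + combining grave stays together as one unit.
print(simple_clusters("a\u0300b"))

# Devanagari KA + vowel sign AA: the vowel sign is a spacing mark (category Mc),
# so this simplified rule splits it off and it would print alone on a line.
print(simple_clusters("\u0915\u093e"))
```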
Because Unicode provides characters rather than the actual rendering of the characters (i.e., formatting, text placement, glyph selection, glyph style, or glyph size), software is required to properly implement Unicode at a printer. For many languages and countries, a label design application must be used to format labels to be printed. The label design applications download the format to the printer as a graphic, greatly increasing the first label out time. A slow first label out is also caused by the slow memory used to store the large fonts required to support some scripts, such as Japanese or Simplified Chinese. The memory required means an expensive upgrade but is necessary since some fonts, such as Andale, can be as large as 22 MB and do not fit onto the printer's available memory. For some languages and encodings, there is improper support or they are not available at all. Additionally, Unicode is not supported in all font technologies, such as TrueType, because the number of defined Unicode points exceeds the capacity of current font technologies.
There are four main regions with language issues to consider: Europe, the Middle East, India and Southeast Asia, and CJKV. Moreover, many companies are “multinational” and operate in many of the other regions and have both similar and separate language issues from those regions. Multinationals may be located in foreign markets, sell product into foreign markets, and/or manufacture in foreign markets.
In the Middle East, Hebrew and Arabic are the two most common languages. Hebrew is the official language of Israel. Arabic is the official language of Egypt, UAE, Iraq, Kuwait, and many other Middle East countries. The Arabic alphabet is also used to write non-Arabic languages. The Malay language, an official language of Singapore, Malaysia, and Brunei, uses the Arabic characters. Other languages that use the Arabic characters are Persian (Iran, Afghanistan, and Uzbekistan) and Urdu (Pakistan). Hebrew and Arabic differ from most other languages because they are read and written from right to left. Another issue with Arabic is that characters are displayed cursively. While English has an optional cursive style of writing, Arabic is always written cursively. The Arabic characters change shape depending on the characters around them, which is known as contextual shaping. Twenty two of the 28 characters have up to 4 different glyphs depending on if the character is at the beginning, middle, or end of a word. There is also a form for when the character is isolated. A single Unicode code point is given to each character, even though the code point can have several forms. The remaining 6 characters do not change shape. An interesting issue with writing right to left is that numerals are still written from left to right, which is a common occurrence in part numbers or addresses. This is the reason for the name bi-directional text, since the text can switch from left to right and right to left in the same sentence.
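Contextual shaping as described above can be sketched in Python for a small subset of letters. The table below covers only three letters that take all four forms, using their code points from the Unicode Arabic Presentation Forms-B block; the table, the function name `shape`, and the rule that any character outside the table breaks the cursive connection are simplifying assumptions for illustration, not a complete shaping engine.

```python
# (isolated, final, initial, medial) presentation forms for three letters.
FORMS = {
    "\u0628": ("\ufe8f", "\ufe90", "\ufe91", "\ufe92"),  # BEH
    "\u062a": ("\ufe95", "\ufe96", "\ufe97", "\ufe98"),  # TEH
    "\u0633": ("\ufeb1", "\ufeb2", "\ufeb3", "\ufeb4"),  # SEEN
}


def shape(text):
    """Replace each known letter with its contextual form (text in logical order)."""
    out = []
    for i, ch in enumerate(text):
        if ch not in FORMS:
            out.append(ch)          # unknown characters pass through unchanged
            continue
        joins_prev = i > 0 and text[i - 1] in FORMS
        joins_next = i + 1 < len(text) and text[i + 1] in FORMS
        isolated, final, initial, medial = FORMS[ch]
        if joins_prev and joins_next:
            out.append(medial)
        elif joins_prev:
            out.append(final)
        elif joins_next:
            out.append(initial)
        else:
            out.append(isolated)
    return "".join(out)


# BEH TEH BEH in logical order becomes initial, medial, final forms.
print([f"U+{ord(c):04X}" for c in shape("\u0628\u062a\u0628")])
```

Each input letter keeps a single code point, as the text notes; only the glyph chosen for output varies with the neighboring characters.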
The India and Southeast Asia region consists of countries such as Thailand, India, Sri Lanka, Philippines, and Bangladesh. These countries use scripts such as Thai, Devanagari, Telugu, Bengali, and Sinhala. These scripts are less complex than the Middle East languages because they are written from left to right. The issue with Indic and Southeast Asian languages is that they have combining characters, and languages that use the Devanagari and similar scripts have connecting headstrokes. A headstroke is a horizontal line that runs across the top of each character. The character stems off from the headstroke. The characters combine and can change order depending on the characters around them, mostly with dependent vowels. Similar to Arabic, even though a character can have several forms, it is only assigned one Unicode code point.
The remaining Asian languages not covered by the other regions are Japanese, Korean, Simplified Chinese, Traditional Chinese, and Vietnamese, commonly known as CJKV. These languages are used in China, Taiwan, Hong Kong, Singapore, Vietnam, Korea, and Japan. The main issue with these languages is the vast number of characters used to write each of them. Although only around 2,000-3,000 characters are required for basic literacy in Japanese or Chinese, upwards of 80,000 characters are listed in some dictionaries. Most of these characters are rarely used in everyday writing but are commonly used in proper names. In most of these languages, a character will have the same meaning in all the languages but may have a slightly different glyph. Another issue with CJKV languages is the use of multiple scripts. For example, a Japanese sentence can use up to four different scripts. Vietnamese presents a further problem even though it is written using Latin characters. Vietnamese words must have a tone mark, which is a diacritical mark combined with a base character. Many of these combined characters do not have a presentation form and must be rendered with a font engine. Vietnamese also requires more vertical space to be displayed properly, because stacking the base character and its diacritical marks increases the height of the text.
Conventional printers provide the ability to print horizontal blocks of text and simple single vertical columns of text. While this is sufficient for many languages of the world, it is not sufficient for the Japanese language, which can be written either horizontally or vertically. Also, when Japanese and Latin text are combined in the same block, there are a variety of possibilities of how this would appear, as shown below.
To make the situation more complex, there are no rules that define which of these options is the correct one. In fact, all three of these are correct and the one that is used is left up to the discretion of the typographer.
Multinational often means multilingual. Different from the other mentioned regions where similar languages are required, multinationals must deal with printing labels with multiple languages and all of the language issues associated with each individual language. For example, a label may contain Arabic text written from right to left and French that is written left to right. This label would include other language issues like combining diacritic marks for the French and contextual shaping for the Arabic.
Furthermore, most barcode standards do not specify a particular encoding for the data contained in them but rather a character that would be represented by a particular bar sequence. The QR barcode is an exception to this statement. This QR barcode is capable of encoding data in Shift-JIS but does not always encode data as Shift-JIS. The barcode could also encode numeric data, alphanumeric data, or 8-bit JIS data. The customer could send Unicode encoded data and request that the data be written onto a QR barcode. Therefore, current printers are incapable of supporting barcodes that are encoding scheme independent in order to create a valid barcode with data that reflects the intentions of the customer.
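The choice among the QR data types mentioned above can be sketched in Python. The alphanumeric character set and the Shift-JIS double-byte ranges used here are taken from the QR Code specification rather than from this description, and the function names are hypothetical. The sketch selects one mode for the whole input and does not perform the actual barcode encoding.

```python
# The 45 characters permitted in QR alphanumeric mode.
QR_ALPHANUMERIC = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:")


def _is_qr_kanji(ch):
    """True if the character is a Shift-JIS double-byte value usable in Kanji mode."""
    try:
        raw = ch.encode("shift_jis")
    except UnicodeEncodeError:
        return False
    if len(raw) != 2:
        return False
    value = (raw[0] << 8) | raw[1]
    return 0x8140 <= value <= 0x9FFC or 0xE040 <= value <= 0xEBBF


def choose_qr_mode(text):
    """Pick the most compact single mode that can represent every character."""
    if text and all(c in "0123456789" for c in text):
        return "numeric"
    if text and all(c in QR_ALPHANUMERIC for c in text):
        return "alphanumeric"
    if text and all(_is_qr_kanji(c) for c in text):
        return "kanji"
    return "byte"  # fall back to 8-bit data


for sample in ["12345", "ABC-123", "\u65e5\u672c", "abc"]:
    print(repr(sample), choose_qr_mode(sample))
```

Because the input arrives as Unicode text in this sketch, the decision depends on the characters themselves rather than on the encoding scheme in which the host sent them.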
Many companies currently offer locale specific solutions, such as a specific font, but there is no truly global solution provided by any thermal printer manufacturer “out-of-the-box.” Currently, the thermal printer market offers locale specific fonts as options, but printers are limited by standard printer memory and expensive memory upgrades to the number of fonts and languages the printer can support at any one time. Furthermore, many customers are unwilling to change to Unicode and will continue to use their legacy encoding schemes because of the high cost to convert. Those customers that do wish to convert to Unicode also face difficulties, such as how to properly parse printer control commands that are being sent simultaneously (e.g., in both ASCII transparent encoding schemes and in UTF-16 encoding schemes) without requiring an encoding indicator before the printer control commands are sent to the printer. In addition, customers wishing to send scripts supported by legacy encoding schemes may or may not have an encoding scheme command that is capable of being processed as they have been in the past.
It would therefore be advantageous to provide a global printing system that is capable of printing in the native language of various countries regardless of the encoding scheme employed. In addition, it would be advantageous to provide a global printing system that is cost effective and efficient, as well as capable of printing multiple scripts and/or multiple fonts on a label. Moreover, it would be advantageous to provide a global printing system that is capable of properly rendering Unicode characters and/or non-Unicode characters on a label.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING(S)
Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:
FIG. 1 is a diagram depicting a printer in communication with a plurality of host computers according to one embodiment of the present invention;
FIG. 2 is an exemplary product label depicting multiple scripts according to one embodiment of the present invention;
FIG. 3 is an exemplary shipping label depicting multiple scripts according to an additional embodiment of the present invention;
FIG. 4 is an exemplary label depicting multiple fonts according to one embodiment of the present invention;
FIG. 5 is an exemplary ^FO command according to another embodiment of the present invention;
FIG. 6 is an exemplary list of characters prohibited from beginning a line of text;
FIG. 7 is an exemplary list of characters prohibited from ending a line of text;
FIG. 8 is an exemplary text box illustrating that the size of the text box is inadequate to accommodate all of the text therein according to another embodiment of the present invention;
FIG. 9 is an exemplary command having ASCII characters in the data field according to one embodiment of the present invention;
FIG. 10 is an exemplary command having non-ASCII characters in the data field according to one embodiment of the present invention;
FIGS. 11A and 11B are flowcharts illustrating a process for encoding scheme interleaving according to one embodiment of the present invention;
FIG. 12 is a flowchart depicting a process for encoding barcodes according to another embodiment of the present invention;
FIG. 13 is a flowchart depicting a process for serializing data based on combining semantic clusters according to one embodiment of the present invention; and
FIG. 14 is an exemplary sequence of code points and its associated printed representation illustrating one potential application of a combining semantic cluster according to another embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the invention are shown. Indeed, the invention may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements. Like numbers refer to like elements throughout.
Referring to the drawings and, in particular, FIG. 1, the present invention provides a global printing system 10 that includes a printer 12 that is capable of communicating with a plurality of host computers 14. Each of the host computers 14 transmits commands to the printer 12 that may be formatted in different encoding schemes. The printer 12 is capable of determining the encoding scheme for the commands from each host computer 14 such that the commands may be interpreted correctly. Thus, the printer 12 is capable of printing various language scripts regardless of the encoding scheme and/or language script used. Moreover, the printer 12 is capable of printing multiple scripts on a single label (see FIGS. 2 and 3), as well as in multiple fonts (see FIG. 4). The printer 12 is capable of effectively printing the scripts on labels regardless of the orientation of the text and/or the complexity of serialized data. Additionally, the printer 12 is capable of encoding barcode labels such that the data input by the user matches the data that a scanner scanning the barcode sends back to the computer regardless of the encoding scheme employed.
The printer language referred to herein is Zebra Printer Language (“ZPL”) that is compatible with printers manufactured by ZIH Corp., the present assignee. For example, exemplary commands referred to herein include: ^CI (change international font); ^DT (download TrueType font); ^DY (download graphics); ^FB (field block); ^FD (field data); ^FO (field origin); ^FP (field parameter); ^FV (field variable); ^SF (serialization field with a standard ^FD string); ^SL (set mode and language for real-time clock); ^SN (serialization data); and ^XZ (end format). Thus, ZPL is known to those of ordinary skill in the art as a programming language for generating commands for the printer 12. However, it is understood that the programming language is adaptable to other printers such that any equivalent programming language may be written to carry out the functions set forth herein with any suitable printer.
Moreover, the printer 12 is typically employed to render data received from one or more host computers 14 and print the rendered data on labels, such as barcode labels. However, the term “label” is not meant to be limiting, as the printer 12 may be configured to print on any suitable medium capable of rendering one or more glyphs thereon. For instance, the label could be paper, cloth, plastic, metal, or other medium capable of having human-readable text, graphics, or barcodes depicted thereon.
The printer 12 is typically a thermal printer for printing labels. However, the printer 12 could be any suitable printer capable of rendering data on a label. For instance, the printer 12 could be a computer peripheral device that produces a hard copy (permanent human-readable text and/or graphics, usually on paper) from data stored in a computer connected to it.
The printer 12 is capable of supporting Unicode. If employing Unicode, the printer 12 is required to comply with at least portions of the Unicode Standard, namely, Chapter 3 of the Unicode Standard that sets forth the details for conformance. For example, if the printer 12 claims to support the Arabic or Hebrew scripts, then the system must support the bi-directional text layout algorithm. Another example is if the printer 12 does not support certain character blocks, such as the pre-composed Hangul characters, documentation that those characters are not supported is required. It is understood that the printer 12 may be alternatively configured to support subsets of Unicode by excluding particular scripts that may not be relevant to a printing application, such as Mongolian, Ethiopic, Cherokee, Canadian Aboriginal, Byzantine Music Symbols, Braille, and Historic scripts.
The printer 12 supports contextual mapping, which is prevalent with Arabic letters. The printer 12 may employ various solutions, such as glyph substitution tables (“GSUB”) or an algorithm derived from the ArabicShaping.txt file in the Unicode Character Database, to provide such support. Furthermore, the printer 12 supports diacritic marks. One way to support diacritic marks is to have a font engine render the character and combining mark together. For instance, a font engine could combine the character and the non-spacing diacritic mark and then render the combined character. Another way to support diacritic marks is through the use of presentation forms. The character and the combining mark are converted into their presentation form. The presentation form is a pre-combined character that exists at its own code point in Unicode. Normalization Form C (NFC), which is detailed in the Unicode Standard, aids the conversion to the presentation form. A font engine then renders the pre-combined presentation form, instead of the separate character and combining mark. If the presentation form is not available, the un-combined characters are printed side-by-side. A font engine may be used to combine characters when a presentation form is not available. Thus, the printer 12 is configured to accept the combining characters in the presentation forms and the characters plus combining marks.
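The conversion of a character plus combining mark into a pre-combined form can be illustrated with Python's standard `unicodedata` module, which implements Unicode canonical composition. The helper name `to_presentation_form` is hypothetical, and the sketch covers only the simple case of one base character followed by one mark.

```python
import unicodedata


def to_presentation_form(base, mark):
    """Compose base + mark; report whether a single pre-combined code point exists."""
    composed = unicodedata.normalize("NFC", base + mark)
    return composed, len(composed) == 1


# "a" + combining grave composes to the single code point U+00E0.
print(to_presentation_form("a", "\u0300"))

# "q" + combining grave has no pre-combined form, so the sequence is left as two
# code points and a font engine would have to position the mark itself.
print(to_presentation_form("q", "\u0300"))
```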
Additionally, the printer 12 is capable of supporting bi-directional text if Arabic or Hebrew languages are supported, which is in accordance with the Unicode Standard. The printer 12 can also support text written left-to-right on the same line as text written right-to-left (e.g., English and Hebrew). Bi-directional support is typically selectable. Thus, with the bi-directional support having user selectable support, the printer 12 will remain backwards compatible. The user should be able to select bi-directional text support through both a ZPL command and a front panel prompt on the printer 12.
Currently, the ^FO command sets the upper-left corner of the field area. When working with left to right oriented text in the ^FD command, the field origin is the beginning of the text; however, when the ^FD command contains right to left oriented text, the ^FO command will set the upper-right corner of the field area. FIG. 5 demonstrates an exemplary ^FO command with right to left and left to right oriented text. Both of the ^FD commands have the same ^FO command setting the field origin at the same location for both ^FD commands. Since the first ^FD command contains English, which is left to right oriented text, the field origin is the upper-left corner of the field area. The second command contains Hebrew text, which is right to left oriented text, so the field origin is set at the upper-right corner of the field area.
Furthermore, the printer 12 has the ability to override the location of field origin using explicit commands rather than the printer 12 using the script to determine the location based on the primary directionality of the script. Primary directionality of the script is defined within the Unicode Standard.
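One way to picture the selection of the field origin is the following Python sketch. It determines primary directionality from the first strongly directional character, which is a simplification of the paragraph-level rule in the Unicode bi-directional algorithm, and it models the explicit override as an optional argument. The function names, the `override` parameter, and the corner labels are assumptions for illustration and do not correspond to actual ZPL command parameters.

```python
import unicodedata


def primary_direction(text):
    """Return 'rtl' or 'ltr' based on the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):     # Hebrew-type and Arabic-type strong characters
            return "rtl"
    return "ltr"                    # no strong character found: default to left to right


def field_origin_corner(text, override=None):
    """Choose which corner of the field area the field origin refers to."""
    if override in ("upper-left", "upper-right"):
        return override             # an explicit setting takes precedence over the script
    return "upper-right" if primary_direction(text) == "rtl" else "upper-left"


print(field_origin_corner("Hello"))
print(field_origin_corner("\u05e9\u05dc\u05d5\u05dd"))
print(field_origin_corner("\u05e9\u05dc\u05d5\u05dd", override="upper-left"))
```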
The printer 12 also supports word wrapping. When employing word wrapping, normative line breaking properties are followed. The most general way of word wrapping is to line break at spaces; however, in some languages, there are no spaces. If there are no spaces in the text, as in Japanese or Chinese, then the text should be broken when there is no more room in a line. There are certain characters that are prohibited from beginning a line of text, such as those shown in FIG. 6. These characters should be “wrapped up” to stay with the previous line. Similarly, there are also certain characters that should not end a line of text and should begin a new line of text, such as those shown in FIG. 7. These characters should be “wrapped down” to start the next line. Bi-directional text also presents issues with respect to line breaking. For example, since Hebrew is written from right to left, the Hebrew text that should be read first would move to the first line and the remainder would move to the second line.
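The “wrap up” and “wrap down” behavior for text without spaces can be sketched in Python as follows. The two character sets are small hypothetical stand-ins for the lists of FIGS. 6 and 7, the width is counted in characters rather than in printed dots, and a wrapped-up character is allowed to extend the line past the nominal width; all of these are simplifying assumptions.

```python
# Hypothetical stand-ins for the prohibited character lists of FIGS. 6 and 7.
NO_LINE_START = set("、。）」")   # must not begin a line: "wrapped up"
NO_LINE_END = set("（「")         # must not end a line: "wrapped down"


def wrap_without_spaces(text, width):
    """Break text into lines of about `width` characters, honoring both lists."""
    lines = []
    i, n = 0, len(text)
    while i < n:
        end = min(i + width, n)
        # Wrap up: keep characters that may not begin a line on the current line.
        while end < n and text[end] in NO_LINE_START:
            end += 1
        # Wrap down: move characters that may not end a line to the next line,
        # always leaving at least one character on the current line.
        while end < n and end - 1 > i and text[end - 1] in NO_LINE_END:
            end -= 1
        lines.append(text[i:end])
        i = end
    return lines


print(wrap_without_spaces("あいう。えお", 3))
print(wrap_without_spaces("あい「うえ」お", 3))
```

In the first call the full stop stays with the first line instead of starting the second; in the second call the opening bracket moves down to begin the second line, and the closing bracket is wrapped up to stay with it.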