01/31/08 - USPTO Class 715 | #20080028291

Graphical syntax analysis of tables through tree rewriting

USPTO Application #: 20080028291
Title: Graphical syntax analysis of tables through tree rewriting
Abstract: To determine a table structure, a spatially ordered sequence of rectangular cells (42) disposed in a two dimensional region is derived. The ordered sequence of rectangular cells is parsed in accordance with a two-dimensional structural grammar (54) having terminal elements corresponding to cells and non-terminal elements corresponding to structural relationship operators. The parsing produces a grammatical expression (52) with the cells represented by terminal elements and structural relationships represented by non-terminal elements.



Agent: Fay Sharpe / Xerox - Rochester - Cleveland, OH, US
Inventor: Jean-Yves Vion-Dury
USPTO Application #: 20080028291 - Class: 715228 (USPTO)

Graphical syntax analysis of tables through tree rewriting description/claims


The Patent Description & Claims data below is from USPTO Patent Application 20080028291, Graphical syntax analysis of tables through tree rewriting.


CROSS REFERENCE TO RELATED PATENTS AND APPLICATIONS

[0001]The following U.S. patents and patent applications are commonly owned with the present application and are each incorporated herein by reference.

[0002]Vion-Dury, U.S. application Ser. No. 11/451,525 filed Jun. 12, 2006, entitled "Methods and Apparatuses for Finding Rectangles and Application to Segmentation of Grid-Shaped Tables" is incorporated herein by reference in its entirety. This application relates at least to methods and apparatuses for finding spatially ordered sequences of rectangular cells.

[0003]Vion-Dury, U.S. application Ser. No. 11/312,267 filed Dec. 20, 2005, entitled "Normalization of Vector Based Graphical Representations" is incorporated herein by reference in its entirety. This application relates at least to apparatuses and methods for generating normalized canonical vector-based graphical representations.

[0004]Handley, U.S. Pat. No. 6,006,240 issued Dec. 21, 1999, entitled "Cell Identification in Table Analysis" is incorporated herein by reference in its entirety. This patent relates at least to identifying cells and cell separators during page recomposition processes, for example during optical character recognition processing.

BACKGROUND

[0005]The following relates to the graphical processing, document processing, information processing, and related arts. It finds example application in extracting structural layout of tables, and is described with particular reference thereto. The following finds more general application in determining structural layouts of rectangular cells of tables, grids, line art objects or representations, and so forth.

[0006]Tables are common elements in documents, and the contents of such tables typically contribute substantially to the informational content of the document. The information content of a table is often intimately related to its layout. For example, every entry in a column of a table may store a price value, while entries in another column may store item number, item name, or so forth. Accordingly, it is advantageous to determine and utilize the structural layout of the table in conjunction with extracting and interpreting the information content of the table. For example, the content may be interpreted on a row-by-row basis, or on a column-by-column basis, or so forth.

[0007]In document conversion applications, a document is converted from a source format, such as portable document format (PDF), to a more structured format such as extensible markup language (XML), hypertext markup language (HTML), or so forth. In performing such a conversion, it is advantageous to extract and retain the logical layout of a table for use in structuring the document. Such extraction can however be difficult, because different tables use different spatial layouts. For example, some tables include a line- or vector-based grid containing each cell of the document, with the topmost row of grid elements containing column headers. In other tables, the column headers are above and outside of the line- or vector-based grid. Moreover, some cells may be split or merged, so that the table deviates from a canonical row-by-row and column-by-column format. Indeed, some tables deviate strongly from such a canonical format, and include sub-rows, sub-columns, or other structures.

[0008]Some tables include line- or vector-based gridlines that provide the reader with a guide for following rows and columns of the table. In some automated table reading approaches, these line- or vector-based gridlines are ignored, and a purely text-based analysis is performed. Such a text-only approach will lose the spatial layout information typically provided by the gridlines. However, extracting useful information about the logical layout of the table from the gridlines has heretofore been difficult.

BRIEF DESCRIPTION

[0009]According to aspects illustrated herein, there are provided method and apparatus embodiments.

[0010]In an example method embodiment, a method is disclosed for determining a table structure. A table structure is determined respective to a first two cells of a spatially ordered sequence of rectangular cells. The table structure includes elements indicative of the first two cells and at least one element indicative of a structural relationship between the first two cells. A minimum rectangular bounding box containing the cells of the table structure is defined. The table structure is updated with additional structure including an element indicative of a next cell of the spatially ordered sequence of rectangular cells and at least one element indicative of a structural relationship between the next cell and the minimum rectangular bounding box. The defining and updating are repeated until the cells of the spatially ordered sequence of rectangular cells are exhausted. In some embodiments, the method optionally includes, conditional upon a selected portion of the table structure satisfying a rewrite criterion, rewriting the selected portion of the table structure in accordance with a rewrite rule corresponding to the rewrite criterion.
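The incremental procedure of paragraph [0010] can be illustrated with a minimal sketch. The names below (`Cell`, `relation`, `build_structure`, `show`) and the operators "H" (next cell lies to the right of the bounding box), "V" (next cell lies below it), and "O" (anything else) are assumptions made for illustration; the text above does not name the operators or fix a coordinate convention. The sketch assumes the y coordinate grows downward.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Cell:
    """A rectangular cell with corners (x0, y0) and (x1, y1); y grows downward."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


Box = Tuple[float, float, float, float]


def box_of(cell: Cell) -> Box:
    return (cell.x0, cell.y0, cell.x1, cell.y1)


def union(a: Box, b: Box) -> Box:
    """Minimum rectangular bounding box containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def relation(box: Box, cell: Cell) -> str:
    """Hypothetical structural operators relating a cell to a bounding box."""
    if cell.x0 >= box[2]:
        return "H"  # cell starts at or beyond the right edge of the box
    if cell.y0 >= box[3]:
        return "V"  # cell starts at or beyond the bottom edge of the box
    return "O"      # any other arrangement


def build_structure(cells: Sequence[Cell]):
    """Fold an ordered cell sequence into a nested (operator, left, right) tuple."""
    if len(cells) < 2:
        raise ValueError("at least two cells are required")
    first, second = cells[0], cells[1]
    expr = (relation(box_of(first), second), first, second)
    bbox = union(box_of(first), box_of(second))
    for cell in cells[2:]:
        # Relate the next cell to the bounding box of everything seen so far,
        # then grow the bounding box to include that cell.
        expr = (relation(bbox, cell), expr, cell)
        bbox = union(bbox, box_of(cell))
    return expr, bbox


def show(expr) -> str:
    """Render the nested structure as text."""
    if isinstance(expr, Cell):
        return expr.name
    op, left, right = expr
    return f"{op}({show(left)}, {show(right)})"


# Two cells side by side, with one wide cell beneath them.
a = Cell("a", 0, 0, 1, 1)
b = Cell("b", 1, 0, 2, 1)
c = Cell("c", 0, 1, 2, 2)
structure, bounding_box = build_structure([a, b, c])
print(show(structure))  # V(H(a, b), c)
```

The sketch omits the optional rewriting step; it only shows how each new cell is related to the bounding box of the structure built so far.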

[0011]In an example apparatus embodiment, an apparatus is disclosed operating on a spatially ordered sequence of rectangular cells representing a table. The apparatus includes: a two dimensional structural grammar having terminal elements corresponding to rectangular cells and non-terminal elements corresponding to structural relationship operators; and a structural parser configured to parse the spatially ordered sequence of rectangular cells representing the table in accordance with the two dimensional structural grammar. The parsing produces a grammatical expression indicative of spatial positions of the rectangular cells relative to one another. In some embodiments, the apparatus optionally includes a set of grammar rewrite rules, the structural parser accessing selected grammar rewrite rules to simplify the grammatical expression during parsing. In some embodiments, the apparatus optionally includes a pre-processor configured to process a document to identify the spatially ordered sequence of rectangular cells representing the table. In some embodiments, the apparatus optionally includes a logical analyzer configured to process contents of the table based at least in part on the grammatical expression indicative of spatial positions of the rectangular cells relative to one another.
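As a rough illustration of paragraph [0011], a grammatical expression can be modeled as a tree whose leaves are terminal elements (cells) and whose inner nodes are non-terminal elements (structural relationship operators). The class names, the operator labels, and the single rewrite rule shown here (merging a nested node into a parent that carries the same operator) are assumptions chosen for the sketch; the text above does not specify the actual rewrite rules.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Terminal:
    """Terminal element: stands for one rectangular cell."""
    cell_id: str


@dataclass(frozen=True)
class NonTerminal:
    """Non-terminal element: a structural relationship operator over children."""
    op: str
    children: Tuple["Node", ...]


Node = Union[Terminal, NonTerminal]


def flatten(node: Node) -> Node:
    """Assumed rewrite rule, applied bottom-up: op(op(a, b), c) -> op(a, b, c)."""
    if isinstance(node, Terminal):
        return node
    merged = []
    for child in (flatten(c) for c in node.children):
        if isinstance(child, NonTerminal) and child.op == node.op:
            merged.extend(child.children)  # same operator: absorb the children
        else:
            merged.append(child)           # different operator: keep the subtree
    return NonTerminal(node.op, tuple(merged))


def render(node: Node) -> str:
    """Render an expression tree as text."""
    if isinstance(node, Terminal):
        return node.cell_id
    return f"{node.op}({', '.join(render(c) for c in node.children)})"


nested = NonTerminal("H", (NonTerminal("H", (Terminal("a"), Terminal("b"))), Terminal("c")))
print(render(nested))           # H(H(a, b), c)
print(render(flatten(nested)))  # H(a, b, c)
```

A rule of this kind is only sound if the operator is associative, which is assumed here for a hypothetical horizontal-adjacency operator "H".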

[0012]In an example method embodiment, a method is disclosed for determining a table structure. A spatially ordered sequence of rectangular cells disposed in a two-dimensional region is derived. The spatially ordered sequence of rectangular cells is parsed in accordance with a two dimensional structural grammar having terminal elements corresponding to cells and non-terminal elements corresponding to structural relationship operators. The parsing produces a grammatical expression with the cells represented by terminal elements and structural relationships represented by non-terminal elements.
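The text above does not state how the spatially ordered sequence is derived; the related application cited in paragraph [0002] addresses that step. Purely as an assumption for illustration, the sketch below orders cells in reading order (top to bottom, then left to right) by their top-left corners, with y growing downward. The function name `spatially_ordered` is hypothetical.

```python
from typing import List, Tuple

# A cell as (x0, y0, x1, y1); y grows downward (an assumed convention).
Rect = Tuple[float, float, float, float]


def spatially_ordered(cells: List[Rect]) -> List[Rect]:
    """Sort cells by top edge first, then by left edge (assumed reading order)."""
    return sorted(cells, key=lambda r: (r[1], r[0]))


# A 2x2 grid given in arbitrary order.
grid = [(1, 1, 2, 2), (0, 0, 1, 1), (0, 1, 1, 2), (1, 0, 2, 1)]
print(spatially_ordered(grid))
```

Other orderings would serve equally well as parser input, provided the parser's grammar is written with the same ordering in mind.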

BRIEF DESCRIPTION OF THE DRAWINGS

[0013]FIG. 1 diagrammatically shows a system for identifying cells of a table that is delineated by gridlines.

[0014]FIG. 2 diagrammatically shows suitable adjustments performed by the vectors converter of FIG. 1 for removing line segment overlaps.

[0015]FIG. 3 diagrammatically shows suitable adjustments performed by the vectors converter of FIG. 1 for removing line segment crossings.

[0016]FIG. 4 diagrammatically shows suitable adjustments performed by the vectors converter of FIG. 1 to remove vector redundancies.

[0017]FIG. 5 diagrammatically shows vectors s, s_1, and s_2 associated with a datastructure Forks(s)=[s_1,s_2].

[0018]FIG. 6 diagrammatically shows vectors s, s_1, and s_2 associated with a datastructure Meets(s)=[s_1,s_2].

[0019]FIG. 7 diagrammatically shows vectors s, s_1, and s_2 associated with a datastructure Joins(s)=[s_1,s_2].

[0020]FIG. 8 diagrammatically shows vectors s, s_1, and s_2 of a datastructure HC(s)=[s,s_1,s_2] (or equivalently, HCS(s_2)=[s_2,s_1,s]).
