Creation of structured data from plain text
The Patent Description & Claims data below is from USPTO Patent Application 20080126080.

A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

REFERENCE TO A COMPUTER PROGRAM LISTING APPENDIX

A computer program listing appendix is included in the attached CD-R created on Dec. 12, 2000, labeled “Creation of Structured Data from Plain Text,” and including the following files: CommodityProperty.nml (13 KB), DefaultSeg14Result.xml (2 KB), ElectricalProperty.nml (16 KB), Example.txt, Grammar.txt, INML.xml (5 KB), MeasurementProperty.nml (22 KB), Output.txt (3 KB), PeriodProperty.nml (6 KB), PhysicalProperty.nml (36 KB), ReservedNameProperty.nml (6 KB), Seg14.nml (30 KB), Seg14Phrasing.nml (71 KB), UsageProperty.nml (7 KB), and Utility.nml (6 KB). These files are incorporated by reference herein.

BACKGROUND

A. Technical Field

The present invention relates to creation of structured data from plain text, and more particularly, to creation of structured data from plain text based on attributes or parameters of a web-site's content or products.

B. Background of the Invention

In recent years, the Internet has grown at an explosive pace. More and more information, goods, and services are being offered over the Internet. This increase in the data available over the Internet has made it increasingly important that users be able to search through vast amounts of material to find information that is relevant to their interests and queries.
The search problem can be described on at least two levels: searching across multiple web-sites, and searching within a given site. The first level of search is often addressed by “search engines” such as Google™ or Alta Vista™ or directories such as Yahoo™. The second level, which is specific to the content of a site, is typically handled by combinations of search engines and databases. This approach has not been entirely successful in providing users with efficient access to a site's content.

The problem in searching a web-site or other information-technology based service is composed of two subproblems: first, indexing or categorizing the corpora (body of material) to be searched (i.e., content synthesis), and second, interpreting a search request and executing it over the corpora (i.e., content retrieval).

In general, the corpora to be searched typically consist of unstructured information (text descriptions) of items. For e-commerce web-sites, the corpora may be the catalog of the items available through that web-site. For example, the catalog entry for an item might well be the sentence “aqua cashmere v-neck, available in small, medium, large, and extra large.” Such an entry cannot be retrieved by item type or attribute, since the facts that v-neck is a style of sweater, cashmere a form of wool, and aqua a shade of blue are unknown to current catalogs or search engines.

In order to retrieve the information that this item is available, by item type and/or attribute, this description must be converted into an attributed, categorized description. In this example, such an attributed, categorized description may include properly categorizing the item as a sweater, extracting the various attributes, and tagging their values. An example of such a description is illustrated in Table 1.
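The two subproblems described above, content synthesis and content retrieval, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of the application: the `LEXICON` table, the `SIZE_PATTERN` expression, the record layout, and the function names `synthesize` and `retrieve` are all hypothetical, and the only facts taken from the text are that v-neck is a style of sweater, cashmere a form of wool, and aqua a shade of blue.

```python
import re

# Hypothetical lexicon (an assumption for illustration, not from the patent's
# .nml files): term -> (attribute name, broader term, item category).
LEXICON = {
    "aqua": ("color", "blue", None),
    "cashmere": ("material", "wool", None),
    "v-neck": ("style", None, "sweater"),
}

# "extra large" is listed first so it is matched before the shorter "large".
SIZE_PATTERN = re.compile(r"\b(extra large|small|medium|large)\b")


def synthesize(description):
    """Content synthesis: turn a plain-text entry into an attributed record."""
    text = description.lower()
    record = {"category": None, "attributes": {}, "broader_terms": {}, "sizes": []}
    for term, (attribute, broader, category) in LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            record["attributes"][attribute] = term
            if broader is not None:
                record["broader_terms"][term] = broader
            if category is not None:
                record["category"] = category
    record["sizes"] = SIZE_PATTERN.findall(text)
    return record


def retrieve(catalog, category=None, **wanted):
    """Content retrieval: filter records by category and attribute values.

    A wanted value matches either the tagged value itself or its broader term,
    so color="blue" finds an item tagged color="aqua".
    """
    results = []
    for record in catalog:
        if category is not None and record["category"] != category:
            continue
        matched = True
        for attribute, value in wanted.items():
            actual = record["attributes"].get(attribute)
            broader = record["broader_terms"].get(actual)
            if actual is None or value not in (actual, broader):
                matched = False
                break
        if matched:
            results.append(record)
    return results


if __name__ == "__main__":
    entry = "aqua cashmere v-neck, available in small, medium, large, and extra large."
    catalog = [synthesize(entry)]
    print(catalog[0])
    print(retrieve(catalog, category="sweater", color="blue"))
```

Once the entry has been synthesized into a record, a request for blue sweaters can find it even though neither word appears in the original sentence, which is the gap in plain keyword search that the passage describes. A hand-written lookup table is used here only to keep the sketch short.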