Editor for deriving regular expressions by example

Related Patent Categories: Data Processing: Database And File Management Or Data Structures, Database Or File Accessing, Query Processing (i.e., Searching), Pattern Matching Access

The Patent Description & Claims data below is from USPTO Patent Application 20060167873, Editor for deriving regular expressions by example.

FIELD OF THE INVENTION

[0001] The present invention generally relates to information processing systems. More particularly, the present invention relates to methods and apparatus for deriving pattern matching expressions by example.

BACKGROUND OF THE INVENTION

[0002] Pattern matching refers to the use of various program languages or utilities to search for strings or patterns in input data streams. In many applications, pattern matching involves the use of regular expressions. A regular expression provides a description of patterns composed from combinations of symbols and operators. In general, regular expressions provide a powerful system for recognizing strings in incoming data streams or incoming data requests. String recognition facilitates the application of desired processing to these incoming data requests. For example, a particular string or pattern within an incoming Hyper Text Transfer Protocol (HTTP) request can be used to indicate the identity of the user sending that request. This identity can be used to route the HTTP request to a server that is best suited to handle such requests from that user.

[0003] Unfortunately, reading and writing regular expressions is challenging or difficult even for experienced programmers. For non-programmers, understanding regular expressions is often next to impossible.
Although techniques other than regular expressions, for example neural networks, genetic algorithms, Bayesian networks and Markov models, are also useful for recognizing patterns in data streams and incoming requests, these approaches also must be constructed by skilled programmers. In addition, these alternative approaches to pattern matching are predicated on machine learning rather than on user inputted parameters or definitions. Therefore, the use of regular expressions is preferred, and tools and systems have been developed to facilitate the use of regular expressions.

[0004] Conventional tools for engineering regular expressions require an understanding of a regular expression language. Examples of these types of editors are located at http://www.larkware.com/RegexTools.html, http://www.eclipseplugincentral.com/Web_Links+index-reg-viewlink-cid-126.html, http://www.regexbuddy.com/create.html and

[0005] http://www.codeproject.com/vb/net/regexpservice.asp. Although these editors provide some degree of assistance in developing regular expressions, each one of these editors expects users to understand the syntax and semantics of regular expression languages.

[0006] U.S. patent application Publication No. US 2003/0158895 discloses a system for pluggable Uniform Resource Locator (URL) pattern matching for servlets and application servers. As disclosed, the simple hard-coded servlet container is replaced with a servlet container that allows for the plug-in of different request pattern-matching utilities. The effect is to modify the application server request interface to suit the particular needs of the developer. Although this allows for the incorporation of various matching schemes into a given request resolution, the programmer is required to implement pattern matching code according to a required standard mapping interface.
The system disclosed does not provide support for authoring pattern matching logic, for example using a graphical user interface (GUI), or automated composition wizards arranged to help both programmers and non-programmers construct the desired pattern matching utility to be plugged-in. In addition, the described system lacks facilities to produce regular expressions, detrimentally requiring programmer authored pattern matching logic.

[0007] U.S. Pat. No. 4,550,436 is directed to parallel text matching methods in which a highly parallel matching circuit is provided to look at the entire lines of text simultaneously and in parallel for character matches. As disclosed, the system operates to compare input lines to a pattern in a parallel, simultaneous fashion, one symbol of the pattern at a time being compared to all of the symbols of the line. This use of parallel processing is directed to reducing the search time. Although the disclosed system and method can be used with regular expression operators, no assistance is given in the authoring or creation of regular expressions themselves.

[0008] U.S. Pat. No. 6,473,757 is directed to systems and methods for constraint-based sequential pattern mining. In particular, pattern mining techniques are disclosed that enable the incorporation of user-controlled focus in the mining process. Regular expressions are used to identify the family of sequential patterns of interest, and different relaxations of the regular expression constraints are used to prune the candidate patterns during the mining process. Again, no assistance or guidance is provided for the authoring of the underlying regular expressions. Therefore, knowledge of regular expressions and of parsing regular expressions is required for the authoring of the regular expressions to be used for pattern mining and for the management of these regular expressions to affect the desired pruning.

[0009] U.S. Pat. No.
6,496,835 is directed to methods for mapping data-fields from one data set to another in a data processing environment. If a field cannot be matched based on name alone, e.g. an identical match, rules are employed to determine a type for the field based on the field's name. The determined type of field is then used for matching. The rules are stated using regular expressions that list the text strings or substrings associated with a given field. For a given field, sets of rules, and therefore sets of regular expressions, are created. Although these rule sets automatically map one data set to a second data set and a graphical user interface (GUI) is provided for the end-user to alter the mapping results, the regular expressions themselves have to be programmed and stored in advance. The system does not provide a means for creating or modifying the regular expressions themselves, and in particular does not provide assistance to the end-user for authoring regular expressions.

[0010] U.S. Pat. No. 6,757,647 is directed to a method for encoding regular expressions in a lexicon. The disclosed method provides for creating electronically encoded lexicons that include regular expressions for augmenting the lexicon and computer-based language verification systems. Meta-characters are used to represent large sets of entries in the lexicon. Methods and support for generating regular expressions are not disclosed and no tools are provided to help lexicon authors.

[0011] A machine learning system is fed with a set of inputs and the corresponding outputs which are called training examples. Such a system is supposed to automatically generate an algorithm that produces the given outputs from the corresponding inputs. Problems with this approach include a machine learning system that takes a very long time to produce results and a machine learning system that requires a very large data set to produce a correct algorithm.
In addition, supplying insufficient examples to a machine learning system may result in either the complete failure to generate an algorithm or the generation of an incorrect algorithm. Moreover, a machine learning system produced algorithm may not be efficient, easily understandable by humans or transformable into a regular expression.

[0012] Many could benefit from being able to utilize pattern matching schemes, but are unable or unwilling to learn the language of regular expressions. Therefore, a need exists for tools that will bring the power of regular expressions to such persons.

SUMMARY OF THE INVENTION

[0013] The present invention is directed to methods and systems that provide for assisted authoring of data or pattern recognition statements in a user-friendly environment. Exemplary embodiments in accordance with the present invention use one or more examples of the desired patterns, strings and sub-strings as inputs. These inputs, or example patterns, are used to generate one or more pattern recognition statements. The generated pattern recognition statements are the output. Since actual examples of the desired patterns, strings or sub-strings are used to author the pattern recognition statements, systems and methods in accordance with the present invention can be viewed as using a "by example" paradigm to create the pattern recognition statements. Assistance is provided in producing the appropriate pattern recognition statements, since the pattern recognition statement output is generated from the user-provided input without the need for a prerequisite level of knowledge or understanding on the part of the user of the language in which the pattern recognition statements are written. Preferably, this language is a regular expression language.
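The "by example" paradigm of paragraph [0013] can be illustrated with a small sketch. The code below is not the method claimed in the application; it is one simple generalization heuristic chosen for illustration, and the function names `char_class`, `tokenize` and `derive_regex` are hypothetical. It assumes that all examples share the same sequence of character classes (ASCII letters, ASCII digits, literal punctuation) and differ only in run lengths.

```python
import re
import string
from itertools import groupby


def char_class(ch: str) -> str:
    """Map one character to a regex fragment (illustrative heuristic only)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    # Anything else is kept as a literal, escaped so it has no special meaning.
    return re.escape(ch)


def tokenize(example: str) -> list[tuple[str, int]]:
    """Collapse an example into (fragment, run_length) pairs."""
    return [(cls, len(list(run))) for cls, run in groupby(example, key=char_class)]


def derive_regex(examples: list[str]) -> str:
    """Derive one regular expression that fully matches every example."""
    if not examples:
        raise ValueError("at least one example is required")
    token_lists = [tokenize(e) for e in examples]
    shapes = {tuple(cls for cls, _ in tokens) for tokens in token_lists}
    if len(shapes) != 1:
        # This sketch does not attempt alternation or optional parts.
        raise ValueError("examples do not share one sequence of character classes")
    parts = []
    for column in zip(*token_lists):
        cls = column[0][0]
        lengths = [n for _, n in column]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quantifier = ""
        elif lo == hi:
            quantifier = "{%d}" % lo
        else:
            quantifier = "{%d,%d}" % (lo, hi)
        parts.append(cls + quantifier)
    return "".join(parts)


if __name__ == "__main__":
    pattern = derive_regex(["user-1234", "admin-56"])
    print(pattern)
    print(bool(re.fullmatch(pattern, "root-999")))
```

A single example is enough to produce a usable expression under this heuristic, and each further example only widens the run-length ranges. A user who knows the regular expression language could still edit the derived expression by hand.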
[0014] Although the generated pattern recognition statement is fully functional and adequate to identify occurrences of the desired patterns, strings and sub-strings in an incoming request or stream of data, the present invention also provides for manual editing of the pattern recognition statement by the user. Editing by the user, however, is optional, and typically would only be accomplished by users that are well versed in the syntax and semantics of the language in which the pattern recognition statement is written.

[0015] In addition to generating pattern recognition statements, the present invention also facilitates transformations of patterns, strings and sub-strings that are recognized in an incoming request or data stream. After the pattern recognition statement is generated, incoming requests and monitored streams of data are tested using this pattern recognition statement. When the desired patterns are recognized, the recognized patterns are outputted. The form of the recognized pattern, however, may not be suitable or desirable for processing, routing or handling by subsequent systems. Therefore, the recognized pattern can be transformed, for example truncated, as desired. The desired transformation can also be associated with the generation of the pattern recognition statement so that transformation is automatically performed following pattern recognition. Alternatively, the transformation can be performed as a separate independent step, for example at the direction of the user.

[0016] Superior to machine learning systems, methods and systems in accordance with the present invention produce correct and efficient pattern recognition and transformation expressions, such as regular expressions, in a relatively short time using as few as one example pattern. Advantageously, the present invention can suggest a set of outputs and a corresponding regular expression for a user to select.
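The recognition-then-transformation flow of paragraph [0015], combined with the HTTP routing use mentioned in the background, might look roughly like the following sketch. Everything specific here is an assumption made for illustration: the `user=` query parameter, the eight-character truncation, the `ROUTES` table and the server names do not come from the application.

```python
import re
from typing import Callable, Optional

# Hypothetical pattern recognition statement: a "user=<id>" parameter
# somewhere in the HTTP request line.
USER_PATTERN = re.compile(r"user=([A-Za-z0-9_.@-]+)")


def truncate(value: str, limit: int = 8) -> str:
    """Example transformation: keep at most `limit` leading characters."""
    return value[:limit]


def recognize_and_transform(
    request_line: str,
    transform: Callable[[str], str] = truncate,
) -> Optional[str]:
    """Recognize the user identity, then apply the associated transformation."""
    match = USER_PATTERN.search(request_line)
    if match is None:
        return None  # pattern not present in this request
    return transform(match.group(1))


# Hypothetical routing table keyed by the transformed identity.
ROUTES = {"alice": "server-a", "bob": "server-b"}


def route(request_line: str, default: str = "server-default") -> str:
    """Pick a server for the request based on the recognized identity."""
    key = recognize_and_transform(request_line)
    if key is None:
        return default
    return ROUTES.get(key, default)


if __name__ == "__main__":
    print(recognize_and_transform("GET /index.html?user=alice_smith_01 HTTP/1.1"))
    print(route("GET /index.html?user=bob HTTP/1.1"))
```

Passing the transformation as a parameter mirrors the two options in the text: it can be bound to the pattern so that it runs automatically after recognition, or it can be applied later as a separate step.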
[0017] Exemplary systems and methods in accordance with the present invention preferably use a graphical user interface (GUI) to facilitate user interactions with the example pattern or string identification and with the pattern recognition statement creation. The GUI provides for user input of the example patterns, e.g. using a keyboard or mouse, and produces one or more files containing one or more pattern recognition and string transformation statements. Relevant information including the generated pattern recognition statement and any identified transformation is displayed within the GUI environment.