Method, system, program and data structure for cleaning a database table
08/31/06 | #20060195489 | USPTO Class 707

USPTO Application #: 20060195489
Title: Method, system, program and data structure for cleaning a database table
Abstract: Disclosed is a method, system, program, and data structure for performing a clean operation on an input table. The input table to clean is indicated in an input data table name. At least one rule definition is processed to clean the input table. Each rule definition indicates a find criteria, a replacement value, and an input data column in the input table. The rule definition comprises a type of rule that is a member of the set of rules consisting of: find and replace, discretization, and numeric clip, and at least two rule definitions are comprised of different rule types. For each rule definition, the input data column is searched for any fields that match the find criteria. The replacement value for the particular rule definition is inserted in the fields in the input data column that match the find criteria. Subsequent applications of additional rule definitions applied to the same input data column operate on replacement values inserted in the input data column during previously applied rule definitions.
Agent: Konrad Raynes & Victor, LLP Attn: Ibm54 - Beverly Hills, CA, US
Inventors: Mark Anthony Cesare, Tom Robert Christopher, Julie Ann Jerves, Richard Henry Mandel
USPTO Application #: 20060195489 - Class: 707200000 (USPTO)
Related Patent Categories: Data Processing: Database And File Management Or Data Structures, File Or Database Maintenance
The Patent Description & Claims data below is from USPTO Patent Application 20060195489.
CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a divisional application of and claims the benefit of "METHOD, SYSTEM, PROGRAM, AND DATA STRUCTURE FOR CLEANING A DATABASE TABLE", having application Ser. No. 09/399,694, filed Sep. 21, 1999, the disclosure of which is incorporated herein by reference in its entirety.

[0002] This application is related to the following commonly-assigned patents and co-pending patent applications, all of which are filed on the same date herewith, and which are incorporated herein by reference in their entirety:

[0003] "Method, System, Program, And Data Structure for Transforming Database Tables," to Mark A. Cesare, Tom R. Christopher, Julie A. Jerves, Richard H. Mandel III, and having attorney docket number ST9-99-034, application Ser. No. 09/400,507, and U.S. Pat. No. 6,920,443, issued on Jul. 19, 2005;

[0004] "Method, System, Program, And Data Structure for Pivoting Columns in a Database Table," to Mark A. Cesare, Julie A. Jerves, and Richard H. Mandel III, and having attorney docket number ST9-99-035, application Ser. No. 09/400,057, and U.S. Pat. No. 6,604,095, issued on Aug. 5, 2003;

[0005] "Method, System, and Program for Inverting Columns in a Database Table," to Mark A. Cesare, Julie A. Jerves, and Richard H. Mandel III, and having attorney docket no. ST9-99-038, application Ser. No. 09/400,690, and U.S. Pat. No. 6,748,389, issued on Jun. 8, 2004; and

[0006] "Method, System, Program, And Data Structure For Cleaning a Database Table Using a Look-up Table," to Mark A. Cesare, Julie A. Jerves, and Richard H. Mandel III, and having attorney docket no. ST9-99-036, application Ser. No. 09/401,006, and U.S. Pat. No. 6,965,888, issued on Nov. 15, 2005.

BACKGROUND OF THE INVENTION

[0007] 1. Field of the Invention

[0008] The present invention relates to a method, system, program, and data structure for cleaning a database table and, in particular, for performing clean operations on columns in the database table.

[0009] 2. Description of the Related Art

[0010] Data records in a computer database are maintained in tables, each of which is a collection of rows all having the same columns. Each column maintains information on a particular type of data for the data records which comprise the rows. A data warehouse is a large scale database including millions or billions of records defining business or other types of transactions or activities. Data warehouses contain a wide variety of data that present a coherent picture of business or organizational conditions over time. Various data analysis and mining tools are provided with the data warehouse to allow users to effectively analyze, manage and access large-scale databases to support management decision making. Data mining is the process of extracting valid and previously unknown information from large databases and using it to make crucial business decisions. In many real-world domains such as marketing analysis, financial analysis, fraud detection, etc., information extraction requires the cooperative use of several data mining operations and techniques.

[0011] Once the desired database tables have been selected and the data to be mined has been identified, transformations on the data may be necessary. Transformations vary from conversions of one type of data to another, e.g., converting nominal values into numeric ones so that they can be processed by a neural network, to definition of new attributes, i.e., derived attributes. New attributes are defined by applying mathematical or logical operators on the values of one or more database attributes. The transformed data is stored in a target database where it may then be mined using one or more techniques to extract the desired type of information necessary to make the organizational decisions. Further details of data mining are described in the International Business Machines Corporation (IBM) publication entitled "White Paper: Data Mining Solutions" (IBM Copyright, 1996).
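
As a rough illustration of the two kinds of transformation just described, the following Python sketch converts a nominal value into a numeric code and defines derived attributes with a mathematical and a logical operator. The record layout and the names `color_code`, `total`, and `is_bulk` are hypothetical and do not come from the application.

```python
# Hypothetical source records; the column names are illustrative only.
records = [
    {"color": "red", "price": 4.0, "quantity": 3},
    {"color": "blue", "price": 2.5, "quantity": 12},
    {"color": "red", "price": 1.0, "quantity": 10},
]

codes = {}  # nominal value -> numeric code, assigned in order of first appearance
for rec in records:
    # Conversion of one type of data to another: nominal -> numeric.
    rec["color_code"] = codes.setdefault(rec["color"], len(codes))
    # Derived attribute from a mathematical operator on two attributes.
    rec["total"] = rec["price"] * rec["quantity"]
    # Derived attribute from a logical (comparison) operator.
    rec["is_bulk"] = rec["quantity"] >= 10
```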

[0012] Data transformation refers to the process of filtering, merging, decoding, and translating source data to create validated data for the data warehouse and data mining tools. For example, a numeric regional code might be replaced with the name of the region. Data transformation and cleansing are used when data is inconsistent or incompatible between sources. In such cases, some level of data cleansing is needed to ensure data consistency and accuracy. Some of the current techniques for transforming and cleansing data include the use of an SQL WHERE clause to limit the rows extracted from the source table. Further, formulas and expressions specified in the column definition window and constants and tokens are used to eliminate and modify data.
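
The related-art techniques mentioned above can be sketched with Python's standard `sqlite3` module: a WHERE clause limits the rows extracted from the source table, and an expression in the column list decodes a numeric regional code into a region name. The table name `source`, its columns, and the code-to-name mapping are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (region_code INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?)",
    [(1, 100.0), (2, 250.0), (1, -5.0)],
)

# WHERE limits the extracted rows; CASE replaces the numeric code with a name.
extracted = conn.execute(
    """
    SELECT CASE region_code WHEN 1 THEN 'East' WHEN 2 THEN 'West' END AS region,
           revenue
    FROM source
    WHERE revenue >= 0
    ORDER BY revenue
    """
).fetchall()
conn.close()
```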

[0013] Previous versions of IBM Visual Warehouse included programs to allow users to perform numerous functions on the source data. For instance, if one database table has revenue data in U.S. dollars and another data table stores revenue data in foreign currency denominations, then the foreign revenue data must be cleansed before both sets of data can be analyzed together. Transformation operations may be performed using application programs external to the database program that process and transform tables of data records. Further details of data warehousing and data transforms are described in the IBM publication "Managing Visual Warehouse, Version 3.1," IBM document no. GC26-8822-01 (IBM Copyright, January, 1998), which is incorporated herein by reference in its entirety.

[0014] Notwithstanding current programs for cleansing data, there is a need in the art to provide users greater control over operations to clean input data.

SUMMARY OF THE PREFERRED EMBODIMENTS

[0015] To overcome the limitations in the prior art described above, preferred embodiments disclose a method, system, program, and data structure for performing a clean operation on an input table. The input table to clean is indicated in an input data table name. At least one rule definition is processed to clean the input table. Each rule definition indicates a find criteria, a replacement value, and an input data column in the input table. The rule definition comprises a type of rule that is a member of the set of rules consisting of: find and replace, discretization, and numeric clip, and at least two rule definitions are comprised of different rule types. For each rule definition, the input data column is searched for any fields that match the find criteria. The replacement value for the particular rule definition is inserted in the fields in the input data column that match the find criteria. Subsequent applications of additional rule definitions applied to the same input data column operate on replacement values inserted in the input data column during previously applied rule definitions.
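
The behavior summarized in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class `RuleDefinition`, the function `clean_table`, and the representation of find criteria as a predicate are all hypothetical. The point it shows is that rules are applied in order, so a later rule on the same column sees the replacement values written by an earlier rule.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RuleDefinition:
    column: str                     # input data column to search
    matches: Callable[[Any], bool]  # find criteria, modeled here as a predicate
    replacement: Any                # value inserted into matching fields


def clean_table(rows, rules):
    """Apply each rule definition in order to a copy of the input rows."""
    cleaned = [dict(row) for row in rows]
    for rule in rules:
        for row in cleaned:
            if rule.matches(row[rule.column]):
                row[rule.column] = rule.replacement
    return cleaned


rows = [{"region": "N"}, {"region": "North"}, {"region": "S"}]
rules = [
    RuleDefinition("region", lambda v: v == "N", "North"),
    # This second rule also matches the "North" written by the first rule.
    RuleDefinition("region", lambda v: v == "North", "NORTH"),
]
result = clean_table(rows, rules)
```

Here the first row ends up as "NORTH" even though its original value was "N", because the second rule operated on the first rule's replacement.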

[0016] In further embodiments, each rule definition is associated with one rule table including the find criteria and replacement value. In such case, a rule table column parameter is provided for each rule definition indicating the columns in the rule table including the find criteria and replacement value for that rule definition. In certain embodiments, two rule definitions may have the same rule table. In such case, the rule table column parameters indicate different columns in the same rule table including the find criteria and replacement value for each rule definition. In still further embodiments, a separate rule table may include the find criteria and replacement value for different rule definitions.
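
One way to picture the rule table column parameter is sketched below: two rule definitions share a single rule table, and each one names the pair of columns holding its own find criteria and replacement values. The table contents, the column names, and the helper functions `rule_pairs` and `apply_rule` are hypothetical, and exact-match comparison is assumed for simplicity.

```python
# Hypothetical shared rule table: two find/replace column pairs side by side.
RULE_TABLE = [
    {"find_region": "1", "new_region": "East", "find_status": "A", "new_status": "Active"},
    {"find_region": "2", "new_region": "West", "find_status": "I", "new_status": "Inactive"},
    {"find_region": "3", "new_region": "North", "find_status": None, "new_status": None},
]


def rule_pairs(rule_table, find_col, replace_col):
    """Read one rule definition's (find, replacement) pairs from the named columns."""
    return [
        (row[find_col], row[replace_col])
        for row in rule_table
        if row[find_col] is not None
    ]


def apply_rule(rows, input_col, pairs):
    """Replace each field in input_col that equals a find value."""
    out = []
    for row in rows:
        new_row = dict(row)
        for find, replacement in pairs:
            if new_row[input_col] == find:
                new_row[input_col] = replacement
                break
        out.append(new_row)
    return out


rows = [{"region": "1", "status": "I"}, {"region": "3", "status": "A"}]

# Each rule definition: (input data column, find column, replacement column).
rule_definitions = [
    ("region", "find_region", "new_region"),
    ("status", "find_status", "new_status"),
]
for input_col, find_col, replace_col in rule_definitions:
    rows = apply_rule(rows, input_col, rule_pairs(RULE_TABLE, find_col, replace_col))
```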

[0017] Still further, a rule definition may include multiple find criteria and a corresponding replacement value for each find criteria. In such case, the step of searching the input data column comprises applying each of the multiple find criteria to one field until a match occurs or none of the multiple find criteria are found to match the field content. When a match is found, the replacement value corresponding to the find criteria is inserted in the field having the matching content.
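
The search described here stops at the first find criteria that matches a field. A small Python sketch of that control flow follows; the function name `clean_field` and the use of predicates for find criteria are assumptions made for illustration.

```python
def clean_field(value, criteria):
    """Try each (predicate, replacement) pair in order; the first match wins.

    If no find criteria matches, the field content is returned unchanged.
    """
    for matches, replacement in criteria:
        if matches(value):
            return replacement
    return value


# Hypothetical rule definition with two find criteria and their replacements.
criteria = [
    (lambda v: v.startswith("Calif"), "CA"),
    (lambda v: v.startswith("C"), "Other-C"),
]
cleaned = [clean_field(v, criteria) for v in ["California", "Colorado", "Texas"]]
```

"California" satisfies both predicates but receives "CA", because the search ends at the first match.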

[0018] In preferred embodiments, the rule definition may define a find and replace rule, a discretization rule or a numeric clip rule. Different rule definitions may define different rule types.
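
This excerpt names the three rule types without defining them, so the sketch below assumes their conventional meanings: find and replace substitutes an exact value, discretization maps a numeric range to a discrete label, and numeric clip forces out-of-range values to the nearest bound. The function names, the half-open bin convention, and the sample bounds are all assumptions.

```python
def find_and_replace(value, find, replacement):
    """Replace the value only when it equals the find value."""
    return replacement if value == find else value


def discretize(value, bins):
    """Map a number to a label; bins are (low, high, label), low inclusive, high exclusive."""
    for low, high, label in bins:
        if low <= value < high:
            return label
    return value


def numeric_clip(value, lower, upper):
    """Force a number into the closed interval [lower, upper]."""
    return max(lower, min(value, upper))


# Two different rule types applied in sequence to the same column.
ages = [-5, 17, 42, 130]
clipped = [numeric_clip(a, 0, 100) for a in ages]
bins = [(0, 18, "minor"), (18, 65, "adult"), (65, 101, "senior")]
labels = [discretize(a, bins) for a in clipped]
```

The discretization step operates on the clipped values, so the out-of-range inputs -5 and 130 still receive labels.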

[0019] In preferred embodiments, the rule definitions may be communicated from one computer system, such as a client, to a computer system including the input data table, such as a database server. The rule definitions are then executed against the input table on the database server.

[0020] Preferred embodiments provide a data command structure including one or more rule definitions for performing different operations on the data in an input data table. The preferred embodiments provide a command structure that accommodates multiple types of clean operations to be performed on an input data table before the input data table is written to the output table. Further, preferred embodiments allow a client to transfer clean commands to the database server that includes the database, for execution on the database server. This reduces network traffic as the database tables subject to the clean operation do not have to be transferred between the database server and the client constructing the clean commands. Further, in preferred embodiments, the rule definitions are maintained in rule tables in the server. This further reduces network traffic as the clean command need only specify the location of rules to apply and does not have to provide tables of rules.
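
The division of work described in this paragraph can be sketched as follows, with an in-memory object standing in for the database server. Everything named here (`CleanCommand`, `RuleRef`, `Server`, the JSON encoding, and the field names) is hypothetical; the actual command parameters are those shown in FIG. 2, which is not reproduced in this excerpt. The sketch only shows that the command carries table and column names, while both the input table and the rule table stay on the server.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class RuleRef:
    input_column: str    # column of the input table to clean
    rule_table: str      # name of a rule table stored on the server
    find_column: str     # rule table column holding the find criteria
    replace_column: str  # rule table column holding the replacement values


@dataclass
class CleanCommand:
    input_table: str
    output_table: str
    rules: list[RuleRef]


class Server:
    """In-memory stand-in for a database server holding input and rule tables."""

    def __init__(self, tables):
        self.tables = tables

    def execute(self, payload):
        cmd = json.loads(payload)
        rows = [dict(r) for r in self.tables[cmd["input_table"]]]
        for rule in cmd["rules"]:
            for row in rows:
                for entry in self.tables[rule["rule_table"]]:
                    if row[rule["input_column"]] == entry[rule["find_column"]]:
                        row[rule["input_column"]] = entry[rule["replace_column"]]
                        break
        self.tables[cmd["output_table"]] = rows


server = Server({
    "SALES": [{"region": "1", "amount": 10}, {"region": "2", "amount": 20}],
    "REGION_RULES": [{"code": "1", "name": "East"}, {"code": "2", "name": "West"}],
})

# The client builds and sends only this small command, never the table data.
command = CleanCommand(
    input_table="SALES",
    output_table="SALES_CLEAN",
    rules=[RuleRef("region", "REGION_RULES", "code", "name")],
)
payload = json.dumps(asdict(command))
server.execute(payload)
```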

BRIEF DESCRIPTION OF THE DRAWINGS

[0021] Referring now to the drawings in which like reference numbers represent corresponding parts throughout:

[0022] FIG. 1 illustrates a computing environment in which preferred embodiments are implemented;

[0023] FIG. 2 illustrates the parameters used in a transform command to clean input tables in accordance with preferred embodiments of the present invention;

[0024] FIGS. 3a, 3b, 4, and 5 illustrate examples of a rule table to clean data in accordance with preferred embodiments of the present invention;
