11/01/07 - USPTO Class 707 | #20070255704

Method and system of de-identification of a record

USPTO Application #: 20070255704
Title: Method and system of de-identification of a record
Abstract: A method and system of de-identification of a record (100) are provided. The method includes creating a vector of identification field values (201) of a record (100), searching unstructured data (205) of the record (100) for each identification field value of the vector (201), and de-identifying the identification field values (230) of the record (100). The step of creating a vector of identification field values (201) extracts the values from one or more structured portions (101) of the record (100). An action (202) is defined for each identification field to de-identify the identification field. The method may include defining a mapping (203) of unstructured portions (111, 112, 113, 114) of the record (100), and extracting the unstructured portions (111, 112, 113, 114) of the record (100), wherein the steps of searching and de-identifying are carried out on the extracted unstructured portions (205).



Agent: Stephen C. Kaufman IBM Corporation - Yorktown Heights, NY, US
Inventors: Ock Kee Baek, Simona Cohen, Alex Melament, Pnina Vortman
USPTO Application #: 20070255704 - Class: 707006000 (USPTO)

Related Patent Categories: Data Processing: Database And File Management Or Data Structures, Database Or File Accessing, Query Processing (i.e., Searching), Pattern Matching Access



The Patent Description & Claims data below is from USPTO Patent Application 20070255704, Method and system of de-identification of a record.


FIELD OF THE INVENTION

[0001] This invention relates to the field of de-identification of a record. In particular, the invention relates to extracting personal information elements from unstructured portions of a record in order to remove identification information.

BACKGROUND OF THE INVENTION

[0002] Privacy of information has become very important in many different fields. Privacy is an issue that is likely to last for some time, with serious implications for businesses, especially those that rely heavily on information systems and Internet technology.

[0003] The ease with which electronic data can be transmitted, together with the vital need for data and information to advance research, has brought about the need to protect the privacy of the entities whose data is used. For example, medical research requires patient data but a patient's privacy must be protected. To preserve a person's privacy, it must be ensured that the transferred information cannot be associated with any specific individual and also that only authorized individuals based on the informed consent have access to the personal information.

[0004] This privacy is achieved by disclosing only certain pieces of non-identifiable information. To ensure complete privacy, all data must go through the process known as de-identification in which any pieces of information which can be used to identify an entity (such as an individual, a group of individuals, a business entity, a government entity, or any organisation) are removed or replaced with non-identifiable information.

[0005] Several countries have already chosen to impose this concept through legislation (for example, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the U.S.). HIPAA is specific to the portability of health information and applies to the healthcare industry only. The EU Privacy Directive for the European Union member countries, and PIPEDA (Personal Information Protection and Electronic Documents Act) and FIPPA (Freedom of Information and Protection of Privacy Act) in Canada, are broader, are also rigid, and apply to all business entities across industries. Legislation in this area is progressing around the world.

[0006] Legislation that protects the privacy of individuals can vary greatly, depending on which part of the world is involved. Additionally, the type of information involved, the technologies required to identify this information, and the definition of privacy are continuously evolving. The combination of these factors presents a challenge when developing methods for the protection of privacy and de-identification.

[0007] An example target industry in which de-identification of documents is critical is the healthcare and life sciences industry. Specifically, de-identification is required for the implementation of electronic patient records (EPR) and electronic health records (EHR), and for the integration of de-identified personal health records for translational life sciences research. De-identification of a patient's personal data from medical records is a protective legal requirement imposed before medical documents can be used for research purposes or transferred to other healthcare providers (e.g., teachers, students, tele-consultations).

[0008] De-identification can be applied to other industries such as government, retail, financial, insurance, and manufacturing industries for de-identification of protected personal information attributes.

[0009] In the US, HIPAA defines "Protected Health Information" (PHI) fields that must be de-identified to protect the personal privacy of a patient. These information fields include the following fields with the action required:
[0010] Name: remove.
[0011] Addresses: remove, but name of State, County, City, Town can be kept depending on the size of the population and based on IRB (Institutional Review Board) decisions.
[0012] Dates (e.g., DoB, ADT (admissions, discharges, transfers), DoD): replace with age ranges, or keep year only, but on an exceptional case month can be also kept.
[0013] Certificate/license numbers: remove.
[0014] Diagnostic device ID and serial number: remove.
[0015] Biometric identifier (e.g., voice, finger print, iris, retina): remove.
[0016] Full-face photo or comparable image: remove.
[0017] Social security number: remove.
[0018] Telephone numbers: remove. Area code and prefix can be kept only if geographical information is missing and also depending on the size of the population sharing the same area code or prefix.
[0019] Fax numbers: remove.
[0020] Electronic mail address: remove.
[0021] URL: remove.
[0022] IP address: remove.
[0023] Medical record number: remove.
[0024] Health plan number: remove.
[0025] Account numbers: remove.
[0026] Vehicle ID, serial number, and license plate number: remove.
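
The field list above can be read as a lookup from field type to required action. The following is a minimal sketch of such a table, not part of the application text. The key names, the action labels, and the function `action_for` are hypothetical, and the fallback to "remove" for unlisted fields is an assumption chosen to err on the side of privacy.

```python
# Hypothetical lookup table for the PHI fields listed above.
# Key names and action labels are illustrative only.
PHI_ACTIONS = {
    "name": "remove",
    "address": "remove_or_keep_region",      # region kept only per population size / IRB decision
    "date": "generalize",                    # age range or year only
    "certificate_license_number": "remove",
    "device_id_serial_number": "remove",
    "biometric_identifier": "remove",
    "full_face_photo": "remove",
    "social_security_number": "remove",
    "telephone_number": "remove_or_keep_prefix",
    "fax_number": "remove",
    "email_address": "remove",
    "url": "remove",
    "ip_address": "remove",
    "medical_record_number": "remove",
    "health_plan_number": "remove",
    "account_number": "remove",
    "vehicle_id": "remove",
}


def action_for(field: str) -> str:
    """Return the action label for a field; unknown fields default to 'remove' (an assumption)."""
    return PHI_ACTIONS.get(field, "remove")
```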

[0027] The de-identification rules for the elements of PHI can change based on the privacy policies of individual business entities and the Institutional Review Board decisions. For example, the state and city out of the address can be kept as long as the population of the city is more than 20,000, and a date of birth can be converted to an age range if the person is 89 years or younger.
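
The two example rules in the preceding paragraph can be sketched as follows. This is an illustration under stated assumptions rather than the application's own implementation: the function names, the ten-year range width, and the single "90+" bucket for persons older than 89 are choices made for the sketch.

```python
from datetime import date


def generalize_address(city: str, state: str, population: int, threshold: int = 20_000) -> str:
    """Keep city and state only when the city population exceeds the threshold."""
    if population > threshold:
        return f"{city}, {state}"
    return ""  # otherwise the address is removed entirely


def age_range(dob: date, today: date, width: int = 10, cap: int = 89) -> str:
    """Convert a date of birth to an age range; ages above `cap` share one bucket (assumption)."""
    # Subtract one year if the birthday has not yet occurred this year.
    age = today.year - dob.year - ((today.month, today.day) < (dob.month, dob.day))
    if age > cap:
        return f"{cap + 1}+"
    low = (age // width) * width
    high = min(low + width - 1, cap)
    return f"{low}-{high}"
```

The threshold and cap are parameters because, as the paragraph notes, the rules can change with the privacy policy of the business entity and with Institutional Review Board decisions.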

[0028] Existing methods of locating identifying personal information that can be directly used to identify a specific individual, or non-personal information (e.g. 90 years of age) that can be used indirectly to identify a specific individual, generally use natural language processing and complex methods that require name repositories, location repositories, dictionaries, and other taxonomies to help detect whether a specific "word" can be used directly or indirectly to identify a person. These methods need sophisticated information retrieval techniques, must resolve ambiguity, and require embedding relatively heavy processing and algorithms. Large repositories of names from all around the world are required, as the population in every country today is heterogeneous as a result of large-scale immigration.

SUMMARY OF THE INVENTION

[0029] According to a first aspect of the present invention there is provided a method of de-identification of a record, comprising: creating a vector of identification field values of a record; searching unstructured data of the record for each identification field value of the vector; and de-identifying the identification field values of the record. The unstructured data may be portions of a structured, semi-structured, or unstructured record.
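
The three steps of the first aspect (create the vector, search the unstructured data, de-identify) can be sketched as below. This is a minimal illustration assuming the record's structured portion is a dictionary and the unstructured data is a plain string. The function names, the "[REMOVED]" placeholder, and the case-insensitive whole-value matching are assumptions, not details taken from the application.

```python
import re


def build_vector(structured: dict, id_fields: list) -> list:
    """Collect the non-empty values of the identification fields from the structured portion."""
    return [str(structured[f]) for f in id_fields if structured.get(f)]


def deidentify_text(text: str, vector: list, placeholder: str = "[REMOVED]") -> str:
    """Search the text for each value of the vector and replace every occurrence."""
    # Longest values first, so a full name is replaced before any shorter value it contains.
    for value in sorted(vector, key=len, reverse=True):
        # Lookarounds stop a value from matching inside a longer word or number.
        pattern = re.compile(r"(?<!\w)" + re.escape(value) + r"(?!\w)", re.IGNORECASE)
        text = pattern.sub(lambda _match: placeholder, text)
    return text


record = {"name": "John Smith", "mrn": "A12345", "diagnosis": "flu"}
vector = build_vector(record, ["name", "mrn"])
note = "Patient john smith (MRN A12345) presented with flu."
print(deidentify_text(note, vector))
```

Because the values to search for come from the record itself, this sketch needs no name repository or dictionary, which is the contrast with the natural language processing methods described in the background.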

[0030] In an embodiment of the present invention, the step of creating a vector of identification field values extracts the values from one or more structured portions of the record. The one or more structured portions of the record may be independent of the unstructured data of the record, for example in a different file format. Alternatively, the one or more structured portions of the record may be combined with the unstructured data of the record.

[0031] The method also preferably includes defining an action for each identification field to de-identify the identification field. An action to be applied to an identification field may be, for example, to erase, encrypt, cloak, scramble, replace with a derived value, etc.
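
One way to attach an action to each identification field is a dispatch table of functions, sketched below. The interpretations of "cloak" as masking and "scramble" as a seeded character shuffle are assumptions, as are all names. Encryption is omitted because it would require a cryptography library outside the standard library.

```python
import random


def erase(value: str) -> str:
    """Remove the value entirely."""
    return ""


def cloak(value: str) -> str:
    """Mask every character (one possible reading of 'cloak')."""
    return "*" * len(value)


def scramble(value: str, seed: int = 0) -> str:
    """Shuffle the characters with a fixed seed (one possible reading of 'scramble')."""
    chars = list(value)
    random.Random(seed).shuffle(chars)
    return "".join(chars)


def year_only(value: str) -> str:
    """Derived value: keep the year of a date, assuming an ISO 'YYYY-MM-DD' string."""
    return value[:4]


# Hypothetical assignment of one action per identification field.
ACTIONS = {"name": erase, "phone": cloak, "mrn": scramble, "dob": year_only}


def apply_actions(fields: dict) -> dict:
    """Apply each field's action; fields with no defined action are erased (assumption)."""
    return {key: ACTIONS.get(key, erase)(value) for key, value in fields.items()}
```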

[0032] In one embodiment, the method includes defining a mapping of unstructured portions of the record; extracting the unstructured portions of the record; and wherein the steps of searching and de-identifying are carried out on the extracted unstructured portions. The method may also include re-mapping the de-identified unstructured portions to the record.
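
The mapping, extraction, and re-mapping steps of this embodiment can be sketched as follows, assuming for illustration that a record is a flat dictionary and that the mapping is simply a list of the keys holding unstructured text. The function names and the placeholder are hypothetical. De-identifying the structured fields themselves is outside the scope of this sketch.

```python
def extract(record: dict, mapping: list) -> dict:
    """Pull out the unstructured portions named by the mapping."""
    return {key: record[key] for key in mapping if key in record}


def remap(record: dict, deidentified: dict) -> dict:
    """Return a copy of the record with the de-identified portions put back in place."""
    result = dict(record)
    result.update(deidentified)
    return result


def deidentify_portions(record: dict, mapping: list, values: list,
                        placeholder: str = "[REMOVED]") -> dict:
    """Search and de-identify only the extracted portions, then re-map them to the record."""
    cleaned = {}
    for key, text in extract(record, mapping).items():
        for value in values:
            text = text.replace(value, placeholder)
        cleaned[key] = text
    return remap(record, cleaned)
```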

[0033] A measure of re-identification risk of a record may be defined as the level of difficulty of inferring information in a record to specific entities. A measure of completeness may be defined as the percentage of information in a record that is not de-identified. The measure of re-identification risk and the measure of completeness may be used to de-identify a minimum number of identification field values in a record.
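
The paragraph above does not give formulas for either measure, so the following is only a sketch under strong assumptions: completeness is counted per field, and re-identification risk is modelled as a hypothetical additive integer score per field. Under that additive model, de-identifying the highest-risk fields first reaches a risk ceiling with the fewest fields.

```python
def completeness(total_fields: int, deidentified_fields: int) -> float:
    """Percentage of fields in a record that are not de-identified."""
    if total_fields == 0:
        return 100.0
    return 100.0 * (total_fields - deidentified_fields) / total_fields


def fields_to_deidentify(risk_by_field: dict, max_risk: int) -> list:
    """Pick the fewest fields whose de-identification brings total risk to max_risk or below.

    Assumes risk is additive across fields, which is an illustrative simplification.
    """
    remaining = sum(risk_by_field.values())
    chosen = []
    # Highest-risk fields first.
    for field, risk in sorted(risk_by_field.items(), key=lambda kv: kv[1], reverse=True):
        if remaining <= max_risk:
            break
        chosen.append(field)
        remaining -= risk
    return chosen
```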

[0034] According to a second aspect of the present invention there is provided a method comprising: extracting identification field values from a record; defining a set of conversion actions with a conversion action for each identification field; storing a first set of information of the identification field values and the set of conversion actions; and storing a second set of information of the record with converted identification field values; wherein the record can be re-identified using the first and second sets of information.
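
The second aspect can be illustrated with two dictionaries: a first set holding the original identification field values with the action applied to each, and a second set holding the record with converted values. The structure and names below are assumptions made for the sketch. Secure storage of the two sets is not shown.

```python
def deidentify(record: dict, actions: dict):
    """Split a record into (first_set, second_set).

    `actions` maps a field name to an (action_name, convert_function) pair.
    """
    first_set = {}   # original values plus the conversion action applied to each
    second_set = {}  # the record with converted identification field values
    for field, value in record.items():
        if field in actions:
            action_name, convert = actions[field]
            first_set[field] = {"value": value, "action": action_name}
            second_set[field] = convert(value)
        else:
            second_set[field] = value
    return first_set, second_set


def reidentify(first_set: dict, second_set: dict) -> dict:
    """Rebuild the original record from the two stored sets."""
    restored = dict(second_set)
    for field, info in first_set.items():
        restored[field] = info["value"]
    return restored
```

Keeping the original values only in the first set means the second set can be shared for research while re-identification remains possible for whoever holds both sets.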

[0035] The first and second sets of information may be stored securely for access only by authorised users, or stored encrypted with the decryption keys available only to authorised users.

[0036] According to a third aspect of the present invention there is provided a computer program product stored on a computer readable storage medium for de-identifying a record, comprising computer readable program code means for performing the steps of: creating a vector of identification field values of a record; searching unstructured data of the record for each identification field value of the vector; and de-identifying the identification field values of the record.

[0037] According to a fourth aspect of the present invention there is provided a system for de-identification of a record, comprising: a tool for discovering identification field values of a record; a search engine for searching unstructured data of the record for each identification field value; and a converter for de-identifying the identification field values of the record.

[0038] The converter may apply an action defined for each identification field. The tool for discovering may be configured for discovering identification field values in one or more structured portions of the record. The one or more structured portions of the record may be independent of the unstructured data of the record. Alternatively, the one or more structured portions of the record may be combined with the unstructured data of the record.
