`US007519591B2
`
(12) United States Patent
`Landi et al.
`
(10) Patent No.: US 7,519,591 B2
(45) Date of Patent: Apr. 14, 2009
`
`(54) SYSTEMS AND METHODS FOR
`ENCRYPTION-BASED DE-IDENTIFICATION
`OF PROTECTED HEALTH INFORMATION
`
(75) Inventors: William A. Landi, Devon, PA (US); R. Bharat Rao, Berwyn, PA (US)
`
`(73) Assignee: Siemens Medical Solutions USA, Inc.,
`Malvern, PA (US)
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 427 days.
`
(21) Appl. No.: 10/796,255

(22) Filed: Mar. 9, 2004
`
(65) Prior Publication Data

US 2005/0165623 A1    Jul. 28, 2005
`
`Related U.S. Application Data
`
`(60) Provisional application No. 60/454,114, filed on Mar.
`12, 2003.
`
(51) Int. Cl.
     G06F 7/00     (2006.01)
     G06F 17/30    (2006.01)
     G06F 17/00    (2006.01)
(52) U.S. Cl. ............ 707/6; 707/9; 707/104.1
(58) Field of Classification Search ............ 707/1, 707/9, 100, 102, 103 R, 6, 10, 104.1, 200; 713/109, 193; 705/2, 3, 51; 726/26, 27
`See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

6,823,203 B2 *      11/2004   Jordan            600/407
6,874,085 B1 *       3/2005   Koo et al.        713/165
7,039,810 B1 *       5/2006   Nichols           713/182
7,158,979 B2 *       1/2007   Iverson et al.    707/100
7,181,017 B1 *       2/2007   Nagel et al.      380/282
2001/0054155 A1 *   12/2001   Hagan et al.      713/193
2002/0116227 A1 *    8/2002   Dick              705/3
2003/0021417 A1 *    1/2003   Vasic et al.      380/277
2003/0088771 A1 *    5/2003   Merchen           713/175
2004/0143594 A1 *    7/2004   Kalies            707/103 R
2004/0172293 A1 *    9/2004   Bruschi et al.    705/2
`
OTHER PUBLICATIONS

"Digital Imaging and Communications in Medicine (DICOM)-Supplement 55: Attribute Level Confidentiality (Including De-identification)", DICOM Standards Committee, Working Group 14 Security, Sep. 5, 2002.
`
`* cited by examiner
`
`Primary Examiner-Don Wong
`Assistant Examiner-Merilyn P Nguyen
`
`(57)
`
`ABSTRACT
`
Systems and methods are provided for protecting individual privacy (e.g., patient privacy) when individual data records (e.g., patient data records) are shared between various entities (e.g., healthcare entities). In one aspect, systems and methods are provided which implement secured key encryption for de-identifying patient data to ensure patient privacy, while allowing only the owners of the patient data and/or legally empowered entities to re-identify subject patients associated with de-identified patient data records, when needed.
`
5,956,400 A *        9/1999   Chaum et al.      713/167
`
`33 Claims, 3 Drawing Sheets
`
`
`DATAVANT, INC. EXHIBIT NO. 1005
`Page 1 of 13
`
`
`
[Drawing sheet 1 of 3. FIG. 1: schematic of system (10) showing patient data processing systems (20-1 to 20-n), each with identified patient data (21), an encryption system (22), and public, private, and master public keys (23); a central system (30); a trusted broker (40) with key generation (41); and a surveillance system (50) with an encryption system (52) and a master private key (53).]
`
`FIG. 1
`
`
`
`
[Drawing sheet 2 of 3. FIG. 2: a patient medical record shown as data sources arranged along a time axis: CT, X-Ray, Labs, Doc Visit, PET, Procedures, MR, Rx, Radiologist findings, Specialist findings, Demographics, and Billing.]

FIG. 2
`
`
`
`
[Drawing sheet 3 of 3. FIG. 3: flow diagram with the following boxes, in order:]

  Obtain Encryption/Decryption Key(s)
  Obtain Identified Patient Data
  Provide Input Key(s) and Identified Patient Data as Input to Encryption System and Commence De-Identification Process
  Generate an Encrypted ID for each Patient Using Encryption Key
  Optionally Generate Data Structure that Maps each Encrypted ID to a Study ID
  Perform De-Identification of Structured Patient Data Records, and Optionally Replace De-Identified Data with Readable Strings Mapped to Encrypted IDs
  Perform De-Identification of Unstructured Patient Data Records, and Optionally Replace De-Identified Data with Readable Strings Mapped to Encrypted IDs
  Perform De-Identification of Patient Image Data, and Optionally Replace De-Identified Data with Readable Strings Mapped to Encrypted IDs
  Encrypt De-Identified Patient Data for Delivery to Remote Site of Recipient

FIG. 3
`
`SYSTEMS AND METHODS FOR
`ENCRYPTION-BASED DE-IDENTIFICATION
`OF PROTECTED HEALTH INFORMATION
`
CROSS-REFERENCE TO RELATED APPLICATION

This application claims priority to U.S. Provisional Application Ser. No. 60/454,114, filed on Mar. 12, 2003, which is fully incorporated by reference.
`
`TECHNICAL FIELD OF THE INVENTION
`
The present invention relates, in general, to systems and methods for protecting patient privacy when health care information is shared between various healthcare entities and, in particular, to systems and methods that implement secured key encryption for de-identifying patient data to ensure patient privacy, while allowing only the owners of the patient data records and/or other legally empowered entities to re-identify subject patients of de-identified data records, when needed.
`
BACKGROUND

Due to continued technological advancements in data storage systems and information processing systems, health care providers and organizations continue to migrate toward environments where most aspects of patient care management are automated, making it easier to collect and analyze patient information. Consequently, health care providers and organizations, etc., tend to accumulate vast stores of patient information, such as financial and clinical information, in the form of electronic patient data records that are stored in electronic databases or other electronic medium such as files. In this document, the term database is used as a general term to denote any mechanism for storing data electronically and is not limited to a traditional database system. Such patient information may be stored in a myriad of unstructured and structured formats, and includes many items of patient identifying information that can be used to identify subject patients of the patient data records.

There are various circumstances in which healthcare organizations have to disclose or otherwise share their patient data with other healthcare entities, agencies, business partners, etc. However, healthcare organizations have both an ethical and legal responsibility for protecting patient privacy. Organizations cannot release or otherwise disclose patient data records that contain patient identifying information that can be used to identify patients without patient approval unless there is a valid reason as defined by various laws and regulations. For example, valid reasons are generally related to TPO (treatment, payment, and operations) but can also cover other activities such as certain research. Even when a valid reason exists, there is still an obligation on the part of the organization to release only the minimum amount of information that is necessary for the particular reason.

In the United States, standards such as HIPAA (Health Insurance Portability and Accountability Act) have resulted in Federal regulations that place strict requirements on the archiving and disclosure of medical records. For example, in accordance with HIPAA, Federal regulations have been promulgated requiring healthcare organizations and physicians to ensure the protection, privacy and security of patient medical information. In particular, the "Privacy Rule" of HIPAA provides Federal privacy regulations that set forth requirements for confidentiality and privacy policies and procedures, consents, authorizations and notices, which must be adopted in order to maintain, use, or disclose individually identifiable health information in treatment, business operations or other activities.

The HIPAA Privacy Rule allows for certain entities to "de-identify" protected health information for certain purposes so that such information may be used and disclosed freely, without being subject to the protections afforded by the Privacy Rule. The term "de-identified data" as used by HIPAA refers to patient data from which all information that could reasonably be used to identify the patient has been removed (e.g., removing name, address, social security numbers, etc.). The Privacy Rule requirements do not apply to information that has been de-identified. HIPAA also defines the notion of a "Limited Data Set", which is "de-identified data" for which the de-identification requirements are not as stringent. Further, the distribution requirements on limited data sets are tighter than those for more completely de-identified data.

Conventional methods for de-identifying patient data include simply stripping all information from the patient data records that can be used to determine the identity of a patient, or replacing such patient identifying information with something else (e.g., replacing the actual name with the string "name"). With such methods, although the patient data records are de-identified, there is no mechanism by which patient identification can be recovered, if necessary.
`
`SUMMARY OF THE INVENTION
`
Exemplary embodiments of the invention generally include systems and methods for protecting individual privacy (e.g., patient privacy) when private information (e.g., health care information) is shared between various entities (e.g., healthcare entities). More specifically, exemplary embodiments of the invention include systems and methods that implement secured key encryption for de-identifying patient data to ensure patient privacy, while allowing only the owners of the patient data and/or legally empowered entities to re-identify subject patients associated with de-identified patient data records, when needed.

In one exemplary embodiment of the invention, a method for processing data includes the steps of obtaining a data record of an individual which includes individual identifying information, removing the individual identifying information in the data record to generate a de-identified data record, generating an encrypted ID for the individual, wherein the encrypted ID comprises an encrypted representation of one or more items of individual identifying information, and storing the encrypted ID with or in the de-identified data record. A decryption key is securely maintained and accessible by an authorized entity that is legally authorized or empowered to decrypt the encrypted ID in the de-identified data record to re-identify the individual.

In one exemplary embodiment of the invention, the data records are patient data records containing clinical and possibly financial (billing) information. In other embodiments of the invention, the data records may comprise, e.g., financial records, employer/employee records, appraisals, student records, etc.
In another exemplary embodiment of the invention, a system for processing data includes a first data processing system, a second data processing system and a third data processing system. The first data processing system comprises a first repository that stores data records of an individual which include individual identifying information, and an encryption system that can generate an encrypted ID for the individual using an encryption key associated with the first data processing system, wherein the encrypted ID comprises an encrypted representation of one or more items of individual identifying information, and wherein the encryption system can generate de-identified data records of the individual which are associated with the encrypted ID. The second data processing system comprises a second repository that stores de-identified data records generated by the first data processing system and an engine that can process the de-identified data records in the second repository. The third data processing system comprises a third repository that stores a master decryption key, and an encryption system that can use the master decryption key to decrypt an encrypted ID of de-identified data records to re-identify an individual.

These and other exemplary embodiments, aspects, features and advantages of the present invention will become apparent from the following detailed description of exemplary embodiments, which is to be read in connection with the accompanying drawings.
`
BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a high-level schematic diagram of a system according to an exemplary embodiment of the invention, which employs secured key encryption to protect patient privacy when patient data is shared between different entities.

FIG. 2 illustrates an exemplary electronic patient medical record comprising a plurality of structured and unstructured data sources containing patient identifying information, which can be automatically de-identified using systems and methods according to exemplary embodiments of the invention.

FIG. 3 is a flow diagram of a method for de-identifying patient data records according to an exemplary embodiment of the invention.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

In general, exemplary embodiments of the invention as described herein include systems and methods for protecting patient privacy when patient health care information is shared between various entities. More specifically, systems and methods according to the invention implement a secured encryption protocol that enables de-identification of patient data in a manner that protects patient privacy, while allowing owners of the patient data and/or legally empowered entities to re-identify subject patients that are associated with de-identified patient data records, when needed or desired. For example, depending on the application, a secured encryption protocol for de-identifying and re-identifying patient data may be implemented using an asymmetric or symmetric key encryption method. Advantageously, as explained below, systems and methods according to the invention for de-identifying/re-identifying patient data can be implemented for various purposes such as research, public health or healthcare operations, while maintaining compliance with regulations based on HIPAA for protecting patient privacy.

It is to be understood that the exemplary systems and methods described herein in accordance with the present invention may be implemented in various forms of hardware, software, firmware, special purpose processors, or combinations thereof. In one exemplary embodiment of the invention, the exemplary systems and methods described herein are implemented in software as an application comprising program instructions that are tangibly embodied on one or more program storage devices (e.g., hard disk, magnetic floppy disk, RAM, CD ROM, DVD, ROM and flash memory), and executable by any device or machine comprising suitable architecture. It is to be further understood that because the constituent system modules and method steps depicted in the accompanying Figures can be implemented in software, the actual connections between the system components (or the flow of the process steps) may differ depending upon the manner in which the application is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the present invention.

FIG. 1 illustrates a high-level schematic diagram of a system (10) according to one exemplary embodiment of the invention wherein a secured encryption scheme is implemented for protecting patient privacy when patient data is shared between different entities. In general, the exemplary system (10) comprises a plurality of patient data processing systems (20-1-20-n), a central patient data processing system (30), a trusted broker system (40), and a central surveillance system (50). The patient data processing systems (20-1-20-n) are operated at different clinical sites by different healthcare organizations (including, but not limited to, doctors, health care providers, institutions, associations, organizations, hospitals, etc.). In the exemplary embodiment, each clinical site releases de-identified patient data records to a third-party entity that operates the central patient data processing system (30), wherein the collected de-identified data records can be processed for purposes of, e.g., research, health care monitoring, etc. In addition, the central surveillance system (50) may be operated and controlled by a governmental agency (such as the CDC), or any other entity that is authorized, by laws or regulations for example, to re-identify subject patients that are associated with de-identified patient data records maintained and processed by the central patient data processing system (30).

Furthermore, in the exemplary embodiment of FIG. 1, the system (10) is implemented using an asymmetric encryption scheme (e.g., RSA) for protecting patient privacy, wherein de-identification and re-identification of patient data is implemented using public key/private key pairs that are generated for different healthcare entities. As explained in further detail below, the public keys (which are generated based on one or more private keys) are used for de-identifying (encrypting) one or more items of patient identifying information (i.e., information that could be used for identifying the patient) to generate an encrypted version of the patient identifying information (referred to herein as "Encrypted IDs") and the private keys are used for decrypting de-identified patient data to re-identify subject patients.

In the exemplary embodiment of FIG. 1, each patient data processing system (20-1-20-n) respectively comprises persistently stored electronic patient data records (21-1-21-n) that contain identified patient data, an encryption system (22-1-22-n), and optional securely stored encryption keys (23-1-23-n). In addition, each patient data processing system (20-1-20-n) may optionally store data structures (24-1-24-n) such as index or map structures, which, as explained below, can be used for mapping "Encrypted IDs" to "Study IDs", a repository of persistently stored de-identified data records (25-1-25-n) (to optionally store de-identified data that is provided to the central data processing system (30)) and a repository of persistently stored re-identified data (26-1-26-n) to optionally store re-identified patient data that is generated, when needed, using a corresponding private (decryption) key and the de-identified data.

In one exemplary embodiment of the invention, each encryption system (22-1-22-n) implements the same secure public encryption protocol (e.g., RSA). Depending on the application, however, the encryption systems (22-1-22-n) may perform one or more functions. For example, according to one exemplary embodiment of the invention, the encryption systems (22-1-22-n) may include methods for encrypting patient identifying information that is stored in (or stored with) the patient data records (21-1-21-n) using respective public keys and/or a master public key (23-1-23-n). More specifically, each healthcare entity can use its own public key or both its own public key and the master key to encrypt one or more items of patient identifying information contained in the respective patient medical records (owned/managed by the healthcare entity), to thereby generate an Encrypted ID for each subject patient. An Encrypted ID of a given patient is a unique reproducible encrypted version of patient identifying information that is sufficient to uniquely identify the patient. The patient identifying information, which is encrypted to generate the Encrypted ID, may include one or more items of patient identifying information, including, but not limited to, patient name, social security number, and/or address.
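
As a rough illustration of a "unique reproducible encrypted version" of identifying information, the following Python sketch derives an Encrypted ID with unpadded ("textbook") RSA, which is deterministic and therefore reproducible. Everything in it is an assumption made for illustration and is not taken from the patent: the function names, the name-plus-SSN canonical form, the hexadecimal encoding, and the toy key built from two Mersenne primes. Such a key is not secure, and unpadded RSA over low-entropy identifiers is open to guessing attacks, so this only sketches the data flow.

```python
import math

# Toy RSA key built from two known Mersenne primes. ILLUSTRATION ONLY:
# primes of this special form are unsafe for real keys.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
PHI = (P - 1) * (Q - 1)
assert math.gcd(E, PHI) == 1
D = pow(E, -1, PHI)  # private exponent (requires Python 3.8+)

PUBLIC_KEY = (N, E)   # used to create Encrypted IDs
PRIVATE_KEY = (N, D)  # held only by parties allowed to re-identify


def canonical_identity(name: str, ssn: str) -> str:
    """Normalize identifying items so the same patient always yields the same text.

    The choice of name + SSN and this normalization are assumptions.
    """
    return f"{name.strip().upper()}|{ssn.replace('-', '')}"


def make_encrypted_id(identity: str, public_key: tuple) -> str:
    """Encrypt the identity with unpadded RSA and return a hex string."""
    n, e = public_key
    m = int.from_bytes(identity.encode("utf-8"), "big")
    if m >= n:
        raise ValueError("identity too long for this key size")
    return format(pow(m, e, n), "x")


def recover_identity(encrypted_id: str, private_key: tuple) -> str:
    """Decrypt an Encrypted ID back to the canonical identity text."""
    n, d = private_key
    m = pow(int(encrypted_id, 16), d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")
```

Calling `make_encrypted_id` twice on the same canonical identity with the same public key returns the same string, which is what lets records for one patient be linked without revealing who the patient is; `recover_identity` with the private key returns the original text.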
If a particular entity only uses its public key to encrypt patient identifying information, then only the entity's corresponding private key can be used to decrypt the de-identified data. If a particular entity only uses the master public key to encrypt the patient identifying information, then only a master private key (e.g., a master private key (53) of the surveillance system (50)) can be used to decrypt the de-identified data. On the other hand, if a particular entity uses both its public key and the master public key to encrypt the patient identifying information, then either the entity's corresponding private key or the master private key can be used to decrypt the de-identified data.
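
The text does not say how encrypting with both keys lets either private key decrypt. One possible reading, sketched below in Python, is to store two ciphertexts of the same identity, one per public key. That design, the helper names, and the toy Mersenne-prime keys are all assumptions for illustration; the keys are not secure and unpadded RSA is used only to keep the sketch short.

```python
import math


def toy_keypair(p: int, q: int, e: int = 65537):
    """Build a toy RSA (public, private) key pair from two primes. Not secure."""
    n, phi = p * q, (p - 1) * (q - 1)
    assert math.gcd(e, phi) == 1
    return (n, e), (n, pow(e, -1, phi))


def encrypt(text: str, public_key: tuple) -> str:
    """Unpadded RSA encryption of a short text, returned as hex."""
    n, e = public_key
    m = int.from_bytes(text.encode("utf-8"), "big")
    if m >= n:
        raise ValueError("text too long for this key size")
    return format(pow(m, e, n), "x")


def decrypt(ciphertext_hex: str, private_key: tuple) -> str:
    """Unpadded RSA decryption back to text."""
    n, d = private_key
    m = pow(int(ciphertext_hex, 16), d, n)
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


def make_dual_encrypted_id(identity: str, entity_public: tuple, master_public: tuple) -> dict:
    """Encrypt the same identity once under each public key (assumed design)."""
    return {
        "entity": encrypt(identity, entity_public),
        "master": encrypt(identity, master_public),
    }


# Hypothetical keys: one pair for a healthcare entity, one master pair.
entity_pub, entity_priv = toy_keypair(2**521 - 1, 2**607 - 1)
master_pub, master_priv = toy_keypair(2**1279 - 1, 2**2203 - 1)

dual_id = make_dual_encrypted_id("JANE DOE|123456789", entity_pub, master_pub)
```

With this layout the releasing entity re-identifies a patient from `dual_id["entity"]` using its own private key, while the holder of the master private key uses `dual_id["master"]`; neither needs the other's key.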
The encryption systems (22-1-22-n) can operate to generate Encrypted IDs and then store each Encrypted ID with or within a corresponding de-identified patient data record (i.e., a patient data record from which patient identifying information has been removed). In this regard, each data processing system (20-1-20-n) may comprise a repository of persistently stored de-identified patient data records (25-1-25-n) which include de-identified patient data records having corresponding Encrypted IDs stored in or with the de-identified records. The encrypted (de-identified) patient data (Encrypted IDs) can be included in de-identified patient records, which can be released to third-party entities without compromising patient privacy. In general these records would be removed from the data processing system (20-1-20-n) after transfer to the central patient data processing system (30).
It is to be understood that the process of removing patient identifying information from patient records can be performed manually, or using automated methods according to the invention. For example, a de-identified data record for a given patient may be generated by an authorized user who manually removes patient identifying information from a data record and then uses the encryption system to generate an Encrypted ID for the subject patient, which is associated with the de-identified data record.
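
The removal-and-association step can be pictured with a minimal Python sketch for a structured record held as a dictionary. The field names, the `encrypted_id` key, and the function name are hypothetical, and the Encrypted ID is taken as an opaque string supplied by the encryption system rather than computed here.

```python
# Assumed names of the fields that identify a patient in this toy record layout.
IDENTIFYING_FIELDS = ("name", "ssn", "address")


def deidentify_record(record: dict, encrypted_id: str,
                      identifying_fields=IDENTIFYING_FIELDS) -> dict:
    """Return a copy of the record without identifying fields, tagged with the Encrypted ID.

    The input record is left unchanged so the identified original stays with its owner.
    """
    deidentified = {key: value for key, value in record.items()
                    if key not in identifying_fields}
    deidentified["encrypted_id"] = encrypted_id
    return deidentified


identified = {
    "name": "Jane Doe",
    "ssn": "123-45-6789",
    "address": "1 Main St",
    "lab_result": "HbA1c 6.1",
}
released = deidentify_record(identified, "ab12cd")  # "ab12cd" is a placeholder ID
```

Only the clinical content and the opaque identifier remain in `released`; free-text and image data would need different handling than this field-level filter.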
Further, in another exemplary embodiment of the invention, the encryption systems (22-1-22-n) may include methods for automatically de-identifying a patient data record by automatically removing patient identifying information from the patient data records. More specifically, the encryption systems (22-1-22-n) may include methods for automatically de-identifying structured and/or unstructured patient data records that are included in the persistently stored electronic patient data records (21-1-21-n). By way of example, FIG. 2 illustrates an exemplary embodiment of the electronic patient data records (21-1-21-n) in the form of computerized patient records (CPR) (or electronic patient medical records) including a plurality of structured and unstructured data sources for maintaining patient information that can be collected over the course of patient treatments. The patient information may include, e.g., computed tomography (CT) images, X-ray images, laboratory test results, doctor progress notes, details about medical procedures, prescription drug information, radiological reports, other specialist reports, demographic information, and billing (financial) information. In general, the structured data sources include, for example, financial, laboratory, and pharmacy databases, wherein patient information is typically maintained in database tables. The unstructured data sources include, for example, free-text based documents (e.g., physician reports, etc.) and image and waveform data. Various methods for automatically de-identifying structured and unstructured data will be discussed in detail below with reference to FIG. 3, for example.
Referring again to FIG. 1, the encryption systems (22-1-22-n) may further include methods for mapping "Encrypted IDs" to "Study IDs" and replacing de-identified patient information in de-identified patient data records with human readable strings (as opposed to encrypted strings). As noted above, an Encrypted ID is generated by encrypting one or more items of patient identification information using a public key, and the Encrypted ID is included with or within a de-identified data record. However, the encrypted patient information in a de-identified patient data record would include a character string (e.g., a 128+ character string) that could make it difficult or burdensome for a person to review the de-identified data record. Accordingly, a de-identification process according to an embodiment of the invention includes a method for replacing de-identified patient information with user-friendly character strings that contain no patient information.
More specifically, in one exemplary embodiment of the invention, a unique Encrypted ID (e.g., a 128+ non-readable character string) of a given patient can be arbitrarily mapped to a Study ID that does not provide patient information, which is mapped in turn to one or more human readable, short replacement text strings that can be used for replacing de-identified patient data that is associated with the Encrypted ID. For example, given an arbitrary unique Encrypted ID that is mapped to a Study ID such as "42", text strings such as "patient 42 ID", "patient 42 name", "patient 42 address", etc. could be used to replace the various pieces of de-identified patient data in a de-identified data record. This association and human friendly replacement are optional and not required. In the system of FIG. 1, the persistently stored ID mappings (24-1-24-n) comprise data structures that map Encrypted IDs to Study IDs and corresponding replacement strings. Since these ID mappings (24-1-24-n) contain no patient information, they can be made publicly available to the third-party entity and stored in the database (35) without compromising patient privacy.
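
A minimal Python sketch of such an ID mapping follows. The class name, the use of a sequential counter to assign Study IDs, and the fixed set of replacement fields are assumptions for illustration; the text only requires that the Study ID be arbitrary and carry no patient information.

```python
from itertools import count


class StudyIdMap:
    """Maps opaque Encrypted IDs to short Study IDs and readable replacement strings."""

    def __init__(self):
        self._by_encrypted = {}  # Encrypted ID -> Study ID
        self._by_study = {}      # Study ID -> Encrypted ID (for re-identification)
        self._next_id = count(1)

    def study_id_for(self, encrypted_id: str) -> int:
        """Return the Study ID for an Encrypted ID, assigning a new one on first use."""
        if encrypted_id not in self._by_encrypted:
            study_id = next(self._next_id)
            self._by_encrypted[encrypted_id] = study_id
            self._by_study[study_id] = encrypted_id
        return self._by_encrypted[encrypted_id]

    def encrypted_id_for(self, study_id: int) -> str:
        """Reverse lookup, used to find the Encrypted ID that must be decrypted."""
        return self._by_study[study_id]

    def replacement_strings(self, encrypted_id: str) -> dict:
        """Readable placeholders such as 'patient 42 name' for a record's removed fields."""
        study_id = self.study_id_for(encrypted_id)
        return {field: f"patient {study_id} {field}"
                for field in ("ID", "name", "address")}


id_map = StudyIdMap()
strings = id_map.replacement_strings("9f3a")  # "9f3a" stands in for a long Encrypted ID
```

Because the map holds only Encrypted IDs, Study IDs, and placeholder text, sharing it reveals nothing that the de-identified records do not already contain, and the reverse lookup recovers the Encrypted ID needed for a later re-identification.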
Furthermore, according to another exemplary embodiment of the invention, the encryption systems (22-1-22-n) may further include methods for re-identifying subject patients by using the appropriate private key to decrypt de-identified patient data that is contained in de-identified patient data records (25-1-25-n). For example, as depicted in FIG. 1, each of the healthcare entities that release de-identified patient data records can securely store a respective private key (23-1-23-n) for purposes of re-identifying patient data that was encrypted using their respective public key. During a re-identification process, if de-identified data records contain one or more Study IDs, the encryption systems (22-1-22-n) can utilize the respective ID mapping(s) (24-1-24-n) to obtain the corresponding Encrypted IDs to be decrypted.
The central patient data processing system (30) comprises a repository of persistently stored de-identified patient data records (35) containing de-identified (encrypted) patient identifying information, which are collected from the different clinical sites. The central patient data processing system (30) comprises one or more data processing engines (37) that are used for processing the de-identified patient data (35) for one or more given applications. The central patient data processing system (30) includes a repository of ID mappings (34), which includes a collection of the ID mappings (24-1-24-n) associated with the individual data processing systems (20-1-20-n). The information contained in the repository of ID mappings (34) is used by the data processing engine(s) (37) when processing the de-identified data records (35). Although only one central data processing system (30) is shown in FIG. 1, it is to be understood that in other embodiments of the invention, depending on the application, two or more similar central data processing systems (30) may be implemented.
In the exemplary embodiment of FIG. 1, it is assumed that the entity (or entities) that operate the central patient data processing system (30) does not have the authority to re-identify subject patients that are associated with the de-identified patient data records (35). In other words, the central patient data processing system (30) does not include an encryption system that is capable of decrypting Encrypted IDs that are stored in or with the de-identified patient data records (35). The central data processing system (30) may optionally maintain a repository of public keys that are associated with one or more of the entities that provide the de-identified patient data records (35) and associated ID mappings (34). However, since the central processing system (30) is not authorized to re-identify subject patients, the system (30) does not store (or cannot access) private keys that are associated with the one or more entities that provide the de-identified patient data records (35).
The central surveillance system (50) comprises a repository of de-identified patient data records (55) and corresponding ID mapping(s) (54), which can be obtained from the central data processing system (30) or the different entities that operate the data processing systems (20-1-20-n). In the exemplary embodiment of FIG. 1, it is assumed that the central surveillance system (50) is operated by one or more entities that are legally authorized or empowered to re-identify subject patients associated with the de-identified data records (55). In this regard, the central surveillance system (50) comprises an encryption system (52) and securely stored master encryption keys (53) (one key that can re-identify any institution's data), which can be used for decrypting de-identified patient data (Encrypted IDs) associated with the de-identified patient data records (55) using the ID mappings
`



