US009614814B2

(12) United States Patent
Fontecchio

(10) Patent No.: US 9,614,814 B2
(45) Date of Patent: Apr. 4, 2017
`
`(54) SYSTEM AND METHOD FOR CASCADING
`TOKEN GENERATION AND DATA
`DE-IDENTIFICATION
`
(71) Applicant: Management Science Associates, Inc., Pittsburgh, PA (US)

(72) Inventor: Tony Fontecchio, Irwin, PA (US)

(73) Assignee: Management Science Associates, Inc., Pittsburgh, PA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 15/046,202

(22) Filed: Feb. 17, 2016

(65) Prior Publication Data
US 2016/0182231 A1    Jun. 23, 2016

Related U.S. Application Data
(63) Continuation of application No. 14/291,805, filed on May 30, 2014, now Pat. No. 9,292,707.
(Continued)

(51) Int. Cl.
H04L 9/32     (2006.01)
H04L 29/06    (2006.01)
H04L 9/06     (2006.01)
G06F 21/62    (2013.01)
H04L 9/14     (2006.01)
H04L 9/30     (2006.01)

(52) U.S. Cl.
CPC ...... H04L 63/0421 (2013.01); G06F 21/6254 (2013.01); H04L 9/0643 (2013.01);
(Continued)

(58) Field of Classification Search
CPC ............. H04L 63/0807; H04L 63/0876; H04L 9/0643; H04L 63/0474; H04L 63/0421;
(Continued)

(56) References Cited

U.S. PATENT DOCUMENTS

6,397,224 B1    5/2002  Zubeldia et al.
6,732,113 B1    5/2004  Ober et al.
(Continued)
`
`OTHER PUBLICATIONS
`
`Bouzelat et al. NPL 1996-Extraction and Anonymity Protocol of
`Medical file.*
`
`(Continued)
`
`Primary Examiner - Kaveh Abrishamkar
`Assistant Examiner - Tri Tran
(74) Attorney, Agent, or Firm - The Webb Law Firm

(57) ABSTRACT
A computer-implemented method for de-identifying data by creating tokens through a cascading algorithm includes the steps of processing at least one record comprising a plurality of data elements to identify a subset of data elements comprising data identifying at least one individual; generating, with at least one processor, a first hash by hashing at least one first data element with at least one second data element of the subset of data elements; generating, with at least one processor, a second hash by hashing the first hash with at least one third data element of the subset of data elements; creating at least one token based at least partially on the second hash or a subsequent hash derived from the second hash, wherein the token identifies the at least one individual; and associating at least a portion of a remainder of the data elements with the at least one token.
`
`20 Claims, 6 Drawing Sheets
`
[Representative drawing: reduced copy of FIG. 1, system 1000]
`
`DATAVANT, INC. EXHIBIT NO. 1001
`Page 1 of 14
`
`

`

`
Related U.S. Application Data
(60) Provisional application No. 61/830,345, filed on Jun. 3, 2013.

(52) U.S. Cl.
CPC .................. H04L 9/14 (2013.01); H04L 9/30 (2013.01); H04L 9/3234 (2013.01); H04L 9/3239 (2013.01); H04L 63/0442 (2013.01); H04L 63/0807 (2013.01); H04L 63/0876 (2013.01); H04L 2209/42 (2013.01); H04L 2463/062 (2013.01)

(58) Field of Classification Search
CPC . G06F 19/322; G06F 21/6245; G06F 21/6254
See application file for complete search history.
`
(56) References Cited

U.S. PATENT DOCUMENTS

7,200,578 B2         4/2007  Paltenghe et al.
7,280,663 B1 *      10/2007  Golomb ................ H04L 9/3066
                                                     380/255
7,376,677 B2         5/2008  Ober et al.
7,865,376 B2         1/2011  Ober et al.
8,473,452 B1         6/2013  Ober et al.
8,930,404 B2         1/2015  Ober et al.
2008/0147554 A1 *    6/2008  Stevens ................. G06F 19/322
                                                     705/51
`
`OTHER PUBLICATIONS
`
Bouzelat et al., Extraction and Anonymity Protocol of Medical File, Department of Medical Informatics (Pr. L. Dusserre), Teaching Hospital of Dijon France, AMIA, Inc., 1996, pp. 323-327.
Fraser et al., Tools for De-Identification of Personal Health Information, Prepared for the Pan Canadian Health Information Privacy (HIP) Group, Sep. 2009, 40 pages.
Kunitz et al., Record Linkage Methods Applied to Health Related Administrative Data Sets Containing Racial and Ethnic Descriptors, Record Linkage Techniques, 1997, pp. 295-304.
Scheuren, Linking Health Records: Human Rights Concerns, Record Linkage Techniques, 1997, pp. 404-426.
`
`* cited by examiner
`
[FIG. 1 (Sheet 1 of 6): schematic diagram of system 1000. Labeled elements: client 106; other data suppliers; public key; data supplier 103 (location 115); data processing entity 108 (location 113) with matching engine 109, token processing engine 110, and de-ID data 111.]
`
[FIG. 2A (Sheet 2 of 6): record 200 with data fields 201, 203, 204; initial key 221; first hash 211, second hash 213, third hash 215; token 219.]

[FIG. 2B (Sheet 2 of 6): record 200 with data fields 201, 203, 206; initial key 221; hash key 223; first hash 211, second hash 213, ..., Nth hash 217; token 219.]
`
[FIG. 2C (Sheet 3 of 6): a key and successive data fields pass through repeated applications of hash function 220 to produce token 219.]
`
[FIG. 3A (Sheet 4 of 6): flow diagram; reference numerals 301, 303, 305, 308, and 309 appear on the sheet. Steps in order: identify seed from configuration file; hash a first data element of record with seed; hash a next data element with previous hash result; create token based on hash sequence; generate transient key unique to session; encrypt token with transient key; encrypt the encrypted token and transient key with public key associated with data processing entity.]
`
[FIG. 3B (Sheet 5 of 6): flow diagram. 315: receive encrypted output file from data supplier; 317: decrypt encrypted output file with private key to obtain a transient key and an encrypted token; 319: decrypt the encrypted token with the transient key; 325: hash the token with a seed unique to the client or data supplier; 327: match the token to a record for an individual from a record database.]
`
[FIG. 4 (Sheet 6 of 6): further flow diagram. The drawing text is largely illegible in this copy; recoverable labels include "Command Line", "6.1 Read and Validate", "Configuration File", "6.2 Read and Validate", "Log File", and a Yes/No decision branch.]
`
`SYSTEM AND METHOD FOR CASCADING
`TOKEN GENERATION AND DATA
`DE-IDENTIFICATION
`
`CROSS REFERENCE TO RELATED
`APPLICATIONS
`
This application is a continuation of U.S. patent application Ser. No. 14/291,805, filed May 30, 2014, which claimed the benefit of U.S. Provisional Application No. 61/830,345, filed on Jun. 3, 2013, the entire disclosures of each of which are hereby incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`Field of the Invention
The present invention relates generally to data de-identification and, in particular, to a system and method for de-identifying data using cascading token generation.
`Description of Related Art
For decades, data including personally-identifying information has been de-identified through the creation of tokens that uniquely identify an individual. This technology has been used in connection with consumer package goods data, television data, subscriber data, healthcare data, and the like. Traditionally, methods for creating tokens for a specific record associated with an individual involved concatenating selected data elements into a string, and then encrypting that string to form a token. However, there are scenarios in which concatenated substrings will yield less than optimal results. Advances in computing power now allow for token generation to be complex, even across large volumes of data, providing for enhanced data security. Moreover, once a token is created, additional security measures are desirable to prevent reverse-engineering through statistical analysis attacks.
`By law, Protected Healthcare Information (PHI) cannot be
`freely disseminated. However, if properly de-identified to
`the point where the risk is minimal that an individual could
`be re-identified, the PHI can be disclosed by a covered entity
`or an entity in legal possession of PHI.
`
`SUMMARY OF THE INVENTION
`
`Generally, it is an object of the present invention to
`provide a system and method for de-identifying data that
`overcomes some or all of the above-described deficiencies
`of the prior art.
According to a preferred embodiment, provided is a computer-implemented method for de-identifying data by creating tokens through a cascading algorithm, comprising: processing at least one record comprising a plurality of data elements to identify a subset of data elements comprising data identifying at least one individual; generating, with at least one processor, a first hash by hashing at least one first data element with at least one second data element of the subset of data elements; generating, with at least one processor, a second hash by hashing the first hash with at least one third data element of the subset of data elements; creating at least one token based at least partially on the second hash or a subsequent hash derived from the second hash, wherein the token identifies the at least one individual; and associating at least a portion of a remainder of the data elements of the plurality of data elements with the at least one token.
According to another preferred embodiment, provided is a system for de-identifying data, comprising: a data supplier computer comprising at least one processor and a de-identification engine, the de-identification engine configured to: (i) process a data record comprising a plurality of data elements, wherein a subset of data elements of the plurality of data elements comprises identifying information; (ii) generate a token based at least partially on a series of hashes of individual data elements of the subset of data elements, wherein a plurality of hashes in the series of hashes are based at least partially on a previous hash in the series of hashes; (iii) encrypt at least the token to generate an encrypted token; (b) a data processing entity computer remote from the data supplier computer, the data processing computer comprising at least one processor configured to: (i) receive the encrypted token and unencrypted data elements from the data supplier computer; (ii) decrypt the encrypted token, resulting in the token; (iii) link the token and unencrypted data elements with at least one other record based at least partially on the token.
According to a further preferred embodiment, provided is a de-identification system, comprising: a de-identification subsystem comprising at least one computer-readable medium containing program instructions which, when executed by at least one remote processor at a data supplier, causes the at least one remote processor to: create a token from at least one record, the token created by performing at least one hash operation on at least one data element of at least one record, wherein the at least one data element comprises personally-identifying information; encrypt the token with a randomly-generated encryption key, forming an encrypted token; and encrypt the encrypted token and the randomly-generated encryption key with a public key, forming encrypted data; and a record processing subsystem comprising a server and at least one computer-readable medium containing program instructions which, when executed by at least one processor, causes the at least one processor to: receive the encrypted data; decrypt the encrypted data with a private key corresponding to the public key, resulting in the randomly-generated encryption key and the encrypted token; and decrypt the encrypted token with the randomly-generated encryption key.
According to another preferred embodiment, provided is a de-identification engine for de-identifying at least one record comprising a plurality of data elements, wherein a subset of the plurality of data elements comprise personally-identifying data, the de-identification engine comprising at least one computer-readable medium containing program instructions that, when executed by at least one processor of at least one computer, cause the at least one computer to: (a) generate an initial hash by hashing at least one key and a first data element of the subset of data elements; (b) generate a next hash by hashing a next data element of the subset of data elements with a previous hash value generated by hashing at least a previous data element of the subset of data elements; and (c) repeat step (b) for all data elements of the subset of data elements, resulting in a final hash value.
These and other features and characteristics of the present invention, as well as the methods of operation and functions of the related elements of structures and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description and the appended claims with reference to the accompanying drawings, all of which form a part of this specification, wherein like reference numerals designate corresponding parts in the various figures. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended as a definition of the limits of the invention. As used in the specification and the claims, the singular form of "a", "an", and "the" include plural referents unless the context clearly dictates otherwise.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 is a schematic diagram for a system for de-identifying data according to the principles of the present invention;
FIGS. 2A-2C are schematic diagrams for a cascading hash process for de-identifying data according to the principles of the present invention;
FIGS. 3A and 3B are flow diagrams for a system and method for de-identifying data according to the principles of the present invention; and
FIG. 4 is a further flow diagram for a system and method for de-identifying data according to the principles of the present invention.
`
`DESCRIPTION OF THE PREFERRED
`EMBODIMENTS
`
For purposes of the description hereinafter, it is to be understood that the invention may assume various alternative variations and step sequences, except where expressly specified to the contrary. It is also to be understood that the specific devices and processes illustrated in the attached drawings, and described in the following specification, are simply exemplary embodiments of the invention. Hence, specific dimensions and other physical characteristics related to the embodiments disclosed herein are not to be considered as limiting.
As used herein, the terms "communication" and "communicate" refer to the receipt, transmission, or transfer of one or more signals, messages, commands, or other type of data. For one unit or device to be in communication with another unit or device means that the one unit or device is able to receive data from and/or transmit data to the other unit or device. A communication may use a direct or indirect connection, and may be wired and/or wireless in nature. Additionally, two units or devices may be in communication with each other even though the data transmitted may be modified, processed, routed, etc., between the first and second unit or device. It will be appreciated that numerous other arrangements are possible.
In a preferred and non-limiting embodiment of the present invention, provided is a system for de-identifying data that includes a de-identification engine configured to hash personally identifying data within a data record, while at the same time passing through non-identifying data (e.g., a refill number and/or the like). In this way, the system has the ability to perform data cleansing operations (e.g., justification, padding, range checking, character set validation, date cleaning, zoned decimal conversion, and/or the like), data derivation (e.g., ages, combinations of fields, and/or the like), and/or data translation (e.g., state abbreviations to state names, or the like). Various other formatting and normalization functions are also possible.
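The cleansing and translation operations listed above might look like the following minimal Python sketch. The function names, the accepted date layouts, and the two-entry state table are illustrative assumptions and are not taken from the specification.

```python
from datetime import datetime

# Hypothetical lookup table; a real deployment would cover every state.
STATE_NAMES = {"PA": "Pennsylvania", "NY": "New York"}

def clean_name(value: str) -> str:
    """Trim, collapse internal whitespace, and upper-case a name field."""
    return " ".join(value.split()).upper()

def clean_date(value: str) -> str:
    """Normalize a few common date layouts to ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return ""  # unparseable dates are blanked rather than guessed

def pad_zip(value: str) -> str:
    """Left-pad a numeric ZIP code to five digits; blank it if not numeric."""
    digits = value.strip()
    return digits.zfill(5) if digits.isdigit() and len(digits) <= 5 else ""

def translate_state(value: str) -> str:
    """Translate a state abbreviation to its full name when it is known."""
    code = value.strip().upper()
    return STATE_NAMES.get(code, code)
```

Normalizing before hashing matters because a hash is sensitive to every byte: "john smith" and "John  Smith" would otherwise yield unrelated tokens for the same individual.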
To create a unique identifier for an individual (i.e., a patient, a consumer, or the like), the de-identification engine of the present invention may support configurable standardization and hashing of fields. By using multiple fields to create a unique identifier, the system of the present invention ensures that statistical analysis or other reverse-engineering techniques cannot be performed on the hashed values to determine a person's identity. For example, applying a hashing algorithm (e.g., SHA-3 or other hashing algorithms) to the first name "John" will produce a secure token that cannot be reversed back to the name "John," but potentially allows for a statistical analysis operation to be performed to determine that the most frequent first name hash token represents the name "John." A similar analysis could be performed on other non-unique fields as well. For that reason, multiple fields are used to create a distinct (or sufficiently distinct) de-identification value. For example, using a first name, last name, date of birth, and zip code may be considered sufficiently distinct to prevent statistical cracking.
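The frequency-analysis concern can be illustrated with a short Python sketch. The records are made up, and joining fields with a "|" separator before a single SHA3-256 call is an assumption chosen only to keep the demonstration small; it is not the cascading process described later.

```python
import hashlib
from collections import Counter

def h(*parts: str) -> str:
    """SHA3-256 over the parts, joined with a separator (an assumption)."""
    return hashlib.sha3_256("|".join(parts).encode("utf-8")).hexdigest()

# Made-up records: (first name, last name, date of birth, ZIP code).
records = [
    ("John", "Smith", "1980-04-17", "15213"),
    ("John", "Jones", "1975-11-02", "15642"),
    ("John", "Brown", "1991-06-30", "15001"),
    ("Mary", "Smith", "1980-04-17", "15213"),
]

# Hashing the first name alone: equal names collide, so frequencies leak.
single = Counter(h(first) for first, _, _, _ in records)
most_common_hash, count = single.most_common(1)[0]

# Hashing all four fields: every made-up record gets a distinct value.
multi = Counter(h(*rec) for rec in records)
```

An observer who knows that "John" is the most common first name in the population can label `most_common_hash` without reversing the hash, whereas the four-field values show no repeated value to rank.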
Referring now to FIG. 1, a system 1000 for de-identifying data is shown according to a preferred and non-limiting embodiment. A data supplier 103 is in communication with a raw data storage unit 104, which may include one or more data storage devices. The raw data storage unit 104 may comprise one or more data structures, such as tables, databases, and/or the like, including records personally identifying individuals. The data supplier 103 includes one or more computers, such as servers, user terminals, processors, and/or the like, and a de-identification engine 107 that executes on one or more of the data supplier 103 computers. The de-identification engine 107 may include compiled program instructions capable of being executed on a data supplier 103 computer and configured to process data records from the raw data storage unit 104. The data supplier 103 is also given access to a configuration file 105, a signature file, and a public key for use in the de-identification process. The data supplier 103 may be one of many data suppliers associated with a particular client 106, and multiple clients may each be associated with multiple data suppliers. It will be appreciated that other arrangements are possible.
With continued reference to FIG. 1, a data processing entity 108 is shown in communication with the data supplier 103 through a network environment 112, such as the Internet or any direct or indirect network connection. The data processing entity 108 is in communication with a de-identification data storage unit 111 and includes one or more computers capable of executing a matching engine 109 and a token processing engine 110. The matching engine 109 and/or token processing engine 110 may include compiled program instructions capable of being executed on a data processing entity 108 computer. The token processing engine 110 may be configured to receive output from the data supplier 103 and, as explained further below, perform additional operations on the token or encrypted output such as, but not limited to, decrypting encrypted output data and hashing the token generated by the de-identification engine 107 with a seed/key unique to the client 106 and/or data supplier 103 to produce a new token.
Still referring to FIG. 1, the matching engine 109 may be configured to match tokens among de-identified records, received from the data supplier 103, with other records in the de-identification data storage unit 111. For example, the matching engine 109 may use the tokens generated or output by the de-identification engine 107, or the new tokens generated or output by the token processing engine, to match the records received with a unique individual, and to link the record to that individual. The de-identification data storage unit 111 may include one or more data storage devices that comprise one or more data structures such as tables, databases, and/or the like. The system 1000 is distributed such that the data supplier is in a location 115 remote from a location 113 of the data processing entity 108. In this way, the raw data can be de-identified.
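A rough Python sketch of the two data-processing-side roles described above follows: re-hashing a supplier token with a client-specific seed, then linking records that share the resulting token. The names `rehash` and `MatchingStore`, the in-memory dictionary, and the seed value are hypothetical stand-ins; the specification does not prescribe a storage layout or how the seed is combined with the token.

```python
import hashlib

def rehash(token: str, client_seed: bytes) -> str:
    """Token processing step: hash a supplier token with a client-specific seed."""
    return hashlib.sha3_256(client_seed + bytes.fromhex(token)).hexdigest()

class MatchingStore:
    """In-memory stand-in for the de-identification data storage unit."""

    def __init__(self) -> None:
        self._by_token: dict[str, list[dict]] = {}

    def link(self, token: str, record: dict) -> int:
        """File a de-identified record under its token.

        Returns how many records are now linked to that individual.
        """
        bucket = self._by_token.setdefault(token, [])
        bucket.append(record)
        return len(bucket)

# Made-up inputs for illustration only.
seed = b"hypothetical-client-seed"
supplier_token = hashlib.sha3_256(b"demo").hexdigest()

store = MatchingStore()
first = store.link(rehash(supplier_token, seed), {"refill_number": 1})
second = store.link(rehash(supplier_token, seed), {"refill_number": 2})
```

Because the same supplier token and seed always produce the same new token, both refill records land in one bucket without the store ever holding identifying data.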
In a preferred and non-limiting embodiment, a cascading hash process is used to generate a de-identified token. A cascading hash process may increase token security against attacks from crackers and hackers. Instead of concatenating multiple fields, adding a secret seed, and then hashing to form a token, the cascading hash process forms a token through a series of hashes involving each individual field. This polyphasic operation works by hashing data fields or elements of a record individually in a chain, such that each subsequent hash depends upon a previous hash result.
Referring now to FIGS. 2A-2C, a cascading hash process is depicted according to a preferred and non-limiting embodiment. A record 200 containing a number of data fields or elements 201, 203, 204, 205 that include identifying data is provided. Once these data fields or elements are identified, generally with business rules customized to a particular data supplier, the token creation process is started.
`Referring specifically to FIG. 2A, an initial key 221 is
`hashed with a first data field 201 to produce a first hash 211.
`The first hash 211 is then hashed with a second data field 203
`to produce a second hash 213. The second hash 213 is then
`hashed with a third data field 204 to produce a third hash
`215. This process may continue for as many data fields as
`required, resulting in a hashed token 219 derived directly
`from the last hashed data field and, as a result of the cascade,
`derived indirectly from the first hash 211, second hash 213,
`and any intervening hashes. In the example shown in FIG.
`2A, the fourth data field 205 is hashed with the third hash
`215 to produce the token 219.
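The chain of FIG. 2A can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the text says only that each field is "hashed with" the previous result, so combining the two inputs by byte concatenation, the function names, and the sample key and field values are all hypothetical.

```python
import hashlib

def hash_step(previous: bytes, field: str) -> bytes:
    """One cascade step: SHA3-256 over the previous value and the next field.

    Concatenating the two inputs is an assumption; the specification does
    not say how they are combined.
    """
    return hashlib.sha3_256(previous + field.encode("utf-8")).digest()

def cascade_token(initial_key: bytes, fields: list[str]) -> str:
    """Chain the fields so that each hash depends on the one before it."""
    current = initial_key
    for field in fields:
        current = hash_step(current, field)
    return current.hex()

# Four made-up identifying fields, mirroring data fields 201, 203, 204, 205.
token = cascade_token(
    b"example-initial-key",
    ["John", "Smith", "1980-04-17", "15213"],
)
```

Since every step consumes the previous digest, reordering the fields or changing the initial key yields an unrelated token.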
With continued reference to FIGS. 2A-2C, it will be appreciated that the hash function 220 (shown in FIG. 2C) may include other inputs, keys, and/or the like, in addition to a previous hash result. For example, in the non-limiting example shown in FIG. 2B, an initial key 221 is used to hash the first data field 201, and subsequent data fields 203, 206 are hashed with a previous hashed value as well as a hash key 223. In this example, the second data field 203 is hashed with the first hash 211 and the hash key 223 as inputs to a hashing function that results in the second hash 213. Depending on the number of data fields used, generally as defined by the business rules for a particular data source, the process may be repeated. As shown in FIG. 2B, the Nth hash 217 is derived from the sequence of hashes preceding it and is used, along with hash key 223, to hash the Nth+1 data field 206 to create the token 219.
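One way to read the FIG. 2B variant, where each later step takes a hash key in addition to the previous hash, is as a keyed hash. The sketch below uses HMAC with SHA3-256 from the Python standard library; using HMAC at all is an assumption introduced here for illustration, as are the function names and key values.

```python
import hashlib
import hmac

def keyed_step(key: bytes, previous: bytes, field: str) -> bytes:
    """HMAC-SHA3-256 keyed by `key` over the previous hash and the field."""
    message = previous + field.encode("utf-8")
    return hmac.new(key, message, hashlib.sha3_256).digest()

def keyed_cascade(initial_key: bytes, hash_key: bytes, fields: list[str]) -> str:
    """Initial key for the first field, hash key for every later field."""
    first, *rest = fields
    current = keyed_step(initial_key, b"", first)  # first hash
    for field in rest:  # second through Nth hash, then the token
        current = keyed_step(hash_key, current, field)
    return current.hex()

token = keyed_cascade(
    b"example-initial-key",
    b"example-hash-key",
    ["John", "Smith", "1980-04-17"],
)
```

Passing the same value for `initial_key` and `hash_key` corresponds to the embodiment in which the two keys are identical.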
Due to the nature of the cascading process, the final token 219 produced is unique for the data fields 201, 203, 206 but, unlike traditional concatenation-based methods, is not merely a hashed version of all of the data fields combined. Rather, with the cascading token generation process, a nested or cascaded token is produced that can only be derived from the series of hashes and data fields in a record 200. In the non-limiting embodiment shown in FIG. 2B, for example, an initial key 221 may differ from a hash key 223 used in subsequent iterations of the sequence. However, it will be appreciated that the hash key 223 and the initial key 221 may be the same and, in some embodiments, further hash keys 223 may not be used after the initial key 221. Those skilled in the art will appreciate that various other arrangements are possible.
Referring to FIG. 2C, a cascading hash process is shown according to a further preferred and non-limiting embodiment. The hash function 220, not separately shown in FIGS. 2A-2B, is depicted in FIG. 2C as receiving inputs and outputting results. The hash function 220 takes, as inputs, a key 223 and a first data field 201. The output of the hash function 220 in this example is input back into itself (i.e., recursively) along with a second data field 203. Similarly, the next output of the hash function 220 is input back into the hash function 220 again, along with a third data field 204. This is repeated as many times as necessary, depending on how many data fields 201, 203, 204, 205 will be used in creating the token 219. The final hash results in the token 219. It will be appreciated that the key 223, or a different key, may also be used as inputs to subsequent iterations of the hash function 220.
Referring to FIGS. 1 and 2C, in a preferred and non-limiting embodiment, a SHA-3 algorithm is used as the hash function 220 to create tokens 219. However, through the use of the de-identification engine 107 and configuration file 105, new and/or different algorithms and methodologies may be easily implemented. To increase security and data quality, the SHA-3 hashing algorithm may be configured to return spaces (fixed output) or null (delimited output) instead of a hash value if any of the component fields are not populated or contain all spaces.
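The blank-field rule might be sketched in Python as below, with the feedback loop written as a fold. The 64-character width (the hex length of a SHA3-256 digest), the function names, and the concatenation inside each step are assumptions made for this example.

```python
import hashlib
from functools import reduce
from typing import Optional

def token_or_null(key: bytes, fields: list[str]) -> Optional[str]:
    """Return a cascaded SHA3-256 token, or None if any field is blank."""
    if any(not field.strip() for field in fields):
        return None  # delimited output: null instead of a hash value

    def step(state: bytes, field: str) -> bytes:
        # Each output is fed back in as the first input of the next call.
        return hashlib.sha3_256(state + field.encode("utf-8")).digest()

    return reduce(step, fields, key).hex()

def fixed_width(token: Optional[str], width: int = 64) -> str:
    """Fixed-width output: spaces stand in for a missing token."""
    return token if token is not None else " " * width
```

Refusing to emit a token for incomplete input avoids linking unrelated individuals whose records merely share the same missing fields.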
In a preferred and non-limiting embodiment, and with reference to FIG. 1, it is envisioned that many clients 106 may be licensed to use the de-identification engine 107, and that each client may have a number of data suppliers 103. Therefore, it is desirable to provide unique tokens for each of the clients 106 or, in other embodiments, each of the data suppliers 103. This uniqueness may be provided, at least in part, through the use of the configuration file 105. In particular, the configuration file 105 may include a client tag (e.g., a client code or client key) to use in the token creation process. The client tag may be combined, incorporated, XORed, or used as an input to a hashing function for each data field. Alternatively, the client tag may be used as the initial input (e.g., initial key) for the first hash operation, and subsequent hash operations may use the previous hash result.
Through the use of client-specific tags, data records processed for one client 106 will not produce the same tokens as identical data records processed for a different client. In a preferred and non-limiting embodiment, the client name is stored in the configuration file 105 and, based on the client name, the client tag is generated or created. In this way, the actual value being used as the client tag will not be discernable to the data supplier 103. However, it will be appreciated that the client name itself may be used as a key and that, in other embodiments, the client tag may be known by the data supplier 103. Other arrangements and configurations are possible.
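The effect of client-specific tags can be shown with a small Python sketch in which the tag serves as the initial key of the cascade. How the tag is derived from the client name is not specified in the text; the plain SHA3-256 derivation with a fixed prefix used here is purely an assumption, and a real system would presumably mix in secret material so that the tag cannot be recomputed from the name alone.

```python
import hashlib

def client_tag(client_name: str) -> bytes:
    """Derive an opaque tag from the client name (derivation is assumed)."""
    return hashlib.sha3_256(("client-tag:" + client_name).encode("utf-8")).digest()

def token_for_client(client_name: str, fields: list[str]) -> str:
    """Cascade the fields, using the client tag as the initial key."""
    current = client_tag(client_name)
    for field in fields:
        current = hashlib.sha3_256(current + field.encode("utf-8")).digest()
    return current.hex()

# The same made-up record processed for two hypothetical clients.
record = ["John", "Smith", "1980-04-17", "15213"]
token_a = token_for_client("Client A", record)
token_b = token_for_client("Client B", record)
```

The two tokens differ even though the record is identical, so tokens issued for one client cannot be joined against another client's data.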
In a preferred and non-limiting embodiment, and with continued reference to FIG. 1, once the de-identification engine 107 at the data supplier 103 creates a token, the token (as well as the remainder of the record) must then be transmitted to the data processing entity 108 as one or more output files. To do so, further layers of encryption (e.g., token masking) may be provided. For example, the data supplier 103 may generate a transient encryption key and initialization vector unique to the session. The transient encryption key and initialization vector may be generated randomly in any number of ways. In a non-limiting embodiment, the transient encryption key may include a 128 bit key, and the encryption algorithm for the transient layer of encryption may include an Advanced Encryption Standard (AES) algorithm. However, various other arrangements, algorithms, and configurations are possible.
After encrypting the token with the transient encryption key, the encrypted token and the transient key may be encrypted together using, for example, a public key of the data processing entity 108 that corresponds to a private key held secretly by the data processing entity 108. In some non-limiting embodiments, the generated transient encryption key and initialization vector may be stored in a de-identification log file after being encrypted using the public key. Un-hashed output fields may remain unchanged so that the data supplier 103 is able to verify the content and verify that no personally identifiable data is being sent in the output files. Yet another layer of data security may be applied by transmitting the output files from the data supplier 103 to the data processing entity 108 over a secure transmission protocol such as SFTP or HTTPS, as examples.
Once the public key is used to encrypt the encrypted token, the transient key, and the initialization vector, the encrypted data is transmitted to the data processing entity 108 as one or more output files. Once received, the data processing entity 108 (and particularly the token processing engine 110 of the data processing entity 108) uses the private key corresponding to the public key used by the data supplier 103 to decrypt the last layer of encryption and to obtain the encrypted token, the transient key, and the initialization vector. The transient key is used to decrypt the encrypted token, resulting in the original token that resulted from the cascading hash process. Once the token 219 is obtained, the data processing entity 108 may perform an additional hash operation on the token 219 with a seed/key that is unique to either the client 106 or the data supplier 103 of the client 106. In some non-limiting embodiments, the data processing entity 108 may always perform the additional ha
