Methods of Information in Medicine
© F. K. Schattauer Verlagsgesellschaft mbH (1996)

K. Pommerening, M. Miller, I. Schmidtmann, J. Michaelis

Institut für Medizinische Statistik und Dokumentation der Johannes-Gutenberg-Universität, Mainz, Germany

Pseudonyms for Cancer Registries
`
Abstract: In order to conform to the rigid German legislation on data privacy and security we developed a new concept of data flow and data storage for population-based cancer registries. A special trusted office generates a pseudonym for each case by a cryptographic procedure. This office also handles the notification of cases and communicates with the reporting physicians. It passes pseudonymous records to the registration office for permanent storage. The registration office links the records according to the pseudonyms. Starting from a requirements analysis we show how to construct the pseudonyms; we then show that they meet the requirements. We discuss how the pseudonyms have to be protected by cryptographic and organizational means. A pilot study showed that the proposed procedure gives acceptable synonym and homonym error rates. The methods described are not restricted to cancer registration and may serve as a model for comparable applications in medical informatics.

Keywords: Cancer Registry, Data Protection, Data Encryption, Pseudonyms, Record Linkage.
`
`1. Introduction
`
Until recently, the rigid German legislation on data privacy and data security has hindered comprehensive cancer registration in major parts of Germany. The new European directive on data protection [1] may pose further difficulties. The basic premise states that permanent storage of an individual's medical data together with his/her identification data is allowed on the basis of informed consent only. However, many cancer patients nowadays are still not completely informed about the nature of their disease and, therefore, cannot be asked for informed consent to report their data to a cancer registry. Hence, it is desirable that physicians should have the right to notify incident cases without obtaining informed consent in order to assure the necessary completeness of cancer registration. Notification without informed consent is regarded as a violation of an individual's constitutional right to data privacy, unless it is compensated by anonymity.
A cancer registry, however, needs identification data for record linkage, to identify multiple notifications of the same individual, and to record follow-up information on individuals. On the other hand, scientific analysis of the registry data is generally performed anonymously and does not include any reference to individual identification data.
To minimize the violation of data privacy we developed a new organizational and technical concept for cancer registries which has been approved by data-protection officials and incorporated into the corresponding German federal legislation [2]. In our concept the registry is separated into two offices with complementary functions. The concept makes extensive use of data encryption and provides data privacy by pseudonymous data storage. This mode of data storage allows record linkage by matching of pseudonyms and does not interfere with the scientific requirements of a cancer registry. In certain cases a controlled re-identification of records might be necessary to obtain follow-up information about cases. The concept includes provisions for achieving this.
A pilot study was initiated in 1992 to explore the possibilities for running a population-based cancer registry in Rheinland-Pfalz (Rhineland-Palatinate) on the basis of this concept [3-5]. The results show that the proposed compromise between research interests and privacy issues is practicable and sound. Further overviews have been given in [6-8]. The concept has also been adopted for the pilot phase of the cancer registry of Niedersachsen (Lower Saxony) [9].
The cryptographic concept of pseudonymity can be adapted to other situations where a fundamental conflict between the goals of privacy and public interest needs to be solved, e.g., to control the efficiency of health care [10, 11].
`
Meth Inform Med 1996; 35: 112-21
`
`Downloaded from www.methods-online.com on 2018-01-29 | ID: 90046 | IP: 115.113.232.2
`For personal or educational use only. No other uses without permission. All rights reserved.
`
`DATAVANT, INC. EXHIBIT NO. 1006
`
`2. Pseudonyms
`
Pseudonyms are distinct, unlinkable identities that an individual assumes in order to hide his or her true identity. In information technology pseudonyms control the matching of data while preserving privacy. A pseudonym belongs to one person only (henceforth called 'the owner') but does not reveal the identity of that person. If only the owner can uncover the pseudonym, it is called 'untraceable'. This concept was introduced into cryptology by Chaum [12]; it is useful to protect privacy in electronic banking, electronic elections, and other electronic transactions. Possible (but not yet realized) applications in the medical domain are anonymous electronic prescriptions [10] or the settlement of accounts between physicians and insurance companies [11].
Cancer registries need a distinct kind of pseudonym which must satisfy the following requirements:
1. The registry must be able to recognize multiple notifications of the same case (record linkage).
2. The record linkage procedure should minimize synonym and homonym errors (see section 6) to yield sufficient data quality.
3. Collaborating registries should be able to match their records.
4. In certain controlled circumstances the uncovering of a pseudonym should be possible for obtaining additional information, e.g. within the scope of case-control studies.
5. The owner should not be able to uncover his own pseudonym.
This last point derives from the right to notify a case without informing the patient about his disease. It implies that the owner should not generate his pseudonym; instead, we need a trusted institution that generates the pseudonyms.
To satisfy the first requirement the pseudonym should be generated by an algorithmic procedure that can be reproduced. The preferred method is hashing [13, par. 6.4]. Since the hash values should not reveal any information about the original data, we use a cryptographic hash function [14, chap. 14]. Since no one except the trusted institution should be able to generate a pseudonym for the cancer registry, the procedure should depend on a secret key which is kept by the trusted institution. Such a pseudonym can by no means be uncovered; the key-dependent procedure even prevents unauthorized trial encryption, at least from outside.
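The idea of a reproducible, key-dependent one-way pseudonym can be sketched as follows. This is only an illustration: HMAC-SHA-256 is a present-day stand-in for "a cryptographic hash function that depends on a secret key", not the procedure used in the registry, and the function name, the normalization step, and the key values are assumptions.

```python
import hashlib
import hmac


def make_pseudonym(identity: str, secret_key: bytes) -> str:
    """Return a reproducible, key-dependent one-way pseudonym (hex string)."""
    # Normalize case and whitespace so that the same input always hashes alike.
    canonical = " ".join(identity.upper().split())
    return hmac.new(secret_key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key, known only to the trusted institution.
key = b"held only by the trusted institution"

p1 = make_pseudonym("Meier, Anna 1950-01-01", key)
p2 = make_pseudonym("meier,  anna 1950-01-01", key)           # same person, other spacing
p3 = make_pseudonym("Meier, Anna 1950-01-01", b"another key")  # party without the key
```

Here `p1` and `p2` coincide, so repeated notifications can be linked, while `p3` differs: without the secret key, a trial computation on guessed identity data does not reproduce the stored pseudonym.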
This kind of pseudonym does not meet requirement 2; the reason is lack of fault tolerance: the encryption process cannot compensate for slight variations in the identification data, e.g., mistakes in spelling the name. This is not a problem when machine-readable identification data on patient cards can be used; but this is not always the case. Certain notifying institutions, such as pathologists, may not have access to the patient card. Old data (from the time before the introduction of patient cards) should also be linked. In any case, requirement 2 conflicts with complete anonymity; the model has to provide a balance between these two conflicting goals. What we need is a concept of error detection and error correction for encrypted data. Finding an optimal solution is an interesting problem for further research. As a first solution we divide the 'one-way' part of the pseudonym into a set of 'linkage data' that satisfy requirements 1, 2 and 5.
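The lack of fault tolerance, and the effect of dividing the one-way part into several linkage data, can be illustrated with a small sketch. The field names, the key, and the use of a truncated HMAC-SHA-256 are assumptions made for the example; the actual components of the linkage data are not specified in this section.

```python
import hashlib
import hmac

KEY = b"hypothetical linkage key"


def keyed_hash(value: str) -> str:
    """Key-dependent one-way hash of a single string (truncated for readability)."""
    return hmac.new(KEY, value.upper().encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def whole_record_pseudonym(rec: dict) -> str:
    # One hash over all fields: a single typo changes the entire value.
    return keyed_hash("|".join(rec[f] for f in sorted(rec)))


def linkage_data(rec: dict) -> dict:
    # One hash per field: a typo only affects the field in which it occurs.
    return {field: keyed_hash(field + ":" + value) for field, value in rec.items()}


def agreement(a: dict, b: dict) -> int:
    """Number of fields whose hashed values coincide."""
    return sum(a[f] == b[f] for f in a)


r1 = {"surname": "Schmidt", "given_name": "Anna", "birth_date": "1950-01-01"}
r2 = {"surname": "Schmitt", "given_name": "Anna", "birth_date": "1950-01-01"}

same_whole = whole_record_pseudonym(r1) == whole_record_pseudonym(r2)
matching_fields = agreement(linkage_data(r1), linkage_data(r2))
```

With the misspelled surname the whole-record pseudonyms no longer match, whereas two of the three per-field values still agree and can feed a linkage decision. The price is that each component reveals a little more than a single hash would, which is the balance between requirement 2 and anonymity mentioned above.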
In order to meet requirement 4 we add a second part to the pseudonym. This part derives from the identification data of the patient by encryption; the key is known only to the trusted institution. For reasons to be discussed later we use asymmetric encryption with two keys (see section 5.1).
The reason for requirement 3 is that the German Federal States will have separate registries. To enable anonymous data matching between these registries they could use a common cryptographic key, but this is not advisable: a secret loses its value if shared among too many parties. Therefore, for inter-registry linking we propose a re-encryption of the first part of the pseudonym with a temporary (one-time) key (for details, see section 5.3).
Our concept of pseudonymity in cancer registries needs an organizational framework, which is described in the next section.
`
`3. Organizational structure
`of registry
`
The cancer registry consists of two separate offices at separate locations. The first office (trusted office, "Vertrauensstelle") basically serves for the notification and generates the pseudonyms. The second office (registration office, "Registerstelle") links the records and stores data permanently.
`
`3.1. Identity Data and
`Epidemiological Data
`
In the following we distinguish between identity data and epidemiological data. Identity data are:
- surname, former surname(s), given name(s),
- address,
- date of birth, date of death,
- date of diagnosis,
- notifying physician or health-care institution.
Epidemiological data are those data that are needed in every meaningful statistical evaluation of the registry data:
- gender,
- census code of place of residence,
- professional group,
- year of birth, year of death,
- year of diagnosis,
- date of notification,
- tumor classification,
- further medical data.
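The split of a notification into the two kinds of data can be sketched as a pair of record types. The class names, field names, and the sample values are hypothetical, and only a subset of the listed items is modelled; the point shown is that exact dates stay with the identity data while only the years enter the epidemiological data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class IdentityData:
    surname: str
    given_names: str
    address: str
    date_of_birth: date
    date_of_diagnosis: date
    notifying_institution: str


@dataclass(frozen=True)
class EpidemiologicalData:
    gender: str
    census_code: str
    year_of_birth: int
    year_of_diagnosis: int
    date_of_notification: date
    tumor_classification: str


def split_notification(n: dict) -> Tuple[IdentityData, EpidemiologicalData]:
    """Separate one notification into identity data and epidemiological data."""
    identity = IdentityData(
        n["surname"], n["given_names"], n["address"],
        n["date_of_birth"], n["date_of_diagnosis"], n["notifying_institution"],
    )
    epidemiological = EpidemiologicalData(
        n["gender"], n["census_code"],
        n["date_of_birth"].year,        # exact date reduced to the year
        n["date_of_diagnosis"].year,    # exact date reduced to the year
        n["date_of_notification"], n["tumor_classification"],
    )
    return identity, epidemiological


# Invented sample notification.
sample = {
    "surname": "Meier", "given_names": "Anna", "address": "Hauptstr. 1, Mainz",
    "date_of_birth": date(1950, 3, 17), "date_of_diagnosis": date(1993, 11, 2),
    "notifying_institution": "Dr. Example", "gender": "f", "census_code": "07315",
    "date_of_notification": date(1993, 12, 1), "tumor_classification": "ICD-9 174",
}
identity, epidemiological = split_notification(sample)
```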
`
`3.2. The Trusted Office
`
The trusted office accepts incoming reports from physicians or hospital-based cancer registries. These reports are checked for completeness and plausibility. If necessary, this office obtains additional information from the reporting physicians. It codes the reported diseases according to classification schemes such as ICD-9 and ICD-10. Thereafter, it assigns a pseudonym to the record, and sends the pseudonymous record to the registration office. After a short period of time, when any discrepancies are cleared, the trusted office deletes the records in its database. Death certificates are also sent to the trusted office and handled in the same way as notification forms.
`
The trusted office is directed by a physician and, therefore, is subject to professional discretion in addition to data-protection laws. It is trusted by all other parties, hence the German name "Vertrauensstelle". Nevertheless, the decryption key (the 'private' key of the asymmetric encryption procedure, henceforth called 're-identification key') is held in a second trusted institution outside the cancer registry. There are several sensible choices for this institution; in the following we call it the 'supervising office'. The separate handling of the re-identification key emphasizes the 'separation of informational powers' and makes clear that decryption (= re-identification) is an exceptional process. Moreover, it gives additional security in case of a compromised encryption key.
`
`3.3. The Registration Office
`
The registration office receives pseudonymous data only. With these data it performs record linkage and detects duplicate notifications; then it stores the pseudonyms and the epidemiological data permanently. If the record linkage reveals any inconsistencies, these are reported back to the trusted office which, in turn, may sort out any discrepancies by contacting the reporting physicians. In the same way the office links a death certificate to an existing patient record. Figure 1 illustrates the data flow. Only the registration office stores records permanently.
`
`3.4. Epidemiological Studies
`
The pseudonymous records serve for routine analyses of the cancer registry as well as for epidemiological studies. Figure 2 illustrates the procedure for a cohort study: if a well-defined cohort (e.g., occupationally exposed employees of a company) is to be analyzed for the occurrence of cancer, a sequence number is assigned to each individual member of the cohort and possibly also to non-exposed controls. These sequence numbers serve as simple temporary pseudonyms for the study. A research institute (which could also be the registry) obtains a record for each individual containing the sequence number and the exposure data. A record containing the sequence number and personal identification data is sent to the trusted office in parallel. This office generates the pseudonym and sends it to the registration office, together with the sequence number. The registration office performs the record linkage and generates a record which contains the sequence number and the epidemiological data stored in the registry. Thereafter, epidemiological data and exposure data may be linked for further analysis by using the sequence number. This procedure ensures that for the purpose of the study nobody sees which cohort members were diseased.

Fig. 1 Organizational structure and information flow. Physicians, hospital-based registries, health-care institutions, and public health departments (death certificates) send reports to the trusted office, which encrypts the identification data, checks discrepancies, and forwards the data to the registration office; the registration office reports implausible data back and stores the pseudonyms and the epidemiological data.
A corresponding procedure applies to case-control studies if only the epidemiological data which are kept in the registry are needed for such a study. If it is necessary to obtain additional information from the diseased patients, the identification data may be decrypted using the re-identification key which is kept in the supervising office (see section 3.2). Re-identification has to be approved by an ethics committee and is done in the supervising office; technically this could also be realized with a portable PC operated by an employee of the supervising office. The decrypted identification data are then given to the trusted office. In some cases the necessary data can be retrieved from the notifying institution. If it is necessary to contact the patient for an additional inquiry, the trusted office has to obtain informed consent from the patient via the notifying or treating physician whose identity is stored as part of the (encrypted) identification data of the patient (see section 3.1).
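The cohort-study procedure of this section, in which the sequence number is the only link between exposure data and registry data, can be sketched in a few lines. All function names, the key, the stand-in pseudonym function (HMAC-SHA-256), and the sample records are assumptions made for illustration.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical trusted-office key"


def pseudonym(identity: str) -> str:
    """Stand-in for the pseudonym generation of the trusted office."""
    return hmac.new(SECRET_KEY, identity.encode("utf-8"), hashlib.sha256).hexdigest()


def trusted_office(cohort_identities: dict) -> dict:
    """Sequence number -> pseudonym; the identities themselves are not passed on."""
    return {seq: pseudonym(ident) for seq, ident in cohort_identities.items()}


def registration_office(seq_to_pseudonym: dict, registry: dict) -> dict:
    """Sequence number -> epidemiological data, for cohort members found in the registry."""
    return {seq: registry[ps] for seq, ps in seq_to_pseudonym.items() if ps in registry}


def research_institute(exposure: dict, epi_by_seq: dict) -> dict:
    """Join exposure data and epidemiological data on the sequence number only."""
    return {seq: {"exposure": exp, "epidemiological": epi_by_seq.get(seq)}
            for seq, exp in exposure.items()}


# Invented sample data.
identities = {1: "MEIER|ANNA|1950-01-01", 2: "KLEIN|OTTO|1948-07-23"}
exposure = {1: "exposed", 2: "not exposed"}
registry = {pseudonym("MEIER|ANNA|1950-01-01"): {"year_of_diagnosis": 1993, "icd9": "162"}}

linked = research_institute(exposure, registration_office(trusted_office(identities), registry))
```

In this sketch the research institute ends up with exposure and epidemiological data per sequence number, but never handles a name or a pseudonym, and the registration office never sees the exposure data.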
`
Fig. 2 Record linkage for cohort studies. The source of the cohort holds sequence number, identification data, and exposure data; it sends sequence number and identification data to the trusted office, and sequence number and exposure data to the research institute. The trusted office sends sequence number and pseudonym to the registration office, which holds pseudonym and epidemiological data and returns sequence number and epidemiological data. The research institute finally holds sequence number, exposure data, and epidemiological data.

4. A Registry Model

Since a strict formalization of the procedures of the previous section in the sense of [15] would be too technical for this paper, we only give a systematic verbal (semi-formal) description and the access matrix of the registry model; some of the less relevant details are given in a slightly simplified form.
Every assumption of the model should be critically examined as to whether it is sound. For instance, can a party do things it is not supposed to do? What can two or more parties achieve through collaboration? The model will not give absolute security but will show where additional (organizational) means should be provided. The organizational framework has to guarantee the model assumptions and fill the security gaps that the cryptographic procedures leave open.
In discussing the security of the model we assume that the cryptographic algorithms are secure and that they are implemented in a secure way. The first assumption is justified by using state-of-the-art cryptographic techniques. The second assumption is more problematic and needs careful organizational measures.
`
`4.1. Data and Parties
`
In the semi-formal description of the model we speak of the patient, the cooperating registry, the sequence number etc., although in reality there are several instances of each of these classes.
The knowledge (or data) in our model consists of the following parts:
- The identity data (see 3.1).
- The pseudonym:
  - the encrypted identity (see 5.1),
  - the linkage data (see 5.3); they occur in 'pure hash' format, in 'linkage' format, in 'storage' format, and in 'exchange' format (see Fig. 5).
- The epidemiological data (see 3.1).
- The sequence number, a temporary pseudonym for a research project as in 3.4.
- The encryption key for asymmetric encryption of identification data.
- The re-identification key for re-identification of identity data.
- The linkage data key for generating the linkage data (see 5.3).
- The storage key for permanent storage of the linkage data (see 5.3).
- The exchange key for inter-registry record linkage (see 5.4).
Moreover, we have the identification data of the notifying institution for clearing discrepancies, for obtaining follow-up information, for reporting follow-up information in the case where the notifying institution is a clinical cancer registry, and for compensating the reporting physician for his notification. The trusted office also stores other administrative data.
The relevant parties for our model are the following; for each of these parties we have to define what knowledge it has or transfers and which other parties it trusts:
- The patient has access to his own data, but only via his treating physician.
- The notifying institution knows the data of its own patients:
  - The treating physician notifies the registry of his patients and can be asked by the trusted office about them.
  - Other health-care institutions which also send notifications are clinical cancer registries, after-care institutions, and Public Health offices.
- The trusted office sees all the data except the re-identification key and the storage key. It permanently stores only the encryption key and the linkage data key.
- The supervising office keeps the re-identification key and sees the identity data of re-identified cases.
- The registration office sees the pseudonym, the epidemiological data, the sequence number, the storage key, and also stores these data permanently (except the sequence number).
- The cooperating registry:
  - The trusted office sees the exchange key and the pseudonyms, even in pure hash format.
  - The registration office sees the linkage data in its own linkage format. In case of a match it gets the full registry data, which is the aim of the linking procedure.
- The research institute gets the sequence number and the epidemiological data as well as the exposure data which are outside the scope of the registry model (see 3.4).
- The outsider (any person or institution other than those listed above) has access only to communication paths and perhaps to storage media, if these leave the registration office, say, in case of a hardware defect.
The bank where the notifying physician has his account is ignored. Only a very small amount of information can be gained by observing the financial transfers, e.g., that a certain physician has a cancer patient at a certain time.
In the following we discuss only the parts of the model that are relevant for the pseudonymity aspect. For example, data on storage and communication media should be useless for the outsider; this is achieved by encryption of all communication paths and all storage media. In particular, the notifying institutions should communicate with the trusted office in a secure manner, i.e., using encrypted data transfer. Henceforth, we assume that the outsider can gain data access only through collaboration with some other institution, and leave the security of communication and storage outside the scope of this paper.
`
`4.2. The Access Matrix
`
Figure 3 gives the access matrix of the registry model. We have to show that no party can get additional information by inferencing, in other words, that the access matrix as shown in Fig. 3 is complete. Since the model involves cryptographic keys, i.e., data that imply access to other data, the question is what subsets of the set of data in the access matrix are 'closed' with respect to inferencing. This gives only a 'naive' proof of security; there are indirect ways of getting additional information (see section 4.3).
We have a single inference that needs no key:
$id \rightarrow ld_h$,
where the symbols are taken from Fig. 3 and the arrow denotes the inference. In other words: whoever has the identification data can derive the linkage data in pure hash format, because the hash algorithm is publicly known and needs no key. The complete list of key-dependent inferences is as follows:
$k_e: id \rightarrow ps$,
$k_{re}: ps \rightarrow id$,
$k_{ld}: ld_h \leftrightarrow ld_l$,
$k_{st}: ld_l \leftrightarrow ld_s$,
$k_x: ld_h \leftrightarrow ld_x$.
Therefore, the access matrix is complete. The only way to infer the identification data $id$ is by knowledge of $ps$ and $k_{re}$, the encrypted identification data and the re-identification key. Hence this can only be done by the supervising office.
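The question which sets of data are 'closed' with respect to inferencing can be checked mechanically by computing the closure of a party's knowledge under the inference rules listed above. The following sketch is an illustration only; the string labels (`id`, `ps`, `ld_h`, `k_re`, ...) are plain-text stand-ins for the symbols of the access matrix, and the function names are hypothetical.

```python
# (key, source, target): the target can be derived from the source if the key is known.
# None as key marks the single inference that needs no key.
ONE_WAY = [(None, "id", "ld_h"), ("k_e", "id", "ps"), ("k_re", "ps", "id")]
TWO_WAY = [("k_ld", "ld_h", "ld_l"), ("k_st", "ld_l", "ld_s"), ("k_x", "ld_h", "ld_x")]


def rules():
    """Yield all inference rules, expanding the two-way rules in both directions."""
    yield from ONE_WAY
    for key, a, b in TWO_WAY:
        yield (key, a, b)
        yield (key, b, a)


def closure(known):
    """Return everything that can be inferred from the given set of data and keys."""
    known = set(known)
    changed = True
    while changed:
        changed = False
        for key, src, dst in rules():
            if src in known and (key is None or key in known) and dst not in known:
                known.add(dst)
                changed = True
    return known


supervising = closure({"ps", "k_re"})                 # can re-identify
registration = closure({"ps", "ld_s", "k_st"})        # cannot reach id
colluding = closure({"ps", "ld_s", "k_st", "k_ld", "k_e"})
```

Even when the registration office is given the linkage data key and the encryption key in addition to its own data, `id` is not in the closure; it appears only once `k_re` is known together with `ps`. This reflects the 'naive' argument only and says nothing about the indirect ways discussed in section 4.3.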
`
`4.3. Indirect Ways
`for Re-identification
`
The goal of the registry model is to make unauthorized re-identification as difficult as possible. However, what is possible, if the access matrix is guaranteed by the implementation of the model? The multitude and nature of indirect ways for making inferences about the data cannot be completely delineated. This is the main difficulty in proving the validity of any security model formally. Some relevant methods that should be considered are:
- trial encryption (guessed plain-text attack),
- data matching with outside sources [16],
- statistical attacks [16],
- covert channels [17],
- social engineering (voluntary or forced collaboration).
The outsider sees none of the data. He could gain access only by collaboration with another party.
The research institute sees the epidemiological data and could try an unauthorized matching with an external data source. This danger is inherent in the granularity of the epidemiological data and cannot be made smaller by any model whatsoever. Therefore, the release of subsets of epidemiological data is restricted according to a specific project.
The cooperating registration office only sees the linkage data in its own linkage format. It could try a statistical attack to find out some frequent names or use distribution anomalies of birth data. But this will hardly suffice to identify even a single case other than those that this registry has among its own records.
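A statistical attack on frequent names can be pictured as follows: since equal names yield equal hashed values, the frequency ranking of the hashed values can be compared with a publicly known ranking of surnames. The sketch below uses invented data, a hypothetical key, and HMAC-SHA-256 as a stand-in for the keyed hash; it is not a description of an actual attack on the registry.

```python
import hashlib
import hmac
from collections import Counter

SECRET = b"hypothetical key, unknown to the attacker"


def keyed_hash(value: str, key: bytes) -> str:
    return hmac.new(key, value.upper().encode("utf-8"), hashlib.sha256).hexdigest()


# What the attacker sees: hashed surname components, with natural repetitions.
surnames = ["Müller"] * 5 + ["Schmidt"] * 3 + ["Pommerening"]
stored = [keyed_hash(s, SECRET) for s in surnames]

# Public knowledge (assumed): the most frequent surnames in descending order.
public_ranking = ["Müller", "Schmidt"]

# Pair the most frequent hashed values with the most frequent public names.
ranked_hashes = [h for h, _ in Counter(stored).most_common()]
guess = dict(zip(ranked_hashes, public_ranking))
```

Only the few most frequent values can be guessed this way; the hashed value of a rare name such as the third one stays unassigned, which is in line with the expectation that such an attack will hardly identify a single case.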
`
Fig. 3 Access matrix of the registry model. Rows: patient, notifying institution, trusted office, supervising office, registration office, cooperating trusted office, cooperating registration office, research institute, outsider. Entries: s = sees (and temporarily stores); k = keeps (= permanently stores); d = can derive. Footnotes: 1 only own patients; 2 only re-identified cases; 3 in its own linkage format.
`
The cooperating trusted office sees the linkage data even in pure hash format and could perform a trial encryption. However, it is trusted by definition.
The registration office could try illegal data matching with the epidemiological data and a statistical attack on the linkage data in linkage format.
The supervising office sees the identity data of re-identified cases. However, it is also trusted, and it gets only few data.
The trusted office sees the identification data and the epidemiological data, but it is trusted by definition.
The notifying institution and the patient get no knowledge of data they should not know. They know their own data only.
The question of what a party can do that has unauthorized knowledge of an additional piece of data, say, by collaborating with another party, can be answered by the analysis in section 4.2. Covert channels could be exploited, for instance, by faking notifications; we come back to this in section 7.1. Unauthorized matching with epidemiological data is only possible for an employee of the registration office or of the research institute; the trusted office that also sees the epidemiological data sees the identity anyway.
`
`5. Encryption Procedures
`
Encryption of identifying data is performed by using different techniques which are suited for different purposes. A detailed technical description of the basic algorithms is given in [14]. As a basis to assess the performance of the procedures one has to take an expected number of 50,000 notifications each year for Rheinland-Pfalz. The efficiency of the procedures also suffices for larger registries.
`
`5.1. Asymmetric Encryption
`of Identification Data
`
Asymmetric encryption techniques use two different keys for encryption and decryption, often called 'public key' and 'private key'. This notation, however, does not fit in the present context. Therefore we speak of 'encryption key' and 're-identification key'. Knowledge of one of the keys does not help in any way to derive the other.
The identity data of each incoming record are encrypted in the trusted office using the encryption key, see Fig. 4. If, under special circumstances (as in 3.4), the decryption of some identification data becomes necessary, the registration office sends the encrypted identity data back to the trusted office that initiates the re-identification, see section 3.4.
The most suitable asymmetric encryption method, according to the state of the art, is the RSA algorithm [14, 18, 19]. It uses the mathematical operation of modular exponentiation, $x \mapsto x^e \bmod n$; character strings are treated as numbers according to their bit patterns and decomposed into blocks such that each block represents a number smaller than $n$. The modulus $n$ is a very large number. The exponent $e$ is the encryption key. The re-identification key $d$ has a size similar to $n$ and the property that $x^{ed} \equiv x \pmod{n}$. Thus, modular exponentiation with $d$ is the inverse operation of modular exponentiation with $e$. Deriving $e$ from $n$ and $d$ requires decomposition of $n$ into its prime factors, a task that is computationally infeasible if $n$ is large enough. Experts recommend a key length of > 700 bits [20]. Since in a cancer registry data are stored for a long time, one should rather choose a key length of > 1,000 bits to be prepared for possible technological progress. For performance reasons, instead of RSA one could use a hybrid encryption method [19, section V.1.7] such as RSA + DES or PGP (RSA + IDEA) [14, section 17.9]. This makes sense as soon as the data to be encrypted are longer than a single RSA block. DES and IDEA are symmetric encryption procedures, meaning that encryption and decryption use the same key. The exact description is too complicated to be given here; we refer to [14, 17]. They are several orders of magnitude faster than all known asymmetric procedures but do not fit directly into our model, which relies on asymmetric encryption. Therefore, a hybrid combination with RSA has to be used.
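The arithmetic of modular exponentiation with an encryption key $e$ and a re-identification key $d$ can be tried out with a toy example. The primes below are deliberately tiny textbook values and give no security whatsoever; the block size of one byte, the function names, and the sample string are assumptions for illustration (the three-argument `pow` with exponent -1 needs Python 3.8 or later).

```python
# Toy parameters only: real moduli have many hundreds of bits.
P, Q = 61, 53
N = P * Q                          # modulus n = 3233
E = 17                             # encryption key e
D = pow(E, -1, (P - 1) * (Q - 1))  # re-identification key d, with e*d = 1 mod (p-1)(q-1)


def encrypt(text: str) -> list:
    """Encrypt byte by byte; every block is a number smaller than n."""
    return [pow(block, E, N) for block in text.encode("utf-8")]


def decrypt(blocks: list) -> str:
    """Invert the encryption by modular exponentiation with d."""
    return bytes(pow(c, D, N) for c in blocks).decode("utf-8")


ciphertext = encrypt("Meier")
recovered = decrypt(ciphertext)
```

Note that this plain form is deterministic: equal blocks always give equal ciphertexts, which is exactly what makes trial encryption possible if the encryption key becomes known.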
If an employee of the registration office gains knowledge of the encryption key, or if an outsider gains knowledge of the encryption key and access to the registered data, he could perform a trial encryption ('chosen plain-text attack') with the corresponding identity data. In order to prevent this possible misuse, each record is complemented by a random number before encryption. As shown in Fig. 4, this random number is kept in the encrypted part of the record.
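The effect of complementing each record with a random number before encryption can be sketched as follows. The parameters are illustrative assumptions: the modulus is built from two known Mersenne primes purely so that the example runs without a key generator (such a modulus is trivially factorable), the 8-byte random number and the marker byte are arbitrary choices, and a production system would use a standardized padding scheme instead.

```python
import secrets

# Illustrative modulus from two known Mersenne primes; NOT secure.
P, Q = 2**127 - 1, 2**89 - 1
N = P * Q
E = 65537                           # encryption key
D = pow(E, -1, (P - 1) * (Q - 1))   # re-identification key (Python 3.8+)
NONCE_LEN = 8                       # length of the random number in bytes


def encrypt_identity(identity: str) -> int:
    """Prepend a fresh random number, then encrypt the whole block."""
    nonce = secrets.token_bytes(NONCE_LEN)
    # The leading marker byte 0x01 preserves the block length in the integer encoding.
    block = b"\x01" + nonce + identity.encode("utf-8")
    m = int.from_bytes(block, "big")
    if m >= N:
        raise ValueError("identity data too long for a single block")
    return pow(m, E, N)


def reidentify(ciphertext: int) -> str:
    """Decrypt and strip the marker byte and the random number."""
    m = pow(ciphertext, D, N)
    block = m.to_bytes((m.bit_length() + 7) // 8, "big")
    return block[1 + NONCE_LEN:].decode("utf-8")


c1 = encrypt_identity("MEIER|1950-01-01")
c2 = encrypt_identity("MEIER|1950-01-01")
```

Two encryptions of the same identity data give different ciphertexts, so someone who knows the encryption key and guesses the identity data cannot confirm the guess by comparing ciphertexts, while re-identification with `D` still recovers the original data from either one.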
`
`5.2. Key Management
`
The keys have to be generated in a secure manner under special organizational precautions, e.g., in the supervising office. The encryption key is kept in the trusted office. It does not necessarily have to be kept secret because the encryption is randomized (see section 5.1). Therefore, there is no need for a cryptographic token, like a smart card, to hold this key. But a smart card is desirable as an access-control token. It could then also hold the key. On the other hand, the 'need to know' principle says that it is better to keep the key secret.
There are two cases where a change of the encryption and re-identification keys becomes necessary:
- The actual keys are compromised; at least there is suspicion that an unauthorized person has got the keys.
- The progress of cryptanalysis or the performance of hardware has advanced to such an extent that the chosen key length can no longer be assumed to be sufficient.
In these cases a new, more secure pair of encryption and re-identification keys has to be generated and used. This could be done by decrypting and then re-encrypting all the stored records in the trusted office. However, the German BSI ('Bundesamt für Sicherheit in der Informationstechnik', Federal Office for Security in Information Technology) proposed a more efficient method: define the new encryption method to be the composition of the old one and the "over-encryption" with the new key, thereby avoiding even a temporary exposure of the plain-text data; the future decryption key is the composition of the old and the new keys. Over-encryption of the old records can be done in the registration office under special security precautions. An analogous procedure also applies in case the chosen encryption met
