© F. K. Schattauer Verlagsgesellschaft mbH (1996)
`
`K. Pommerening, M. Miller,
`I. Schmidtmann, J. Michaelis
`
Institut für Medizinische Statistik
und Dokumentation
der Johannes-Gutenberg-Universität,
Mainz, Germany
`
`Pseudonyms for Cancer Registries
`
Abstract: In order to conform to the rigid German legislation on data privacy and security we developed a new concept of data flow and data storage for population-based cancer registries. A special trusted office generates a pseudonym for each case by a cryptographic procedure. This office also handles the notification of cases and communicates with the reporting physicians. It passes pseudonymous records to the registration office for permanent storage. The registration office links the records according to the pseudonyms. Starting from a requirements analysis we show how to construct the pseudonyms; we then show that they meet the requirements. We discuss how the pseudonyms have to be protected by cryptographic and organizational means. A pilot study showed that the proposed procedure gives acceptable synonym and homonym error rates. The methods described are not restricted to cancer registration and may serve as a model for comparable applications in medical informatics.
`
`Keywords: Cancer Registry, Data Protection, Data Encryption, Pseudonyms,
`Record Linkage.
`
`
`1. Introduction
`
Until recently, the rigid German legislation on data privacy and data security had hindered comprehensive cancer registration in major parts of Germany. The new European directive on data protection [1] may pose further difficulties. The basic premise states that permanent storage of an individual's medical data together with his/her identification data is allowed on the basis of informed consent only. However, many cancer patients nowadays are still not completely informed about the nature of their disease and, therefore, cannot be asked for informed consent to report their data to a cancer registry. Hence, it is desirable that physicians should have the right to notify incident cases without obtaining informed consent in order to assure the necessary completeness of cancer registration. Notification without informed consent is regarded as a violation of an individual's constitutional right to data privacy, unless it is compensated by anonymity.
A cancer registry, however, needs identification data for record linkage, to identify multiple notifications of the same individual, and to record follow-up information on individuals. On the other hand, scientific analysis of the registry data is generally performed anonymously and does not include any reference to individual identification data.
To minimize the violation of data privacy we developed a new organizational and technical concept for cancer registries which has been approved by data-protection officials and incorporated into the corresponding German federal legislation [2]. In our concept the registry is separated into two offices with complementary functions. The concept makes extensive use of data encryption and provides data privacy by pseudonymous data storage. This mode of data storage allows record linkage by matching of pseudonyms and does not interfere with the scientific requirements of a cancer registry. In certain cases a controlled re-identification of records might be necessary to obtain follow-up information about cases. The concept includes provisions for achieving this.
A pilot study was initiated in 1992 to explore the possibilities for running a population-based cancer registry in Rheinland-Pfalz (Rhineland-Palatinate) on the basis of this concept [3-5]. The results show that the proposed compromise between research interests and privacy issues is practicable and sound. Further overviews have been given in [6-8]. The concept has also been adopted for the pilot phase of the cancer registry of Niedersachsen (Lower Saxony) [9].
The cryptographic concept of pseudonymity can be adapted to other situations where a fundamental conflict between the goals of privacy and public interest needs to be solved, e.g., to control the efficiency of health care [10, 11].
`
Meth Inform Med 1996; 35: 112-21
`
`Downloaded from www.methods-online.com on 2018-01-29 | ID: 90046 | IP: 115.113.232.2
`For personal or educational use only. No other uses without permission. All rights reserved.
`
`DATAVANT, INC. EXHIBIT NO. 1006
`Page 1 of 10
`
`
`
`2. Pseudonyms
`
Pseudonyms are distinct, unlinkable identities that an individual assumes in order to hide his or her true identity. In information technology, pseudonyms control the matching of data while preserving privacy. A pseudonym belongs to one person only (henceforth called 'the owner') but does not reveal the identity of that person. If only the owner can uncover the pseudonym, it is called 'untraceable'. This concept was introduced into cryptology by Chaum [12]; it is useful to protect privacy in electronic banking, electronic elections, and other electronic transactions. Possible (but not yet realized) applications in the medical domain are anonymous electronic prescriptions [10] or the settlement of accounts between physicians and insurance companies [11].
Cancer registries need a distinct kind of pseudonym which must satisfy the following requirements:
1. The registry must be able to recognize multiple notifications of the same case (record linkage).
2. The record linkage procedure should minimize synonym and homonym errors (see section 6) to yield sufficient data quality.
3. Collaborating registries should be able to match their records.
4. In certain controlled circumstances the uncovering of a pseudonym should be possible for obtaining additional information, e.g., within the scope of case-control studies.
5. The owner should not be able to uncover his own pseudonym.
This last point derives from the right to notify a case without informing the patient about his disease. It implies that the owner should not generate his pseudonym; instead, we need a trusted institution that generates the pseudonyms.
To satisfy the first requirement the pseudonym should be generated by an algorithmic procedure that can be reproduced. The preferred method is hashing [13, par. 6.4]. Since the hash values should not reveal any information about the original data, we use a cryptographic hash function [14, chap. 14]. Since no one except the trusted institution should be able to generate a pseudonym for the cancer registry, the procedure should depend on a secret key which is kept by the trusted institution. Such a pseudonym can by no means be uncovered; the key-dependent procedure even prevents unauthorized trial encryption, at least from outside.
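A keyed, reproducible pseudonym of this kind can be sketched in Python. This is a minimal illustration, not the registry's actual procedure: it assumes HMAC-SHA256 as the keyed cryptographic hash (HMAC was standardized after this paper, in RFC 2104), and the function `pseudonym`, the normalization rule, and the key values are hypothetical.

```python
import hashlib
import hmac


def pseudonym(identity: str, key: bytes) -> str:
    """Keyed one-way pseudonym for a string of identity data (illustrative sketch)."""
    # Hypothetical normalization: upper-case and collapse whitespace so that
    # trivially different renderings of the same record give the same input.
    normalized = " ".join(identity.upper().split())
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


TRUSTED_KEY = b"secret key held by the trusted office"  # hypothetical key

first = pseudonym("Mueller, Hans, 1950-03-14", TRUSTED_KEY)
second = pseudonym("MUELLER,  Hans, 1950-03-14", TRUSTED_KEY)
outsider = pseudonym("Mueller, Hans, 1950-03-14", b"guessed key")

print(first == second)    # True: reproducible, so repeated notifications link
print(first == outsider)  # False: without the secret key, trial hashing fails
```

Equal inputs give equal pseudonyms, which is all record linkage needs, while the dependence on a secret key means that an outsider cannot compute pseudonyms for guessed identities.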
This kind of pseudonym does not meet requirement 2; the reason is a lack of fault tolerance: the encryption process cannot compensate for slight variations in the identification data, e.g., mistakes in spelling the name. This is not a problem when machine-readable identification data on patient cards can be used, but this is not always the case. Certain notifying institutions, such as pathologists, may not have access to the patient card. Old data (from the time before the introduction of patient cards) should also be linked. In any case, requirement 2 conflicts with complete anonymity; the model has to provide a balance between these two conflicting goals. What we need is a concept of error detection and error correction for encrypted data. Finding an optimal solution is an interesting problem for further research. As a first solution we divide the 'one-way' part of the pseudonym into a set of 'linkage data' that satisfy requirements 1, 2, and 5.
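The lack of fault tolerance, and the idea of splitting the one-way part into several linkage data, can be illustrated with a small sketch. It assumes HMAC-SHA256 as the keyed hash and a hypothetical split into surname, given name, and date of birth; the linkage data actually used are described in section 5.3 and may differ.

```python
import hashlib
import hmac

LINKAGE_KEY = b"hypothetical linkage data key"


def keyed_hash(value: str) -> str:
    return hmac.new(LINKAGE_KEY, value.upper().encode("utf-8"), hashlib.sha256).hexdigest()


def single_pseudonym(surname: str, given: str, birth: str) -> str:
    # One hash over the whole identity: any typo changes the entire value.
    return keyed_hash(f"{surname}|{given}|{birth}")


def linkage_data(surname: str, given: str, birth: str) -> dict[str, str]:
    # One hash per component (hypothetical choice of components).
    return {"surname": keyed_hash(surname), "given": keyed_hash(given), "birth": keyed_hash(birth)}


def agreement(a: dict[str, str], b: dict[str, str]) -> int:
    """Number of components whose hashes agree."""
    return sum(a[field] == b[field] for field in a)


report_1 = ("Meier", "Hans", "1950-03-14")
report_2 = ("Meyer", "Hans", "1950-03-14")  # same person, surname spelled differently

print(single_pseudonym(*report_1) == single_pseudonym(*report_2))  # False
print(agreement(linkage_data(*report_1), linkage_data(*report_2)))  # 2
```

A single misspelled letter changes the whole-record hash completely, whereas the component-wise linkage data still agree in two of three fields, which a linkage procedure can weigh when deciding whether two reports belong to the same case.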
In order to meet requirement 4 we add a second part to the pseudonym. This part is derived from the identification data of the patient by encryption; the key is known only to the trusted institution. For reasons to be discussed later we use asymmetric encryption with two keys (see section 5.1).
The reason for requirement 3 is that the German Federal States will have separate registries. To enable anonymous data matching between these registries they could use a common cryptographic key, but this is not advisable: a secret loses its value if shared among too many parties. Therefore, for inter-registry linking we propose a re-encryption of the first part of the pseudonym with a temporary (one-time) key (for details, see section 5.3).
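Matching between two registries with a temporary key can be sketched as follows. The paper proposes a (reversible) re-encryption of the first part of the pseudonym; this sketch substitutes HMAC-SHA256 as a one-way keyed function, which suffices for comparing values but cannot be reversed. The 'pure hash' is assumed to be unkeyed SHA-256, and all identities and names are hypothetical.

```python
import hashlib
import hmac
import secrets


def pure_hash(identity: str) -> str:
    # Publicly known hash without key ("pure hash" format; SHA-256 assumed).
    return hashlib.sha256(identity.encode("utf-8")).hexdigest()


def exchange_format(pure_hashes: set[str], one_time_key: bytes) -> set[str]:
    # Re-key every pure hash with the temporary key shared by both registries.
    return {hmac.new(one_time_key, h.encode("ascii"), hashlib.sha256).hexdigest() for h in pure_hashes}


registry_a = {pure_hash(x) for x in ["MEIER|HANS|1950-03-14", "SCHULZ|ANNA|1961-07-02"]}
registry_b = {pure_hash(x) for x in ["MEIER|HANS|1950-03-14", "WOLF|KARL|1944-11-30"]}

k_x = secrets.token_bytes(32)  # agreed for this single exchange, then discarded
matches = exchange_format(registry_a, k_x) & exchange_format(registry_b, k_x)
print(len(matches))  # 1

# A fresh key for the next exchange makes old exchange values useless for linking.
k_next = secrets.token_bytes(32)
print(exchange_format(registry_a, k_x) & exchange_format(registry_a, k_next))  # set()
```

Because the key is used for one exchange only, values disclosed in that exchange cannot be matched against the data of any later exchange, and no long-lived common key has to be shared among the registries.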
Our concept of pseudonymity in a cancer registry needs an organizational framework, which is described in the next section.
`
3. Organizational Structure of the Registry
`
The cancer registry consists of two separate offices at separate locations. The first office (trusted office, "Vertrauensstelle") basically handles the notifications and generates the pseudonyms. The second office (registration office, "Registerstelle") links the records and stores data permanently.
`
`3.1. Identity Data and
`Epidemiological Data
`
In the following we distinguish between identity data and epidemiological data. Identity data are:
- surname, former surname(s), given name(s), address,
- date of birth, date of death, date of diagnosis,
- notifying physician or health-care institution.
Epidemiological data are those data that are needed in every meaningful statistical evaluation of the registry data:
- gender,
- census code of place of residence,
- professional group,
- year of birth, year of death,
- year of diagnosis,
- date of notification,
- tumor classification,
- further medical data.
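The separation of a notification into identity data and epidemiological data can be sketched as a simple transformation. The class `Notification`, its field names, and the sample values are hypothetical, and some listed items (e.g., former surnames) are omitted for brevity. The point illustrated is that full dates stay in the identity part, while the epidemiological part keeps only years (except for the date of notification).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Notification:
    # Hypothetical fields for some of the items listed in section 3.1.
    surname: str
    given_names: str
    address: str
    birth: date
    diagnosis: date
    notifier: str
    gender: str
    census_code: str
    profession: str
    notified: date
    tumor: str
    death: Optional[date] = None


def split(n: Notification) -> tuple[dict, dict]:
    """Return (identity data, epidemiological data); full dates become years."""
    identity = {
        "surname": n.surname,
        "given_names": n.given_names,
        "address": n.address,
        "birth": n.birth,
        "death": n.death,
        "diagnosis": n.diagnosis,
        "notifier": n.notifier,
    }
    epidemiological = {
        "gender": n.gender,
        "census_code": n.census_code,
        "profession": n.profession,
        "birth_year": n.birth.year,
        "death_year": n.death.year if n.death else None,
        "diagnosis_year": n.diagnosis.year,
        "notified": n.notified,
        "tumor": n.tumor,
    }
    return identity, epidemiological


report = Notification(
    surname="Meier", given_names="Hans", address="Mainz",
    birth=date(1950, 3, 14), diagnosis=date(1993, 5, 2), notifier="Dr. A",
    gender="M", census_code="0000000", profession="metal worker",
    notified=date(1993, 6, 1), tumor="C34",
)
identity, epi = split(report)
print(epi["birth_year"])  # 1950
```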
`
`3.2. The Trusted Office
`
The trusted office accepts incoming reports from physicians or hospital-based cancer registries. These reports are checked for completeness and plausibility. If necessary, this office obtains additional information from the reporting physicians. It codes the reported diseases according to classification schemes such as ICD-9 and ICD-10. Thereafter, it assigns a pseudonym to the record and sends the pseudonymous record to the registration office. After a short period of time, when any discrepancies are cleared, the trusted office deletes the records in its database. Death certificates are also sent to the trusted office and handled in the same way as notification forms.
`
`
`
`
The trusted office is directed by a physician and, therefore, is subject to professional discretion in addition to data-protection laws. It is trusted by all other parties, hence the German name "Vertrauensstelle". Nevertheless, the decryption key (the 'private' key of the asymmetric encryption procedure, henceforth called the 're-identification key') is held in a second trusted institution outside the cancer registry. There are several sensible choices for this institution; in the following we call it the 'supervising office'. The separate handling of the re-identification key emphasizes the 'separation of informational powers' and makes clear that decryption (= re-identification) is an exceptional process. Moreover, it gives additional security in case of a compromised encryption key.
`
`3.3. The Registration Office
`
The registration office receives pseudonymous data only. With these data it performs record linkage and detects duplicate notifications; then it stores the pseudonyms and the epidemiological data permanently. If the record linkage reveals any inconsistencies, these are reported back to the trusted office which, in turn, may sort out any discrepancies by contacting the reporting physicians. In the same way the office links a death certificate to an existing patient record. Figure 1 illustrates the data flow. Only the registration office stores records permanently.
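The linkage step in the registration office can be sketched as grouping records by pseudonym. This is a simplified illustration assuming exact equality of pseudonyms; the record layout, the gender comparison used as a plausibility check, and the sample data are hypothetical, and the error-tolerant comparison of linkage data is not modeled.

```python
from collections import defaultdict


def link(records: list[dict]) -> dict[str, list[dict]]:
    """Group records by pseudonym; several records under one pseudonym
    are multiple notifications (or a death certificate) of the same case."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        groups[record["pseudonym"]].append(record)
    return dict(groups)


def inconsistent(groups: dict[str, list[dict]]) -> list[str]:
    """Pseudonyms whose linked records disagree on gender (hypothetical check)."""
    return [ps for ps, recs in groups.items() if len({r["gender"] for r in recs}) > 1]


incoming = [
    {"pseudonym": "p1", "gender": "F", "source": "physician"},
    {"pseudonym": "p1", "gender": "F", "source": "death certificate"},
    {"pseudonym": "p2", "gender": "M", "source": "pathologist"},
    {"pseudonym": "p2", "gender": "F", "source": "hospital registry"},
    {"pseudonym": "p3", "gender": "M", "source": "physician"},
]

groups = link(incoming)
print(len(groups["p1"]))     # 2: death certificate linked to the existing case
print(inconsistent(groups))  # ['p2']: to be reported back to the trusted office
```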
`
[Fig. 1 Organizational structure and information flow. Reports from physicians, hospital-based registries, health-care institutions, and public health departments (death certificates) go to the trusted office, which encrypts the identification data, checks discrepancies, and forwards the data to the registration office; the registration office stores pseudonyms and epidemiological data and reports implausible data back to the trusted office.]

3.4. Epidemiological Studies

The pseudonymous records serve for routine analyses of the cancer registry as well as for epidemiological studies. Figure 2 illustrates the procedure for a cohort study: if a well-defined cohort (e.g., occupationally exposed employees of a company) is to be analyzed for the occurrence of cancer, a sequence number is assigned to each individual member of the cohort and possibly also to non-exposed controls. These sequence numbers serve as simple temporary pseudonyms for the study. A research institute (which could also be the registry) obtains a record for each individual containing the sequence number and the exposure data. A record containing the sequence number and personal identification data is sent to the trusted office in parallel. This office generates the pseudonym and sends it to the registration office, together with the sequence number. The registration office performs the record linkage and generates a record which contains the sequence number and the epidemiological data stored in the registry. Thereafter, epidemiological data and exposure data may be linked for further analysis by using the sequence number. This procedure ensures that for the purpose of the study nobody sees which cohort members were diseased.
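The data flow of the cohort study can be simulated with a few dictionaries, one per step. The sketch assumes HMAC-SHA256 as the pseudonym function; the identities, exposure labels, and the tumor code are hypothetical. Each step uses only the data that the corresponding party receives in the procedure above.

```python
import hashlib
import hmac

TRUSTED_KEY = b"hypothetical trusted office key"


def pseudonym(identity: str) -> str:
    return hmac.new(TRUSTED_KEY, identity.encode("utf-8"), hashlib.sha256).hexdigest()


# Source of the cohort: sequence number -> (identification data, exposure data).
cohort = {
    1: ("MEIER|HANS|1950-03-14", "exposed"),
    2: ("SCHULZ|ANNA|1961-07-02", "exposed"),
    3: ("WOLF|KARL|1944-11-30", "control"),
}

# Registration office: pseudonym -> epidemiological data (only diseased persons appear).
registry = {pseudonym("MEIER|HANS|1950-03-14"): {"tumor": "C34", "diagnosis_year": 1993}}

# 1. Research institute receives sequence number + exposure data only.
institute_exposure = {seq: exposure for seq, (_, exposure) in cohort.items()}

# 2. Trusted office receives sequence number + identification data
#    and forwards sequence number + pseudonym to the registration office.
seq_to_pseudonym = {seq: pseudonym(ident) for seq, (ident, _) in cohort.items()}

# 3. Registration office links the pseudonyms and returns
#    sequence number + epidemiological data.
institute_epi = {seq: registry[ps] for seq, ps in seq_to_pseudonym.items() if ps in registry}

# 4. Research institute links both parts via the sequence number.
analysis = {seq: (exp, institute_epi.get(seq)) for seq, exp in institute_exposure.items()}
print(analysis[1])  # ('exposed', {'tumor': 'C34', 'diagnosis_year': 1993})
print(analysis[3])  # ('control', None)
```

The research institute never handles identification data, and the trusted office never sees exposure or epidemiological data.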
A corresponding procedure applies to case-control studies if only the epidemiological data which are kept in the registry are needed for such a study.
If it is necessary to obtain additional information from the diseased patients, the identification data may be decrypted using the re-identification key, which is kept in the supervising office (see section 3.2). Re-identification has to be approved by an ethics committee and is done in the supervising office; technically this could also be realized with a portable PC operated by an employee of the supervising office. The decrypted identification data are then given to the trusted office. In some cases the necessary data can be retrieved from the notifying institution. If it is necessary to contact the patient for an additional inquiry, the trusted office has to obtain informed consent from the patient via the notifying or treating physician, whose identity is stored as part of the (encrypted) identification data of the patient (see section 3.1).
`
[Fig. 2 Record linkage for cohort studies. The source of the cohort holds sequence#, identification data, and exposure data; it sends (sequence#, identification data) to the trusted office and (sequence#, exposure data) to the research institute; the trusted office sends (sequence#, pseudonym) to the registration office, which holds (pseudonym, epidemiological data) and returns (sequence#, epidemiological data); the research institute links sequence#, exposure data, and epidemiological data.]

4. A Registry Model

Since a strict formalization of the procedures of the previous section in the sense of [15] would be too technical for this paper, we only give a systematic verbal (semi-formal) description and the access matrix of the registry model; some of the less relevant details are given in a slightly simplified form.
Every assumption of the model should be critically examined as to whether it is sound. For instance, can a party do things it is not supposed to do? What can two or more parties achieve through collaboration? The model will not give absolute security but will show where additional (organizational) means should be provided. The organizational framework has to guarantee the model assumptions and fill the security gaps that the cryptographic procedures leave open.
In discussing the security of the model we assume that the cryptographic algorithms are secure and that they are implemented in a secure way. The first assumption is justified by using state-of-the-art cryptographic techniques. The second assumption is more problematic and needs careful organizational measures.
`
`4.1. Data and Parties
`
In the semi-formal description of the model we speak of the patient, the cooperating registry, the sequence number, etc., although in reality there are several instances of each of these classes.
The knowledge (or data) in our model consists of the following parts:
- The identity data (see 3.1).
- The pseudonym, consisting of
  - the encrypted identity (see 5.1),
  - the linkage data (see 5.3); they occur in 'pure hash' format, in 'linkage' format, in 'storage' format, and in 'exchange' format (see Fig. 5).
- The epidemiological data (see 3.1).
- The sequence number, a temporary pseudonym for a research project as in 3.4.
- The encryption key for asymmetric encryption of identification data.
- The re-identification key for re-identification of identity data.
- The linkage data key for generating the linkage data (see 5.3).
- The storage key for permanent storage of the linkage data (see 5.3).
- The exchange key for inter-registry record linkage (see 5.4).
Moreover, we have the identification data of the notifying institution for clearing discrepancies, for obtaining follow-up information, for reporting follow-up information in the case where the notifying institution is a clinical cancer registry, and for compensating the reporting physician for his notification. The trusted office also stores other administrative data.
The relevant parties for our model are the following; for each of these parties we have to define what knowledge it has or transfers and which other parties it trusts:
- The patient has access to his own data, but only via his treating physician.
- The notifying institution knows the data of its own patients:
  - The treating physician notifies the registry of his patients and can be asked by the trusted office about them.
  - Other health-care institutions which also send notifications are clinical cancer registries, after-care institutions, and Public Health offices.
- The trusted office sees all the data except the re-identification key and the storage key. It permanently stores only the encryption key and the linkage data key.
- The supervising office keeps the re-identification key and sees the identity data of re-identified cases.
- The registration office sees the pseudonym, the epidemiological data, the sequence number, and the storage key, and also stores these data permanently (except the sequence number).
- The cooperating registry:
  - The trusted office sees the exchange key and the pseudonyms, even in pure hash format.
  - The registration office sees the linkage data in its own linkage format. In case of a match it gets the full registry data, which is the aim of the linking procedure.
- The research institute gets the sequence number and the epidemiological data as well as the exposure data, which are outside the scope of the registry model (see 3.4).
- The outsider (any person or institution other than those listed above) has access only to communication paths and perhaps to storage media, if these leave the registration office, say, in case of a hardware defect.
The bank where the notifying physician has his account is ignored. Only a very small amount of information can be gained by observing the financial transfers, e.g., that a certain physician has a cancer patient at a certain time.
In the following we discuss only the parts of the model that are relevant for the pseudonymity aspect. For example, data on storage and communication media should be useless for the outsider; this is achieved by encryption of all communication paths and all storage media. In particular, the notifying institutions should communicate with the trusted office in a secure manner, i.e., using encrypted data transfer. Henceforth, we assume that the outsider can gain data access only through collaboration with some other institution, and we leave the security of communication and storage outside the scope of this paper.
`
`4.2. The Access Matrix
`
Figure 3 gives the access matrix of the registry model. We have to show that no party can get additional information by inference, in other words, that the access matrix as shown in Fig. 3 is complete. Since the model involves cryptographic keys, i.e., data that imply access to other data, the question is which subsets of the set of data in the access matrix are 'closed' with respect to inference. This gives only a 'naive' proof of security; there are indirect ways of getting additional information (see section 4.3).
We have a single inference that needs no key:
$$\mathit{id} \rightarrow \mathit{ld}_h,$$
where the symbols are taken from Fig. 3 and the arrow denotes the inference. In other words: whoever has the identification data can derive the linkage data in pure hash format, because the hash algorithm is publicly known and needs no key. The complete list of key-dependent inferences is as follows:
`
`
`
`
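$$
\begin{aligned}
k_e &: \mathit{id} \rightarrow \mathit{ps},\\
k_{re} &: \mathit{ps} \rightarrow \mathit{id},\\
k_{ld} &: \mathit{ld}_h \leftrightarrow \mathit{ld}_l,\\
k_s &: \mathit{ld}_l \leftrightarrow \mathit{ld}_s,\\
k_x &: \mathit{ld}_h \leftrightarrow \mathit{ld}_x.
\end{aligned}
$$
A single arrow marks an inference in one direction only (asymmetric encryption or decryption with the given key); a double arrow marks symmetric encryption, where the same key works in both directions.
Therefore, the access matrix is complete. The only way to infer the identification data $\mathit{id}$ is by knowledge of $\mathit{ps}$ and $k_{re}$, the encrypted identification data and the re-identification key. Hence this can only be done by the supervising office.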
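The completeness argument amounts to computing a fixed point: start from what a party knows and apply the inference rules above until nothing new appears. A minimal sketch, assuming the symbols of the list above written as strings (with `epi` and `seq` as hypothetical names for epidemiological data and sequence number) and a simplified knowledge set for the registration office.

```python
# Inference rules from the list above: (items needed) -> item derived.
RULES = [
    ({"id"}, "ld_h"),            # public hash, no key needed
    ({"k_e", "id"}, "ps"),       # asymmetric encryption
    ({"k_re", "ps"}, "id"),      # re-identification
    ({"k_ld", "ld_h"}, "ld_l"),  # linkage data key, both directions
    ({"k_ld", "ld_l"}, "ld_h"),
    ({"k_s", "ld_l"}, "ld_s"),   # storage key, both directions
    ({"k_s", "ld_s"}, "ld_l"),
    ({"k_x", "ld_h"}, "ld_x"),   # exchange key, both directions
    ({"k_x", "ld_x"}, "ld_h"),
]


def closure(known: set[str]) -> set[str]:
    """Apply RULES repeatedly until no new item can be derived (a fixed point)."""
    derived = set(known)
    changed = True
    while changed:
        changed = False
        for needs, gives in RULES:
            if needs <= derived and gives not in derived:
                derived.add(gives)
                changed = True
    return derived


# Simplified knowledge of the registration office (assumption): encrypted identity,
# linkage data in storage format, storage key, epidemiological data, sequence number.
registration = {"ps", "ld_s", "k_s", "epi", "seq"}

print(sorted(closure(registration) - registration))  # ['ld_l']
print("id" in closure(registration | {"k_e"}))       # False: e only encrypts
print("id" in closure({"k_re", "ps"}))               # True: supervising office
```

The second check holds only in this naive model: with a leaked encryption key, trial encryption of guessed identities is an indirect attack (section 4.3) that the model does not capture, which is why the encryption is randomized (section 5.1).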
`
`4.3. Indirect Ways
`for Re-identification
`
The goal of the registry model is to make unauthorized re-identification as difficult as possible. However, what is still possible if the access matrix is guaranteed by the implementation of the model? The multitude and nature of indirect ways of making inferences about the data cannot be completely delineated. This is the main difficulty in formally proving the validity of any security model. Some relevant methods that should be considered are:
- trial encryption (guessed plain-text attack),
- data matching with outside sources [16],
- statistical attacks [16],
- covert channels [17],
- social engineering (voluntary or forced collaboration).
The outsider sees none of the data. He could gain access only by collaboration with another party.
The research institute sees the epidemiological data and could try an unauthorized matching with an external data source. This danger is inherent in the granularity of the epidemiological data and cannot be reduced by any model whatsoever. Therefore, the release of subsets of epidemiological data is restricted according to the needs of a specific project.
The cooperating registration office only sees the linkage data in its own linkage format. It could try a statistical attack to find out some frequent names or use distribution anomalies of birth data. But this will hardly suffice to identify even a single case other than those that this registry has among its own records.
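The statistical attack mentioned above works because a deterministic keyed hash preserves frequencies: equal names give equal hash values. A minimal sketch, assuming HMAC-SHA256 per surname together with invented surname counts and an invented outside ranking (all hypothetical), shows how frequent names could be exposed while rare names stay hidden.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"key unknown to the attacker"  # hypothetical


def surname_hash(surname: str) -> str:
    return hmac.new(KEY, surname.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical registry contents with a skewed surname distribution.
surnames = ["MUELLER"] * 50 + ["SCHMIDT"] * 30 + ["SCHNEIDER"] * 15 + ["PFEIFFER", "ZIMMERLI"]
stored = [surname_hash(s) for s in surnames]

# Attacker's outside knowledge: a published ranking of frequent surnames.
public_ranking = ["MUELLER", "SCHMIDT", "SCHNEIDER"]

# Attack: assign the most frequent stored hashes to the most frequent names.
frequent_hashes = [h for h, _ in Counter(stored).most_common(len(public_ranking))]
guess = dict(zip(frequent_hashes, public_ranking))

# Verification only (the attacker cannot compute surname_hash without the key):
print(guess[surname_hash("MUELLER")])     # MUELLER: frequent names are exposed
print(surname_hash("PFEIFFER") in guess)  # False: rare names stay hidden
```

Knowing that a pseudonymous record carries a frequent surname narrows the candidates only slightly, which is why such an attack hardly suffices to identify a single case.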
`
[Fig. 3 Access matrix of the registry model. Rows (parties): patient, notifying institution, trusted office, supervising office, registration office, cooperating trusted office, cooperating registration office, research institute, outsider. Entries: s = sees (and temporarily stores); k = keeps (= permanently stores); d = can derive. Footnotes: 1 only own patients; 2 only re-identified cases; 3 in its own linkage format.]
`
The cooperating trusted office sees the linkage data even in pure hash format and could perform a trial encryption. However, it is trusted by definition.
The registration office could try illegal data matching with the epidemiological data and a statistical attack on the linkage data in linkage format.
The supervising office sees the identity data of re-identified cases. However, it is also trusted, and it receives only a small amount of data.
The trusted office sees the identification data and the epidemiological data, but it is trusted by definition.
The notifying institution and the patient get no knowledge of data they should not know. They know their own data only.
What a party with unauthorized knowledge of an additional piece of data (obtained, say, by collaborating with another party) can do is answered by the analysis in section 4.2. Covert channels could be exploited, for instance, by faking notifications; we come back to this in section 7.1.
Unauthorized matching with epidemiological data is only possible for an employee of the registration office or of the research institute; the trusted office, which also sees the epidemiological data, sees the identity anyway.
`
`5. Encryption Procedures
`
Encryption of identifying data is performed by using different techniques which are suited for different purposes. A detailed technical description of the basic algorithms is given in [14]. As a basis for assessing the performance of the procedures we take an expected number of 50,000 notifications each year for Rheinland-Pfalz. The efficiency of the procedures also suffices for larger registries.
`
`5.1. Asymmetric Encryption
`of Identification Data
`
Asymmetric encryption techniques use two different keys for encryption and decryption, often called 'public key' and 'private key'. This terminology, however, does not fit the present context. Therefore we speak of the 'encryption key' and the 're-identification key'. Knowledge of one of the keys does not help in any way to derive the other.
The identity data of each incoming record are encrypted in the trusted office using the encryption key, see Fig. 4. If, under special circumstances (as in 3.4), the decryption of some identification data becomes necessary, the registration office sends the encrypted identity data back to the trusted office, which initiates the re-identification (see section 3.4).
The most suitable asymmetric encryption method, according to the state of the art, is the RSA algorithm [14, 18, 19]. It uses the mathematical operation of modular exponentiation, $x \mapsto x^e \bmod n$; character strings are treated as numbers according to their bit patterns and decomposed into blocks such that each block represents a number smaller than $n$. The modulus $n$ is a very large number. The exponent $e$ is the encryption key. The re-identification key $d$ has a size similar to $n$ and the property that $x^{ed} \equiv x \pmod{n}$. Thus, modular exponentiation with $d$ is the inverse operation of modular exponentiation with $e$. Deriving $d$ from $n$ and $e$ requires decomposition of $n$ into its prime factors, a task that is computationally infeasible if $n$ is large enough. Experts recommend a key length of more than 700 bits [20]. Since in a cancer registry data are stored for a long time, one should rather choose a key length of more than 1,000 bits to be prepared for possible technological progress. For performance reasons, instead of pure RSA one could use a hybrid encryption method [19, section V.1.7] such as RSA + DES or PGP (RSA + IDEA) [14, section 17.9]. This makes sense as soon as the data to be encrypted are longer than a single RSA block. DES and IDEA are symmetric encryption procedures, meaning that encryption and decryption use the same key. The exact description is too complicated to be given here; we refer to [14, 17]. They are several orders of magnitude faster than all known asymmetric procedures but do not fit our model directly, since it relies on asymmetric encryption. Therefore, a hybrid combination with RSA has to be used.
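The RSA operations described above can be sketched with toy parameters. The primes, the exponent 17, the block layout, and the function names are hypothetical and far too small to be secure; the sketch shows only how a record is cut into blocks smaller than $n$, encrypted with $e$, and recovered with $d$. Unpadded ('textbook') RSA as shown here is deterministic and should not be used as is. Requires Python 3.9 or later.

```python
# Toy textbook RSA: tiny primes, for illustration only (NOT secure).
P, Q = 1009, 1013            # hypothetical primes; real keys use primes of hundreds of bits
N = P * Q                    # modulus n
PHI = (P - 1) * (Q - 1)
E = 17                       # encryption key e, coprime to PHI
D = pow(E, -1, PHI)          # re-identification key d with e*d = 1 (mod PHI)

# Bytes per block, chosen so that every block value is smaller than n.
BLOCK = (N.bit_length() - 1) // 8


def encrypt(data: bytes) -> list[tuple[int, int]]:
    """Split data into blocks, read each as a number x < n, and map x -> x^e mod n."""
    blocks = []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        x = int.from_bytes(chunk, "big")
        blocks.append((len(chunk), pow(x, E, N)))
    return blocks


def decrypt(blocks: list[tuple[int, int]]) -> bytes:
    """Invert with d, since (x^e)^d = x (mod n); the stored length restores leading zero bytes."""
    return b"".join(pow(c, D, N).to_bytes(length, "big") for length, c in blocks)


record = b"MEIER|HANS|1950-03-14"
cipher = encrypt(record)
print(decrypt(cipher) == record)  # True
```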
If an employee of the registration office gains knowledge of the encryption key, or if an outsider gains knowledge of the encryption key and access to the registered data, he could perform a trial encryption ('chosen plain-text attack') with the corresponding identity data. In order to prevent this possible misuse, each record is complemented by a random number before encryption. As shown in Fig. 4, this random number is kept in the encrypted part of the record.
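The effect of the random complement on trial encryption can be demonstrated with toy RSA. The sketch uses the known Mersenne primes 2^521 - 1 and 2^607 - 1 (public, hence insecure) so that one block holds a whole record plus an 8-byte random number; the sizes and function names are hypothetical. Modern systems use a standardized randomized padding such as RSA-OAEP (PKCS #1) rather than a plain appended number.

```python
import secrets

# Toy RSA with two publicly known Mersenne primes (NOT secure; chosen only so
# that a single block can hold a short record plus a random number).
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))
SALT_BYTES = 8  # hypothetical size of the random complement


def encrypt_plain(data: bytes) -> int:
    """Deterministic textbook RSA: the same record always gives the same ciphertext."""
    x = int.from_bytes(data, "big")
    assert x < N
    return pow(x, E, N)


def encrypt_randomized(data: bytes) -> int:
    """Append a random number before encryption."""
    x = int.from_bytes(data + secrets.token_bytes(SALT_BYTES), "big")
    assert x < N
    return pow(x, E, N)


def decrypt_randomized(c: int) -> bytes:
    x = pow(c, D, N)
    # Assumes the record does not start with a zero byte.
    raw = x.to_bytes((x.bit_length() + 7) // 8, "big")
    return raw[:-SALT_BYTES]


record = b"MEIER|HANS|1950-03-14"
guess = b"MEIER|HANS|1950-03-14"  # attacker's guessed plain text

# An attacker who knows e and n encrypts the guess and compares.
print(encrypt_plain(guess) == encrypt_plain(record))  # True: trial encryption succeeds
stored = encrypt_randomized(record)
print(encrypt_plain(guess) == stored)                 # False: random complement defeats it
print(decrypt_randomized(stored) == record)           # True: re-identification still works
```

To succeed against the randomized form, the attacker would also have to guess the random number, i.e., try on the order of 2^64 values per candidate identity.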
`
`5.2. Key Management
`
The keys have to be generated in a secure manner under special organizational precautions, e.g., in the supervising office. The encryption key is kept in the trusted office. It need not necessarily be kept secret because the encryption is randomized (see section 5.1). Therefore, there is no need for a cryptographic token, like a smart card, to hold this key. But a smart card is desirable as an access-control token. It could then also hold the key. On the other hand, the 'need to know' principle says that it is better to keep the key secret.
There are two cases where a change of the encryption and re-identification keys becomes necessary:
- The current keys are compromised, or at least there is suspicion that an unauthorized person has obtained the keys.
- Progress in cryptanalysis or in hardware performance has advanced so far that the chosen key length can no longer be assumed to be sufficient.
In these cases a new, more secure pair of encryption and re-identification keys has to be generated and used. This could be done by decrypting and then re-encrypting all the stored records in the trusted office. However, the German BSI ('Bundesamt für Sicherheit in der Informationstechnik', Federal Office for Security in Information Technology) proposed a more efficient method: define the new encryption method to be the composition of the old one and the "over-encryption" with the new key, thereby avoiding even a temporary exposure of the plain-text data; the future decryption key is the composition of the old and the new keys. Over-encryption of the old records can be done in the registration office under special security precautions. An analogous procedure also applies in case the chosen encryption met



