Journal of Hepatology 46 (2007) 947–954
www.elsevier.com/locate/jhep

Review

Methodology of superiority vs. equivalence trials and non-inferiority trials

Erik Christensen*

Clinic of Internal Medicine I, Bispebjerg University Hospital, Bispebjerg Bakke 23, DK-2400 Copenhagen NV, Copenhagen, Denmark

* Tel.: +45 3531 2854; fax: +45 3531 3556. E-mail address: ec05@bbh.hosp.dk

doi:10.1016/j.jhep.2007.02.015
`
The randomized clinical trial (RCT) is generally accepted as the best method of comparing the effects of therapies. Most often the aim of an RCT is to show that a new therapy is superior to an established therapy or placebo, i.e. such trials are planned and performed as superiority trials. Sometimes the aim of an RCT is just to show that a new therapy is not superior but equivalent to, or not inferior to, an established therapy, i.e. such trials are planned and performed as equivalence trials or non-inferiority trials. Since these types of trials have different aims, they differ significantly in various methodological aspects. Awareness of the methodological differences is generally quite limited. This paper reviews the methodology of these types of trials with special reference to differences with respect to planning, performance, analysis and reporting of the trial. In this context the relevant basic statistical concepts are reviewed. Some of the important points are illustrated by examples.

© 2007 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.
`
1. Introduction

The randomized clinical trial (RCT) is generally accepted as the best method of comparing the effects of therapies [1,2]. Most often the aim of an RCT is to show that a new therapy is superior to an established therapy or placebo, i.e. such trials are planned and performed as superiority trials. Sometimes the aim of an RCT is just to show that a new therapy is not superior but equivalent to, or not inferior to, an established therapy, i.e. such trials are planned and performed as equivalence trials or non-inferiority trials [3]. Since these types of trials have different aims, they differ significantly in various methodological aspects [4]. Awareness of the methodological differences is generally quite limited. For example, it is a rather common belief that failure to find a significant difference between therapies in a superiority trial implies that the therapies have the same effect or are equivalent [5–10]. However, such a conclusion is not correct, because there is a considerable risk of overlooking a clinically relevant effect due to insufficient sample size.

The purpose of this paper is to review the methodology of the different types of trials, with special reference to differences with respect to planning, performance, analysis and reporting of the trial. In this context the relevant basic statistical concepts will be reviewed. Some of the important points will be illustrated by examples.
`
2. Superiority trials

2.1. Sample size estimation and power of an RCT

An important aspect in the planning of any RCT is to estimate the number of patients necessary, i.e. the sample size. The various types of trials differ in this respect [1,2,11]. A superiority trial aims to demonstrate the superiority of a new therapy compared to an established therapy or placebo. The following description applies to a superiority trial. The features by which equivalence and non-inferiority trials differ will be described later.
`
`
To estimate the sample size one needs to consider some important aspects described in the following.

By how much should the new therapy be better than the reference therapy? This extra effect of the new therapy compared to the reference therapy is called the least relevant difference or the clinical significance. It is often denoted by the Greek letter Δ (Fig. 1).

By how much would the difference in effect between the two groups be influenced by random factors? Like any other biological measurement, a treatment effect is subject to considerable "random" variation, which needs to be determined and taken into account. The magnitude of the variation is described in statistical terms by the standard deviation S or the variance S² (see Fig. 1c). The variance of the effect variable would need to be obtained from a pilot study or from previously published similar studies. The trial should demonstrate as precisely as possible the true difference in effect between the treatments. However, because of the random variation the final result of the trial may deviate from the true difference and give erroneous results. If, for example, the null hypothesis H0 of no difference were true, the trial could in some cases still show a difference. This type of error – the type 1 error ("false positive") (Fig. 1) – would have the consequence of introducing an ineffective therapy. If, on the other hand, the alternative hypothesis HΔ of the difference being Δ were true, the trial could in some cases fail to show a difference. This type of error – the type 2 error ("false negative") (Fig. 1) – would have the consequence of rejecting an effective therapy.
Thus one needs to specify how large risks of type 1 and type 2 errors would be acceptable for the trial. Ideally the type 1 and type 2 error risks should be near zero, but this would require extremely large trials. Limited resources and patient numbers make it necessary to accept some small risk of type 1 and type 2 errors.

Most often the type 1 error risk α would be specified as 5%. In this paper, α means the type 1 error risk in one direction, i.e. either up or down from H0, i.e. α = 5%. However, in many situations one would be interested in detecting both beneficial and harmful effects of the new therapy compared to the control therapy, i.e. one would be interested in "two-sided" testing for a difference in both the "upward" and "downward" direction (Fig. 1). Hence we would instead specify the type 1 error risk to be 2α (i.e. αupwards + αdownwards), i.e. 2α = 5%.

The type 2 error risk β would normally be specified as 10–20%. Since a given value of Δ is always either above or below zero (H0), the type 2 error risk β is always one-sided. The smaller β, the larger the complementary probability 1 − β of accepting HΔ when it is in fact true. 1 − β is called the power of the trial because it states the probability of finding Δ if this difference truly exists.
`
Fig. 1. Illustration of factors influencing the sample size of a trial. The effect difference found in a trial will be subject to random variation. The variation is illustrated by bell-shaped normal distribution curves for a difference of zero corresponding to the null hypothesis (H0) and for a difference of Δ corresponding to the alternative hypothesis (HΔ), respectively. Defined areas under the curves indicate the probability of a given difference being compatible with H0 or HΔ, respectively. If the difference lies near H0, one would accept H0. The farther the difference lies from H0, the less probable H0 would be. If the probability of H0 becomes very small (less than the specified type 1 error risk 2α, being α in either tail of the curve) one would reject H0. The sample distribution curves show some overlap. A large overlap will result in a considerable risk of interpretation error; in particular the type 2 error risk may be substantial, as indicated in the figure. An important issue would be to reduce the type 2 error risk β (and increase the power 1 − β) to a reasonable level. Three ways of doing that are shown in (b–d), (a) being the reference situation. (b) Isolated increase of 2α will decrease β and increase power. Conversely, isolated decrease of 2α will increase β and decrease power. (c) Isolated narrowing of the sample distribution curves – by increasing the sample size 2N and/or decreasing the variance of the difference S² – will decrease β and increase power. Conversely, isolated widening of the sample distribution curves – by decreasing sample size and/or increasing variance of the difference – will increase β and decrease power. (d) Isolated increase of Δ – larger therapeutic effect – will decrease β and increase power. Conversely, isolated decrease of Δ – smaller therapeutic effect – will increase β and decrease power.
`
From given values of Δ, S², α and β the needed number (N) of patients in each group can be estimated using this relatively simple general formula:
N = (Z2α + Zβ)² × S²/Δ²,

where Z2α and Zβ are the standardized normal deviates corresponding to the defined values of 2α (Table 1, left) and β (Table 1, right), respectively. If for some reason one wants to test for a difference in only one direction ("one-sided" testing), one should replace Z2α with Zα in the formula and apply the right side of Table 1. The formula is approximate, but in most cases it gives a good estimate of the necessary number of patients. For a trial with two parallel groups of equal size the total sample size will be 2N.
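The formula is easy to evaluate directly; the following minimal Python sketch uses scipy.stats.norm to look up the standardized normal deviates instead of Table 1 (the function name and default arguments are illustrative, not from the paper):

from scipy.stats import norm

def patients_per_group(delta, variance, two_sided_alpha=0.05, beta=0.20):
    """General formula N = (Z2a + Zb)^2 * S^2 / Delta^2 for one group of a two-arm trial."""
    z_2a = norm.ppf(1 - two_sided_alpha / 2)   # e.g. 1.96 for 2*alpha = 0.05
    z_b = norm.ppf(1 - beta)                   # e.g. 0.84 for beta = 0.20
    return (z_2a + z_b) ** 2 * variance / delta ** 2

# With the planning values used in Example 1 below (Delta = 0.2, S^2 = 0.48):
print(patients_per_group(delta=0.2, variance=0.48))   # approx. 94 per group, i.e. about 188 in total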
The values used for 2α, β and Δ should be decided by the researcher, not by the statistician. The values chosen should take into account the disease, its stage, the effectiveness and side effects of the control therapy, and an estimate of how much extra effect may reasonably be expected of the new therapy.

If, for example, the disease is rather benign with a relatively good prognosis and the new therapy is more expensive and may have more side effects than a rather effective control therapy, one should specify a relatively larger Δ and β and a smaller 2α, because the new therapy would only be interesting if it is markedly better than the control therapy.

If, on the other hand, the disease is aggressive and the new therapy is less expensive or may have fewer side effects than a not very effective control therapy, one should specify a relatively smaller Δ and β and a larger 2α, because the new therapy would be interesting even if it is only slightly better than the control therapy.
`
As mentioned above, 2α would normally be specified as 5% or 0.05, but values of 0.10 or 0.01 may be justified in certain situations, as mentioned above. β would normally be specified as 0.10–0.20, but in special situations a higher or lower value may be justified. Δ should be decided on clinical grounds as the least relevant therapeutic gain of the new therapy, considering the disease and its prognosis, the efficacy of the control therapy and what may reasonably be expected of the new therapy. Preliminary data from pilot studies or historical observational data can guide the choice of Δ. Even if it may be tempting to specify a relatively large Δ, since fewer patients will then be needed, Δ should never be specified larger than what is biologically reasonable. It will always be unethical to perform trials with unrealistic aims. Fig. 1 illustrates the effects on the type 2 error risk β, and hence also on the power (1 − β), of changing 2α, N, S² and Δ. Thus β will be decreased and the power 1 − β will be increased if 2α is increased (Fig. 1b), if the sample size is increased (Fig. 1c), and if Δ is increased (Fig. 1d).

The estimated sample size should be increased in proportion to the expected loss of patients during follow-up due to drop-outs and withdrawals.
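The paper does not give a specific inflation formula; a common convention, assumed in the following one-line sketch, is to divide the estimated group size by the expected retention fraction.

from math import ceil

def inflate_for_losses(n_per_group, expected_loss=0.10):
    """Inflate the estimated group size so that roughly N evaluable patients remain after losses (assumed convention)."""
    return ceil(n_per_group / (1 - expected_loss))

print(inflate_for_losses(94, expected_loss=0.10))   # 105 patients per group instead of 94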
`
2.2. The confidence interval

An important concept indicating the confidence of the result obtained in an RCT is the width of the confidence interval of the difference D in effect between the therapies investigated [1,2].
`
Table 1
Abbreviated table of the standardized normal distribution (adapted for this paper)

Two-sided probability 2α   Z2α      One-sided probability α or β   Zα or Zβ      One-sided probability α or β   Zα or Zβ
0.0002                     3.72     0.0001                         3.72          0.50                           0.00
0.001                      3.29     0.0005                         3.29          0.55                           −0.13
0.002                      3.09     0.001                          3.09          0.60                           −0.25
0.01                       2.58     0.005                          2.58          0.65                           −0.39
0.02                       2.33     0.010                          2.33          0.70                           −0.52
0.05                       1.96     0.025                          1.96          0.75                           −0.67
0.1                        1.64     0.05                           1.64          0.80                           −0.84
0.2                        1.28     0.10                           1.28          0.85                           −1.04
0.3                        1.04     0.15                           1.04          0.90                           −1.28
0.4                        0.84     0.20                           0.84          0.95                           −1.64
0.5                        0.67     0.25                           0.67          0.975                          −1.96
0.6                        0.52     0.30                           0.52          0.990                          −2.33
0.7                        0.39     0.35                           0.39          0.995                          −2.58
0.8                        0.25     0.40                           0.25          0.999                          −3.09
0.9                        0.13     0.45                           0.13          0.9995                         −3.29
1.0                        0.00     0.50                           0.00          0.9999                         −3.72

Note. The total area under the normal distribution curve is one. The area under a given part of the curve gives the probability of an observation lying in that part. The y-axis indicates the "probability density", which is highest in the middle of the curve and decreases in either direction toward the tails of the curve. The normal distribution is symmetric, i.e. the probability from Z to +∞ is the same as the probability from −Z to −∞. The right part of the table gives the one-sided probability from a given Z-value on the x-axis to +∞. The left part of the table gives the two-sided probability as the sum of the probability from a given positive Z-value to +∞ and the probability from the corresponding negative Z-value to −∞.
`
`
The narrower the confidence interval, the more reliable the result. In general the width of the confidence interval is determined by the sample size. A large sample size will result in a narrow confidence interval. Normally the 95% confidence interval would be estimated. The 95% confidence interval is the interval which would on average include the true difference in 95 out of 100 similar studies. This is illustrated in Fig. 2, where 100 trial samples of the same size have been randomly drawn from the same population. It is important to note that in 5 of the 100 samples the 95% confidence interval of the difference in effect D does not include the true difference found in the population. When the sorted confidence intervals are aligned to their middle (Fig. 2c), the variation in relation to the true value in the population becomes even clearer. If simulation is carried out on an even greater scale, the likelihood distribution of the true difference in the population, given the results from a certain trial sample, will follow a normal distribution like that presented in Fig. 3 [2]. It is seen that the likelihood of the true difference in the population is at its maximum at the difference D found in the sample and that it decreases with higher and lower values. The figure also illustrates the 95% confidence interval, which is the interval that includes the middle 95% of the total likelihood area under the normal curve. This interval can be calculated from the difference D and its standard error SED. To be surer that the true difference is included in the confidence interval, one may calculate a 99% confidence interval, which would be wider, since it should include the middle 99% of the total likelihood area.

Fig. 2. Illustration of the variation of confidence limits in random samples (computer simulation). (a) Ninety-five percent confidence intervals in 100 random samples of the same size from the same population, aligned according to the true value in the population. In 5 of the samples the 95% confidence interval does not include the true value found in the population. (b) The same confidence intervals sorted according to their values. (c) When the sorted confidence intervals are aligned to their middle, their variation in relation to the true value in the population is again clearly seen. This presentation corresponds to how investigators see the world: they investigate samples in order to extrapolate the findings to the population. However, the potential imprecision of extrapolating from a sample to the population is apparent – especially if the confidence interval is wide. Thus keeping confidence intervals rather narrow is important. This would mean relatively large trials.

Fig. 3. (a) Histogram showing the distribution of the true difference in the population in relation to the difference D found in the trial sample (computer simulation of 10,000 samples). (b) The normally distributed likelihood curve of the true difference in the population in relation to the difference D found in a trial sample. The 95% confidence interval (CI) is shown.
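As a minimal illustrative sketch (the numbers anticipate Example 1 below), the interval can be computed directly from D and SED for any chosen confidence level:

from scipy.stats import norm

def confidence_interval(d, se_d, level=0.95):
    """Two-sided confidence interval for an observed difference d with standard error se_d."""
    z = norm.ppf(1 - (1 - level) / 2)   # 1.96 for 95%, about 2.58 for 99%
    return d - z * se_d, d + z * se_d

print(confidence_interval(0.15, 0.09))              # approx. (-0.03, 0.33)
print(confidence_interval(0.15, 0.09, level=0.99))  # wider: approx. (-0.08, 0.38)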
`
2.3. The type 2 error risk of having overlooked a difference Δ

If the 95% confidence interval of D includes zero, then there is no significant difference in effect between the two therapies. However, this does not mean that one can conclude that the effects of the therapies are the same. There may still be a true difference in effect between the therapies, which the RCT has just not been able to detect, e.g. because of insufficient sample size and power. The risk of having overlooked a certain difference in effect of Δ between the therapies is the type 2 error risk β. In some cases this risk may be substantial. Example 1 gives an illustration of this.
`
Example 1. In naïve cases of chronic hepatitis C genotype 1, pegylated interferon plus ribavirin for 3 months induces a sustained virologic response in about 40%. One wishes to test whether a new therapeutic regimen can increase the sustained response in this type of patient to
60% with a power (1 − β) of 80%. The type 1 error risk (2α) should be 5%. One needs to estimate the number of patients necessary for this trial. For comparison of proportions, as in this trial, the variance of the difference (S²) is equal to p1(1 − p1) + p2(1 − p2), where p1 and p2 are the proportions with response in the compared groups. So we have:

2α = 0.05 ⇒ Z2α = 1.96; β = 0.20 ⇒ Zβ = 0.84
p1 = 0.4; p2 = 0.6; Δ = 0.2

Using N = (Z2α + Zβ)² × [p1(1 − p1) + p2(1 − p2)]/Δ² one gets:

N = (1.96 + 0.84)² × (0.4 × 0.6 + 0.6 × 0.4)/0.2² = 7.84 × 0.48/0.04 = 94.

Therefore the necessary number of patients (2N) would be 188 patients.
However, due to various difficulties only 120 patients (60 in each group) of this kind could be recruited. By solving the general sample size formula for Zβ one obtains:

Zβ = √N × Δ/S − Z2α.

Using this formula, the power of the trial with the reduced number of patients can be estimated as follows:

Zβ = √60 × 0.2/√0.48 − 1.96 = (7.75/0.69) × 0.2 − 1.96 = 0.29.

Using Table 1 (right part) with interpolation, β becomes 0.39. Thus with this limited number of patients, the power 1 − β is now only 0.61 or 61% (a reduction from 80%). This markedly reduced power seriously diminishes the chances of demonstrating a significant treatment effect. A post hoc power calculation like this can only be used to explain why a superiority trial is inconclusive; it can never be used to support a negative result of a superiority trial.

The result of the trial was as follows: sustained virologic response was found in 26 of 60 (0.43 or 43%) in the control group and in 35 of 60 (0.58 or 58%) in the new therapy group. The difference D is 0.15 or 15%, but it is not statistically significant (p > 0.10). A simple approximate formula for the standard error of the difference is:

SED = √[p1(1 − p1)/n1 + p2(1 − p2)/n2] = √(0.43 × 0.57/60 + 0.58 × 0.42/60) = 0.09.

The 95% confidence interval for D is D ± Z2α × SED = 0.15 ± 1.96 × 0.09, or −0.026 to 0.326 (−2.6% to 32.6%), which is rather wide, as it includes both zero and Δ. The type 2 error risk of overlooking an effect of 20% (corresponding to Δ) can be estimated as follows: Zβ = (Δ − D)/SED = (0.20 − 0.15)/0.09 = 0.55. Using Table 1 (right part) with interpolation, β becomes 0.29. Thus the risk of having overlooked an effect of 20% is 29%. This is a consequence of the smaller number of patients included and the reduced power of the trial. The situation corresponds to that illustrated in Fig. 4. As seen from this figure, the result of a negative RCT like this does not rule out that the true difference may be Δ, since the type 2 error risk β of having overlooked an effect of Δ is substantial.

Fig. 4. Illustration of the type 2 error risk β in an RCT showing a difference D in effect which is not significant, since zero (0) difference lies between the lower (L) and upper (U) 95% confidence limits. The type 2 error risk of having overlooked an effect of Δ is substantial.
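The calculations in Example 1 can be reproduced approximately with the following illustrative Python sketch (scipy's normal distribution replaces the interpolation in Table 1; variable names are arbitrary):

from math import sqrt
from scipy.stats import norm

z_2a = norm.ppf(0.975)                        # Z2a for 2*alpha = 0.05, i.e. 1.96

# Power actually achieved with only 60 patients per group
# (planning values from Example 1: Delta = 0.2, S^2 = 0.48)
n = 60
z_beta = sqrt(n) * 0.2 / sqrt(0.48) - z_2a
print(norm.cdf(z_beta))                       # power approx. 0.61 instead of the planned 0.80

# Observed difference, its standard error and 95% confidence interval
p_ctrl, p_new = 26 / 60, 35 / 60              # 0.43 and 0.58
d = p_new - p_ctrl                            # 0.15
se_d = sqrt(p_ctrl * (1 - p_ctrl) / n + p_new * (1 - p_new) / n)   # approx. 0.09
print(d - z_2a * se_d, d + z_2a * se_d)       # approx. -0.03 to 0.33

# Type 2 error risk of having overlooked an effect of Delta = 0.20
print(1 - norm.cdf((0.20 - d) / se_d))        # approx. 0.29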
`
3. Equivalence trials

The purpose of an equivalence trial is to establish identical effects of the therapies being compared [12–17]. Completely equivalent effects would mean a Δ-value of zero. As seen from the formula for estimation of the sample size (see above), this would mean division by zero, which is not possible. Dividing by a very small Δ-value would result in an unrealistically large estimated sample size. Therefore, as a manageable compromise, the aim of an equivalence trial would be to determine whether the difference in effects between two therapies lies within a specified small interval −Δ to +Δ.

An equivalence trial would be relevant if the new therapy is simpler, associated with fewer side effects or less expensive, even if it is not expected to have a larger therapeutic effect than the control therapy.

It is crucial to specify a relevant size of Δ [14,17]. This is not simple. One should aim at limiting as much as possible the risk of accepting a new therapy which is in fact inferior to the control therapy. Therefore Δ should be specified rather small and in any case smaller than the smallest value that would represent a clinically meaningful difference. As a crude general rule, Δ should be specified as no more than half the value which would be used in a superiority trial [13]. Equivalence between the therapies would be demonstrated if the confidence interval
for the difference in effect between the therapies turns out to lie entirely between −Δ and +Δ [13]. Fig. 5 illustrates the conclusions that can be drawn from the position of the confidence limits for the difference in effect found in the performed trial.

Fig. 5. Examples of observed treatment differences (new therapy − control therapy) with 95% confidence intervals and the conclusions to be drawn. (a) The new therapy is significantly better than the control therapy. However, the magnitude of the effect may be clinically unimportant. (b–d) The therapies can be considered to have equivalent effects. (e–f) Result inconclusive. (g) The new therapy is significantly worse than the control therapy, but the magnitude of the difference may be clinically unimportant. (h) The new therapy is significantly worse than the control therapy.

In an equivalence trial the roles of the null and alternative hypotheses are reversed: the relevant null hypothesis is that a difference of at least Δ exists, and the aim of the trial is to disprove this in favor of the alternative hypothesis that no difference exists [13]. Even if this situation is like a mirror image of the situation for the superiority trial, it turns out that the method for sample size estimation is similar in the two types of trial, although Δ has different meanings in the superiority and equivalence trials.
`
Example 2. In the same patients as described in Example 1, one wishes to test in an RCT the therapeutic equivalence of the current regimen of pegylated interferon plus ribavirin (giving a sustained response in 40%) and another new, inexpensive therapeutic regimen having fewer side effects.

One needs to estimate the number of patients necessary for this trial. The power (1 − β) of the trial should be 80%. The type 1 error risk (2α) should be 5%. The therapies would be considered equivalent if the confidence interval for the difference in proportion with sustained response falls entirely within the interval ±0.10 or ±10%. Thus Δ is specified as 0.10. So we have:

2α = 0.05 ⇒ Z2α = 1.96; β = 0.20 ⇒ Zβ = 0.84
p1 = 0.4; p2 = 0.4; Δ = 0.10

Using the same expression for the variance of the difference (S²) as in Example 1, this result is obtained:

N = (1.96 + 0.84)² × (0.4 × 0.6 + 0.4 × 0.6)/0.1² = 7.84 × 0.48/0.01 = 376.

Therefore the necessary number of patients (2N) would be 752 patients.
The trial was conducted and the result was as follows: sustained virologic response was found in 145 of 372 (0.39 or 39%) in the control group and in 156 of 380 (0.41 or 41%) in the new therapy group. The difference D is 0.02 or 2%, which is not statistically significant (p > 0.50). The standard error of the difference is:

SED = √[p1(1 − p1)/n1 + p2(1 − p2)/n2] = √(0.39 × 0.61/372 + 0.41 × 0.59/380) = 0.036.

The 95% confidence interval for D is D ± Z2α × SED = 0.02 ± 1.96 × 0.036, or −0.050 to 0.091 (−5.0% to 9.1%). Since this confidence interval lies completely within the specified interval for Δ from −0.1 to +0.1, the effects of the two therapies can be considered equivalent. The situation corresponds to (b) or (c) in Fig. 5.
As in this example, the necessary sample size in an equivalence trial will often be at least four times that of a corresponding superiority trial. Therefore the necessary resources will be larger.
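An illustrative sketch of the equivalence assessment in Example 2 (again using scipy in place of Table 1; variable names are arbitrary):

from math import sqrt
from scipy.stats import norm

z_2a = norm.ppf(0.975)          # 1.96
margin = 0.10                   # equivalence interval is -Delta to +Delta

# Observed results of Example 2
p_ctrl, n_ctrl = 145 / 372, 372
p_new, n_new = 156 / 380, 380

d = p_new - p_ctrl                                                          # approx. 0.02
se_d = sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_new * (1 - p_new) / n_new)  # approx. 0.036
lower, upper = d - z_2a * se_d, d + z_2a * se_d
print(lower, upper)                                                         # approx. -0.05 to 0.09

# Equivalence is concluded only if the whole 95% CI lies inside (-margin, +margin)
print(-margin < lower and upper < margin)                                   # True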
`
4. Non-inferiority trials

The non-inferiority trial, which is related to the equivalence trial, does not aim at showing equivalence but only at showing that the new therapy is no worse than the reference therapy. Thus the non-inferiority trial is designed to demonstrate that the difference in effect (new therapy − control therapy) is no less than −Δ. Non-inferiority of the new therapy would then be demonstrated if the lower confidence limit for the difference in effect between the therapies turns out to lie above −Δ. The position of the upper confidence limit is not of primary interest. Thus the non-inferiority trial is designed as a one-sided trial. For that reason the necessary number of patients would be less than for a corresponding equivalence trial, as illustrated in the following example.
`
Example 3. We want to conduct the trial described in Example 2 not as an equivalence trial but as a
non-inferiority trial. Thus the trial should be one-sided instead of two-sided as in the equivalence trial. The only difference is that one should use Zα instead of Z2α. For α = 0.05 one gets Zα = 1.64 (Table 1, right side). Thus we obtain:

N = (1.64 + 0.84)² × (0.4 × 0.6 + 0.4 × 0.6)/0.1² = 6.15 × 0.48/0.01 = 295.

Therefore the necessary number of patients (2N) would be 590 patients.
The trial was conducted and the result was as follows: sustained virologic response was found in 114 of 292 (0.39 or 39%) in the control group and in 125 of 298 (0.42 or 42%) in the new therapy group. The difference D is 0.03 or 3%, which is not statistically significant (p > 0.50). The standard error of the difference is:

SED = √[p1(1 − p1)/n1 + p2(1 − p2)/n2] = √(0.39 × 0.61/292 + 0.42 × 0.58/298) = 0.040.

The lower one-sided 95% confidence limit would be D − Zα × SED = 0.03 − 1.64 × 0.040 = −0.036 (−3.6%). Since the lower confidence limit lies above the specified limit for Δ of −0.1, the effect of the new therapy is not inferior to that of the control therapy. If the two-sided 95% confidence interval (which is recommended by some even for the non-inferiority trial [18]) is estimated, one obtains D ± Z2α × SED = 0.03 ± 1.96 × 0.040, or −0.048 to 0.108 (−4.8% to 10.8%). The lower confidence limit still lies above −0.1, but the upper confidence limit lies above 0.1 (the upper limit for equivalence – see Example 2). Therefore the new therapy may be slightly better than the control therapy. The type 2 error risk of having overlooked an effect of 0.1 or 10% could be estimated as follows: Zβ = (Δ − D)/SED = (0.10 − 0.03)/0.04 = 1.75. Using Table 1 (right part) with interpolation, β becomes 0.04, a rather small risk.
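A corresponding illustrative sketch of the non-inferiority assessment in Example 3:

from math import sqrt
from scipy.stats import norm

z_a = norm.ppf(0.95)            # one-sided Z for alpha = 0.05, i.e. 1.64
margin = -0.10                  # non-inferiority limit -Delta

# Observed results of Example 3
p_ctrl, n_ctrl = 114 / 292, 292
p_new, n_new = 125 / 298, 298

d = p_new - p_ctrl                                                          # approx. 0.03
se_d = sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_new * (1 - p_new) / n_new)  # approx. 0.040
lower = d - z_a * se_d                                                      # approx. -0.04

# Non-inferiority is concluded if the lower confidence limit lies above -Delta
print(lower > margin)                                                       # True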
`
5. Other factors

Since the aim of an equivalence or non-inferiority trial is to establish equivalence between the therapies or non-inferiority of the new therapy, there is not the same incentive to remove factors likely to obscure any difference between the treatments as in a superiority trial. Thus in some cases a finding of equivalence may be due to trial deficiencies such as small sample size, lack of double blinding, lack of concealed random allocation, incorrect doses of drugs, effects of concomitant medicine or spontaneous recovery of patients without medical intervention [19].

An equivalence or non-inferiority trial should mirror as closely as possible the methods used in previous superiority trials assessing the effect of the control therapy versus placebo. In particular it is important that the inclusion and exclusion criteria, which define the patient population, the blinding, the randomization, the dosing schedule of the standard treatment, the use of concomitant medication and other interventions, and the primary response variable and its schedule of measurements are the same as in the preceding superiority trials which evaluated the reference therapy being used in the comparison. In addition one should pay attention to patient compliance, the response during any run-in period, and the scale of patient losses and the reasons for them. These should not differ from previous superiority trials.
`
6. Analysis: both "intention to treat" and "per protocol"

An important point in the analysis of equivalence and non-inferiority trials concerns whether to use an "intention to treat" or a "per protocol" analysis. In a superiority trial, where the aim is to decide whether two treatments are different, an intention to treat analysis is generally conservative: the inclusion of protocol violators and withdrawals will usually tend to make the results from the two treatment groups more similar. However, for an equivalence or non-inferiority trial this effect is no longer conservative: any blurring of the difference between the treatment groups will increase the chance of finding equivalence or non-inferiority.

A per protocol analysis compares patients according to the treatment actually received and includes only those patients who satisfied the entry criteria and properly followed the protocol. In a superiority trial this approach may tend to enhance any difference between the treatments rather than diminish it, because uninformative "noise" is removed. In an equivalence or non-inferiority trial both types of analysis should be performed, and equivalence or non-inferiority can only be established if both analyses support it. To ensure the best possible quality of the analysis it is important to collect complete follow-up data on all randomized patients as per protocol, irrespective of whether they are subsequently found to have failed entry criteria, withdraw from trial medication prematurely, or violate the protocol in some other way [20]. Such a rigid approach to data collection allows maximum flexibility during later analysis and hence provides a more robust basis for decisions.

The most common problem in reported equivalence or non-inferiority studies is that they are planned and analyzed as if they were superiority trials and that the lack of a statistically significant difference is taken as proof of equivalence [7–10]. Thus there seems to be a need for better knowledge of how equivalence and non-inferiority studies should be planned, performed, analyzed and reported.
`
`
7. Ensuring a high quality

A recent paper reported on the quality of reporting of published equivalence trials [21]. Its authors found that some trials had been planned as superiority trials but were reported as if they had been equivalence trials after failure to demonstrate superiority, since they did not include an equivalence margin. They also found that one-third of the reports which included a sample size calculation had omitted elements needed to reproduce it; one-third of the reports described a confidence interval whose size was not
