Distinct mutational processes shape selection of
MHC class I and class II mutations across primary
and metastatic tumors

Graphical abstract

[Graphical abstract, panel labels only: >30 cancer types; refractory; mutation calling with personalized references; mutational processes (APOBEC/AID, MSI, TMB); loss of function; high dN/dS ratio]

Authors

Michael B. Mumphrey, Noshad Hosseini,
Abhijit Parolia, ..., Malini Raghavan,
Arul Chinnaiyan, Marcin Cieslik

Highlights
• Hapster detects MHC class I and II mutations with high
sensitivity and specificity

• MHC genes are among the most recurrently mutated genes
pan-cancer

• Tumor mutation burden and mutational processes shape the
spectrum of MHC mutations

• MHC missense mutations are likely loss of function,
disrupting B2M and antigen binding

Mumphrey et al., 2023, Cell Reports 42, 112965
August 29, 2023 © 2023 The Authors.
https://doi.org/10.1016/j.celrep.2023.112965

JHU 2034
Merck Sharp v. Johns Hopkins
IPR2024-00649

Correspondence
mcieslik@med.umich.edu

In brief
Using the personalized mutation caller
Hapster, Mumphrey et al. report a pan-cancer
analysis of positive selection for
MHC class I and class II mutations across
primary and metastatic cancers. Their
analysis provides evidence for the
enrichment of inactivating MHC
mutations in select cancers, as well as the
mutational processes responsible.

[Graphical abstract, panel labels only: MHC-I/II mutation compendium (1,079 MHC-I mutations, 990 MHC-II mutations); MHC-I/II among the most recurrent driver genes; statistical recurrence analysis; positional enrichment in protein functional domains]

`
`
`Cell Reports
`
CellPress
`
`OPEN ACCESS
`
`Distinct mutational processes shape selection
of MHC class I and class II mutations
`across primary and metastatic tumors
`
Michael B. Mumphrey, Noshad Hosseini, Abhijit Parolia, Jie Geng, Weiping Zou, Malini Raghavan,
Arul Chinnaiyan, and Marcin Cieslik*
1Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA
2Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
3Michigan Center for Translational Pathology, University of Michigan, Ann Arbor, MI 48109, USA
4Department of Microbiology & Immunology, University of Michigan, Ann Arbor, MI 48109, USA
5Department of Urology, University of Michigan, Ann Arbor, MI 48109, USA
6Center of Excellence for Cancer Immunology and Immunotherapy, University of Michigan, Ann Arbor, MI 48109, USA
7Howard Hughes Medical Institute, Ann Arbor, MI 48109, USA
8University of Michigan Rogel Cancer Center, Ann Arbor, MI 48109, USA
9Senior author
10Lead contact
*Correspondence: mcieslik@med.umich.edu
https://doi.org/10.1016/j.celrep.2023.112965
`
`SUMMARY
`
Disruption of antigen presentation via loss of major histocompatibility complex (MHC) expression is a strategy
whereby cancer cells escape immune surveillance and develop resistance to immunotherapy. Here, we
develop the personalized genomics algorithm Hapster and accurately call somatic mutations within the MHC
genes of 10,001 primary and 2,199 metastatic tumors, creating a catalog of 1,663 non-synonymous mutations
that provide key insights into MHC mutagenesis. We find that MHC class I genes are among the most
frequently mutated genes in both primary and metastatic tumors, while MHC class II mutations are more
restricted. Recurrent deleterious mutations are found within haplotype- and cancer-type-specific hotspots
associated with distinct mutational processes. Functional classification of MHC residues reveals significant
positive selection for mutations disruptive to the B2M, peptide, and T cell binding interfaces, as well as to
MHC chaperones.
`
`INTRODUCTION
`
The immune system is capable of identifying and eliminating
cancer cells via CD8+ T cell-mediated cytotoxicity. To avoid
this destruction, successful cancers often evolve strategies to
disrupt T cell immunity, such as overexpression of the immunosuppressive
PD-L1 or reduced expression of key proteins such
as the major histocompatibility complex (MHC) class I molecules.
Research into how tumors escape T cell surveillance
has led to immunotherapies that restore cancer immunity in
select patients. However, even the most promising immunotherapies
still only provide a clinical benefit in a minority
of cases. Understanding the mechanisms that lead to
primary and acquired resistance to T cell-based immunotherapies
will be critical for the continued improvement of patient
outcomes.
`
T cells are able to identify malignant cells via the presence of
mutant peptides known as neoantigens. In order to detect these
neoantigens, T cells require the peptides to be presented at the
cell surface by the MHC. The MHC class I proteins present neoantigens
to cytotoxic CD8+ T cells and, as a result, are directly
involved in the destruction of malignant cells. Consistent with
this role as a tumor suppressor, correlations between decreased
MHC class I expression and poor outcomes have been repeatedly
observed across many cancers. The MHC class II proteins
present neoantigens to CD4+ T cells and play an important
role in tumor suppression. Their expression can be induced
in most cell types, including cancer cells, and MHC class
II-restricted neoantigen vaccines have been shown to trigger an
anti-tumor immune response, suggesting that cancer-cell-specific
MHC class II expression may also play a tumor-suppressor
role.
`Identifying the mechanisms that lead to loss of the MHC will be
`key to understanding resistance to T cell-based immunother-
`apies. However, genetic studies of the MHC are hindered by
the extreme polymorphism of the MHC genes. To address
`this issue, we created Hapster, a generalized algorithm that con-
`structs personalized reference sequences for alignment and mu-
`tation calling within polymorphic genes. The high sensitivity and
`specificity of the identified mutations enabled us to study the
`mutational and evolutionary processes driving somatic MHC
`loss in primary and metastatic tumors.
`
`2
`
`Cell R
`This is an open accessarticle under the C'
`
`1
`42, 112965, August 29, 2023 © 2023 The Authors.
`BYlicense (http://creativecommons.org/licenses/by/4.0/).
`
`2
`
`2
`
`
`
`ll
`
`OPEN ACCESS
`
`Resource
`
`Figure 1. Hapster algorithm and validation
`(A) Simplified overview of the Hapster algorithm. For each gene, Hapster infers optimal reference sequences from normal sequencing data, realigns to these
`personalized references, and calls mutations. For a complete description, see STAR Methods.
`(B) Germline variants identified from 69 WES samples from the 1000 Genomes Project relative to either the standard reference GRCh38 or to dynamically selected
`references using 8 different haplotypers. A perfect reference sequence should produce 0 apparent germline variants.
`(C) Fraction of simulated insertions or deletions (indels) and SNVs that were either called and passed all filters, were called and filtered by either Hapster or
Mutect2, or were never called. Shown here are mutations simulated at a VAF of 0.45 and a coverage of 100×.
(D) QQ plot for observed RNA-seq read support for HLA variants, assuming read support is only due to sequencing error according to a beta-binomial model.
`Variants were identified by Hapster alone (red) or by both Hapster and Polysolver (blue) from WES data. A comparison is shown to randomly generated alternate
`bases (gray), which are only supported by noisy reads and follow the null model (diagonal black line).
`(E) Boxplot showing the number of private germline variants observed per tumor in those cases with or without somatic MHC mutations. Wilcoxon rank sum test p
`values with BH correction.
`
`
`RESULTS
`
`Hapster allows for more sensitive and specific somatic
`mutation calls in MHC genes
`Hapster is a complete mutation calling pipeline that (1) selects
personalized reference haplotype sequences, (2) prunes
contaminant and misaligned reads, and (3) detects26,27 and filters
variants (Figure 1A). For the first function, in principle, any
`existing human leukocyte antigen (HLA) haplotyper28–34 could
`be used to identify HLA haplotype sequences. However, in prac-
`tice, existing haplotypers often report HLA types for which only
`the sequences for exons 2 and 3 are known35 (Figure S1A).
`This is insufficient for somatic mutation calling where a full-length
`sequence is needed to call variants in all exons/introns. We
`therefore developed a generalized haplotyping algorithm to re-
`turn full-length sequences for MHC class I and class II genes.
`For the second and third functions, we developed MHC-specific
`strategies for pruning of contaminant and misaligned reads orig-
`inating from other MHC genes and pseudogenes, as well as for
`the identification of false positive somatic mutation calls. A
`detailed overview of the Hapster pipeline is included (STAR
`Methods).
`To benchmark the haplotype inference portion, we used a set
`of 69 whole-exome sequencing (WES) samples from the 1000
`Genomes Project with reported haplotype calls.33,36 To compare
`Hapster with other methods, we called germline variants in WES
`data relative to each haplotyper’s inferred haplotype for each in-
`dividual. Using sequencing data as ground truth, a perfectly
`identified reference would lead to no germline variants being
`identified. We observe that, relative to the standard GRCh38
`reference, there is a median of 17–38 germline variants observed
`per allele. All tested haplotypers improve upon this, with each
`having a median of 0 and a mean of <0.5 observed germline mu-
`tation per allele (Figure 1B), similar to the genome-wide average
`of 1 variant per kilobase.37 In a larger population of 10,001
`normal tissue samples from TCGA, Hapster identified 0–3 germ-
`line variants per allele (Table S1).
`To assess Hapster’s sensitivity, we simulated 200 synthetic
`MHC haplotypes with a random mutation, followed by simulated
WES at depths ranging from 5× to 100× and variant allele fractions
(VAFs) of 0.025–0.45. Of the 200 simulated mutations, 94%
(187/200) were successfully identified (Figure 1C) at 100×
`coverage and a VAF of 0.45. Following filtering, 18 of these calls
`were removed by either Hapster’s or Mutect2’s filters, giving a
`final sensitivity of 85% (169/200) for high-coverage clonal vari-
`ants. Inspection of the 13 variants that failed to be called showed
`that 12 were in regions of low coverage following probe capture
`(Table S2). As such, they are false negatives due to the capture
`kit design rather than to Hapster’s algorithm. When looking at re-
`sults over the range of coverages and VAFs, we see that as
coverage and VAF decrease, sensitivity decreases as expected
due to lower read support for variants (Figure S1D). A comparison
of simulated vs. observed VAFs for each mutation call shows
`that at most simulated VAFs, Hapster produces calls with slightly
`lower observed VAFs, likely due to a slight loss of reads following
`read filtering (Figure S1E).
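The drop in sensitivity at low coverage and low VAF can be illustrated with a simple binomial model of read sampling. The sketch below is not part of Hapster: the threshold `min_alt_reads` is a hypothetical stand-in for a caller's minimum read support, and the model ignores capture bias, sequencing error, and read filtering.

```python
from math import comb


def prob_min_support(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """Probability of sampling at least `min_alt_reads` variant reads when each
    of `depth` reads carries the variant independently with probability `vaf`.

    `min_alt_reads` is an illustrative assumption, not a Hapster parameter.
    """
    below_threshold = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - below_threshold


if __name__ == "__main__":
    # Endpoints of the coverage and VAF ranges used in the simulation above.
    for depth in (5, 100):
        for vaf in (0.025, 0.45):
            print(f"depth={depth} vaf={vaf} p={prob_min_support(depth, vaf):.4f}")
```

Under this toy model, read support is nearly guaranteed at 100× and a VAF of 0.45, and becomes unlikely at 5× and a VAF of 0.025, which matches the direction of the trend described for Figure S1D.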
`To assess specificity, we called somatic mutations in all 450
samples from the TCGA head and neck squamous cell carcinoma
(HNSC) cohort with tumor and normal labels swapped
such that no somatic variants should be identified. In 9 cases,
`an apparent somatic variant was identified that passed all filters.
`Assuming all 9 calls are false positives gives a specificity of 98%
`(441/450).
`To assess Hapster’s accuracy using an orthogonal
`sequencing technology, we used Hapster to call somatic muta-
`tions in WES data and then determined if these same mutations
`were supported by paired RNA sequencing (RNA-seq) data. Es-
`tablished RNA-seq validation methods are not ideal, as they rely
`on alignment of reads to a reference, which would be inappro-
`priate in validating Hapster. We therefore developed an orthog-
`onal alignment-free kmer-based approach to determine if the
`reads support variants in the RNA based on a beta-binomial
`model of sequencer error (STAR Methods), avoiding reference
`selection or alignment biases. Of the 80 variants in the WES
`data, 72 had high enough coverage in the RNA-seq data to un-
`dergo validation. Of these, 63 variants (88%) had read support
significantly exceeding the null model (p < 0.05 [Benjamini-Hochberg
adjusted (BH-adj.)], Figure 1D), and 4 (5%) were truncating
with evidence of nonsense-mediated decay (Figure S1B).
`This leaves only 5 variants (7%) without RNA evidence, poten-
`tially due to the limitations of our model, low tumor cellularity,
`loss of heterozygosity (LOH), or transcriptional silencing.38
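To make the idea of a beta-binomial null for read support concrete, the following sketch computes an upper-tail p value for the number of variant-supporting reads under an error-only model. It is a generic illustration, not the model described in STAR Methods: the shape parameters `alpha` and `beta` and the example read counts are hypothetical, and the kmer counting step is not shown.

```python
from math import exp, lgamma


def _log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def _log_choose(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def betabinom_pmf(k: int, n: int, alpha: float, beta: float) -> float:
    """Beta-binomial probability of k successes in n trials."""
    return exp(
        _log_choose(n, k)
        + _log_beta(k + alpha, n - k + beta)
        - _log_beta(alpha, beta)
    )


def error_only_pvalue(alt_reads: int, depth: int, alpha: float, beta: float) -> float:
    """P(X >= alt_reads) if variant-supporting reads arise from error alone.

    alpha and beta encode the assumed error-rate distribution; the values
    used in the example below are hypothetical.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    tail = sum(betabinom_pmf(k, depth, alpha, beta) for k in range(alt_reads, depth + 1))
    return min(1.0, tail)


if __name__ == "__main__":
    # Hypothetical example: 20 of 100 reads support the variant, with an
    # assumed mean error rate of alpha / (alpha + beta) = 0.001.
    print(error_only_pvalue(20, 100, alpha=1.0, beta=999.0))
```

In a workflow like the one described, p values of this kind would then be adjusted for multiple testing before applying the 0.05 threshold.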
`For a second orthogonal validation, we performed Sanger
`sequencing on tumors from MI-ONCOSEQ39 with sufficient
`DNA or tissue samples. All 14 candidate variants called by Hap-
`ster were detected in the Sanger chromatograms from tumor
`specimens while being absent in traces from patient-matched
`normal tissues (Figure S1C). Addressing the possibility of germ-
`line variants being miscalled as somatic due to poor reference
`selections, we found no evidence of enrichment of somatic mu-
`tations in cases with higher numbers of private germline variants
`(Figure 1E).
`Finally, we applied Hapster to a larger set of 7,746 samples
`from TCGA that have previously reported mutations called by
`both the Broad Genomic Data Analysis Center (GDAC) standard
`reference-based pipeline and the Polysolver personalized pipe-
`line.34,40 We found that when calling mutations in the MHC class I
`genes, Hapster detected over twice as many non-synonymous
`mutations as the GDAC pipeline and 36% more than Polysolver
`(Table S3; Figures 1F and S2A). Next we examined variant allele
`frequency (VAF) distributions. Hapster tended to report slightly
higher VAFs (Figure S2B) due to preserving more variant reads
(Figure S2C) rather than failing to call low VAF mutations.

Figure 1 (continued)
(F) Comparison of non-synonymous mutation calls for the MHC class I genes between the GDAC pipeline, Polysolver, and Hapster across various cancer types
from TCGA. Lightly shaded bars represent possible false positives.
(G) Comparison of mutational consequences for variants called by the standard GDAC pipeline, Polysolver, or Hapster in the MHC genes vs. oncogenes, tumor
suppressors, and neutral gene mutations from TCGA. Oncogenes (OGs): KRAS, PIK3CA, IDH1, CTNNB1, FOXA1, BRAF, AKT1, EGFR. Tumor suppressors (TSs):
TP53, RB1, PTEN, APC, BRCA2, VHL. Neutral genes: all others.
See also Figures S1 and S2 and Tables S1, S2, and S3.

`Conversely, variants exclusive to Hapster tended to be low
`VAF mutations, supporting its higher sensitivity (Figure S2D).
`We next performed an exhaustive search for potential false pos-
`itives resulting from misalignment of sequencing reads origi-
`nating from other homologous MHC genes or pseudogenes.
`We found that only 6% of Hapster’s non-synonymous
`calls matched known sequences in any other MHC gene, a
`rate significantly lower than that of Polysolver (15%, Fisher’s
exact test p < 1e-10, BH-adj.) but similar to the GDAC pipeline
`(4%, Fisher’s exact test p = 1, BH-adj.) (Figures 1F and S2E;
`Table S3). An analysis of synonymous mutation calls shows
`the apparently recurrent synonymous variants p.T214T and
`p.A269A, which are identified as somatic mutations by Poly-
`solver (Figure S2F). These mutations are unlikely to be under
`extreme positive selection but have sequences exactly matching
`non-classical MHC class I genes, i.e., are likely due to alignment
`errors from HLA-E, HLA-F, or HLA pseudogenes. We next
`compared the distribution of functional consequences of HLA
`mutations called by each of the approaches. For both Hapster
`and the GDAC pipeline, synonymous mutation calls were under-
`represented when compared to neutral genes, consistent with
`what would be expected for a potential driver gene (Figure 1G).
`In contrast, we found that Polysolver had an over-representation
`of synonymous calls, many of which can likely be attributed to
`misaligned reads originating from non-classical MHC class I
`genes (Figure S2F; Table S3).
`
`Pan-cancer compendium of MHC class I and class II
`mutations
`To comprehensively characterize MHC class I and class II muta-
`tion rates in human cancer, we analyzed 10,001 tumors across
`35 cancer types from TCGA and 2,199 tumors across 24 cancer
`types from MI-ONCOSEQ39 (Table S4), for a total compendium
`of 2,069 MHC class I and class II mutations (Figure 2A;
`Table S5). Samples from TCGA are mainly primary tumors, with
`the exception of the melanoma cohort (skin cutaneous melanoma
`[SKCM]), which consists only of metastatic samples. Microsatellite
`unstable (MSI) tumors are immunologically distinct due to their
`significantly higher neoantigen burden,41 and we have therefore
`separated them from their microsatellite stable (MSS) counterparts
`within the colon (colon adenocarcinoma [COAD]), stomach (stom-
`ach adenocarcinoma [STAD]), and endometrial (uterine corpus
`endometrial carcinoma [UCEC]) TCGA cohorts. While some other
`cancers also have distinct subtypes, such as breast cancer
(BRCA) estrogen receptor (ER)+/− and HNSC human papillomavirus
(HPV)+/−, no significant difference in MHC mutation rates
`was observed between subtypes (BH-adj. chi-squared p = 0.27–
`1). These cohorts were therefore not subdivided for further ana-
`lyses. Samples from MI-ONCOSEQ are metastatic/refractory
`and represent a significantly more advanced form of disease
`compared with the corresponding TCGA cohorts, with cases
`generally having received multiple forms of prior systemic therapy
`for their primary cancer, and one or more rounds of therapy for their
`metastatic cancer.
`Mutations were in general distributed uniformly across the gene
`body but occasionally concentrated within prominent hotspots
(Figures 2A and S3A). We found that for MHC class I, HLA-A and
HLA-B contained significantly more mutations than HLA-C, and
`that for MHC class II, HLA-DRA contained significantly more mu-
`tations than all other MHC class II genes except for HLA-DQB1
`(Figure 2B). Within each HLA gene, no allele was found to bear
`an excess of mutations. In primary tumors, we noted substantial
`variation in both mutational frequency and functional conse-
`quences across tumor types and MHC gene classes (Figure 2C).
`We found non-synonymous MHC class I and class II mutations
`in 10.5% of primary tumors (ranging from 2.7% to 72.5% across
`cancer types) (Figure S3B), with 5.6% (range: 0.2%–62.3%) of pa-
`tients harboring an MHC class I and 5.7% (range: 1.1%–21.7%)
`harboring an MHC class II somatic variant. Consistent with previ-
`ous reports that MSI tumors are under strong pressure to lose
`MHC function,42,43 the COAD-MSI, STAD-MSI, and UCEC-MSI
`cohorts make up 3 of the top 4 cohorts for MHC class I mutations
`(Figures 2D and S3B), with the majority of variants being loss-of-
`function (LOF) frameshifts or stop gains. MHC class II mutations
`were also most prevalent in cancers with high mutation burden
`including MSI tumors and melanoma (Figure 2E). However, LOF
`mutations in the top-mutated cohorts were less frequent and the
`variation in mutation rates across cancer types was lower
`compared with MHC class I (Figures 2E and S3B). Interestingly,
`there was a slight pan-cancer association between both MHC
`class I and class II mutations and immune cell infiltration after
`adjusting for tumor purity (Figure S3C), but further cohort-level
`analysis showed that no individual cancer type had a significant
`association after multiple testing correction.
`
`Prevalence of MHC class I and class II mutations in
`primary vs. metastatic tumors
`The prevalence of MHC mutations in metastatic tumors is un-
`known, a critical gap in knowledge considering the predominant
`use of immunotherapies in this setting and the immunological
`differences between the primary and metastatic tumor microen-
`vironment (TME).39,44–47 Overall, we observed non-synonymous
`MHC class I and class II mutations in 7.6% (range: 3.3%–20%) of
`metastatic/refractory patients, with substantial variation in muta-
`tional frequency and functional consequences between cancer
`types (Figures 2F, 2G, and S3D). To directly compare mutation
`rates between primary and metastatic cancers, we created a
`set of pairings to match TCGA cohorts to MI-ONCOSEQ cohorts
`(Table S6). For 15/17 pairings (88%), there were no significant
changes in primary vs. metastatic MHC class I or class II mutation
rates. However, for prostate and breast cancers, we
`observed a significant increase in MHC class I mutations in met-
`astatic cancers compared with primary cancers (Figure S3E;
`prostate: F(1, 909) = 9.35, p = 0.03; breast: F(1, 1140) = 12.8,
`p = 0.01, BH-adj.). No significant differences were seen in
`MHC class II mutations.
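Because many cohort pairings are tested at once, the p values quoted here are Benjamini-Hochberg adjusted. A minimal sketch of the standard BH step-up adjustment is shown below; the input p values are made up for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p values
    print(benjamini_hochberg(raw))
```

Each adjusted value is the smallest false discovery rate at which that test would be called significant, so a cutoff such as 0.05 is applied to the adjusted values rather than the raw ones.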
`Overall, these data provide a comprehensive look at MHC
`class I and class II mutations pan-cancer, across both primary
`and metastatic tumors. We find that somatic mutations of
`HLA-A and HLA-B are most common, while HLA-C and MHC
`class II genes are less frequently mutated. While some significant
`differences in MHC mutation rate between primary and metasta-
`tic tumors are noted, the majority of MHC mutations in metasta-
`tic tumors are expected to be already present in the primary
`tumor.
`
`5
`
`
`
`Resource
`
`ll
`
`OPEN ACCESS
`
`Figure 2. Compendium of MHC class I and class II mutations in primary and metastatic tumors
`(A) Distribution of all observed mutations in both primary and metastatic cancers across the coding region of the MHC genes. Binding pocket secondary
`structures are noted above.
(B) Significant differences in the prevalence of non-synonymous mutations and indels of individual MHC class I and class II genes. *p < 0.05; **p < 0.01;
***p < 0.001; ****p < 0.0001, BH-adj. Fisher’s exact test.
(C) Cohort-specific mutation rates for MHC class I and class II genes across all primary and metastatic cancers. Values are scaled to the number of individuals
within each cohort. Colors represent the fraction of cancers with non-synonymous/indel mutations.
(D–G) Cohort summaries of coding region mutations in MHC class I (D and F) and class II (E and G) genes in primary (D and E) and metastatic (F and G) cancers.
`Values are scaled by the number of individuals within each cohort.
`See also Figure S3 and Tables S4 and S5.
`
`Positive selection of non-synonymous MHC somatic
`mutations
`To identify positive selection of functional mutations within the
MHC genes, we applied CBaSE48 to each primary and metastatic
cohort from TCGA and MI-ONCOSEQ. HLA genes and haplotypes
are codominant, and each allele presents a largely unique
`set of neoantigens.49 Additionally, specific T cell responses are
`often immunodominant and mounted against only a few of
`the presented neoantigens such that the mutation of a single
HLA allele may result in the complete inability to present an
immunodominant neoantigen. Given this, we treat all MHC class
`I genes (and separately, all MHC class II genes) as one functional
`unit, analogous to multiple genes of a protein complex,50 taking
`into account the increased genomic length of this combined set
`of genes. In primary cancers, CBaSE identified 6 cohorts (COAD-
`MSI, STAD-MSI, diffuse large B cell lymphoma [DLBC], cervical
`squamous cell carcinoma [CESC], HNSC, LUSC) with statisti-
`cally significant evidence for positive selection of non-synony-
`mous variants in the MHC class I genes and 3 cohorts
`(cholangiocarcinoma [CHOL], kidney chromophobe [KICH],
`uveal melanoma [UVM]) for the MHC class II genes (Figure 3A).
`By this measure, the MHC class I genes are tied for 7th and
`the MHC class II genes are tied for the 17th most recurrent driver
`genes pan-cancer as determined by applying CBaSE to all pro-
`tein-coding genes across primary cancers. A similar trend was
`identified in metastatic and refractory cancers with the MHC
`class I genes being mutated in two lymphoma cohorts
`(M-DLBC, M-LYM), making them tied for 6th most recurrent
`pan-cancer driver gene by number of cohorts significantly
`mutated (Figure 3B). As an alternative measure of positive selec-
`tion, we used Fisher’s method to create a combined score (Fpos)
`for the strength of selection across all cohorts (Figures 3C and
`3D). We found that in both primary and metastatic cancers, the
`MHC class I genes scored in the top 0.1% of all protein-coding
`genes according to this metric of positive selection (Figures 3C
`and 3D), and in primary cancers, the MHC class II genes scored
`in the top 0.5% (Figure 3C). Due to the exclusion of MHC class II
`genes from the sequencing panel in a subset of MI-ONCOSEQ
`samples, we were not statistically powered to investigate selec-
`tion of MHC class II genes in metastatic cohorts.
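Fisher's method combines independent p values into a single statistic, X = −2 Σ ln p_i, which follows a chi-squared distribution with 2k degrees of freedom under the null. The sketch below shows the textbook form of the method only; how Fpos is derived from the per-cohort CBaSE output is not specified in this section, so the mapping from this combined p value to Fpos is an assumption.

```python
from math import exp, log
from typing import Sequence, Tuple


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-squared survival function for even degrees of freedom.

    For df = 2k the tail has the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


def fisher_combined(pvalues: Sequence[float]) -> Tuple[float, float]:
    """Return (statistic, combined p value) for p values in (0, 1]."""
    statistic = -2.0 * sum(log(p) for p in pvalues)
    return statistic, chi2_sf_even_df(statistic, 2 * len(pvalues))


if __name__ == "__main__":
    # Hypothetical per-cohort p values for one gene.
    print(fisher_combined([0.04, 0.20, 0.01, 0.50]))
```

A gene with modest evidence in many cohorts can score well under this combination even if few cohorts pass a significance threshold on their own, which is why it complements counting significantly mutated cohorts.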
`We next looked at the clonality of mutations within the 6 TCGA
`cohorts (COAD-MSI, STAD-MSI, DLBC, CESC, HNSC, LUSC)
`reported to be significantly mutated by CBaSE. We show that
`the majority of mutations in HLA-A (111/164, 68%) and HLA-B
`(132/179, 74%) within these cohorts have a cancer cell fraction
`(CCF) >0.7, consistent with the variants providing a survival
`advantage followed by a clonal sweep (Figure 3E). In tumors
`with multiple hits, we see that in each of these cohorts, the ma-
`jority (63%–100%) of cases have at least one clonal variant
`(Figures S4A and S4B). When there is at least one leading clonal
`mutation, we see that co-occurring MHC class I mutations are
`also primarily clonal in the DLBC (80%) and HNSC (88%) co-
`horts, while in the remaining cohorts, we observe a mix of clonal
`and subclonal mutations (Figures S4A and S4C). In cohorts
`showing no evidence of positive selection, the proportion of
`clonal mutations were significantly lower in both HLA-A and
`HLA-B (Figure 3E), consistent with these being mostly subclonal
`passenger mutations. In both groups, HLA-C is primarily subclo-
`nal, indicating that mutations in this gene may not provide as
`much survival benefit, consistent with our earlier finding that
`HLA-C is less frequently mutated than HLA-A and HLA-B
`(Figure 2B).
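The clonality calls above rest on a single threshold: a mutation with CCF > 0.7 is treated as clonal. A minimal sketch of that classification follows; the function names and the example CCF values are illustrative, and estimating CCF itself from VAF, purity, and copy number is not shown.

```python
from typing import Sequence

CLONAL_CCF_THRESHOLD = 0.7  # threshold stated in the text and Figure 3E legend


def clonal_fraction(ccfs: Sequence[float], threshold: float = CLONAL_CCF_THRESHOLD) -> float:
    """Fraction of mutations whose cancer cell fraction exceeds the threshold."""
    if not ccfs:
        raise ValueError("at least one CCF value is required")
    return sum(1 for ccf in ccfs if ccf > threshold) / len(ccfs)


def has_leading_clonal_variant(ccfs: Sequence[float], threshold: float = CLONAL_CCF_THRESHOLD) -> bool:
    """True if a tumor with multiple hits carries at least one clonal mutation."""
    return any(ccf > threshold for ccf in ccfs)


if __name__ == "__main__":
    tumor_ccfs = [0.95, 0.40, 0.72]  # hypothetical CCFs for one multi-hit tumor
    print(clonal_fraction(tumor_ccfs), has_leading_clonal_variant(tumor_ccfs))
    # Reported counts: 111 of 164 HLA-A and 132 of 179 HLA-B mutations are clonal.
    print(round(100 * 111 / 164), round(100 * 132 / 179))
```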
`
`Impact of tumor mutation burden on MHC class I and
`class II mutation frequency
`To investigate the association between tumor mutational burden
`(TMB) and MHC mutations, we compared the local TMB within
the MHC genes to the genome-wide TMB for each cancer
cohort. As TMB increases, we expect the number of passenger
`mutations in a gene to increase stochastically. However, as TMB
`increases, neoantigen burden also increases, and we would
`expect increased selective pressures for LOF MHC mutations.
`We therefore expect all cancer types to show a positive associ-
`ation between TMB and MHC mutations, but in cohorts with sig-
`nificant evidence of positive selection, this increase should be
`elevated due to the added effect of both TMB- and neoanti-
`gen-induced selective pressures. We show this to be the case,
`with significantly mutated cohorts having a higher local TMB
`within the MHC genes than other cohorts of similar global TMB
`(Figure S4D).
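The comparison of local and global TMB can be expressed as a ratio of mutation densities. The sketch below assumes TMB is measured as mutations per megabase of callable sequence; the region sizes and mutation counts are hypothetical and are not taken from the paper.

```python
def tmb_per_mb(mutation_count: int, callable_bases: int) -> float:
    """Mutations per megabase over a region of `callable_bases` bases."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return mutation_count * 1_000_000 / callable_bases


def local_to_global_ratio(
    local_mutations: int,
    local_bases: int,
    global_mutations: int,
    global_bases: int,
) -> float:
    """Ratio of local (e.g. MHC) TMB to genome-wide TMB for one cohort."""
    global_tmb = tmb_per_mb(global_mutations, global_bases)
    if global_tmb == 0:
        raise ValueError("global TMB is zero; ratio is undefined")
    return tmb_per_mb(local_mutations, local_bases) / global_tmb


if __name__ == "__main__":
    # Hypothetical cohort: 2 mutations in 10 kb of MHC coding sequence,
    # 300 mutations across a 30 Mb exome.
    print(local_to_global_ratio(2, 10_000, 300, 30_000_000))
```

A ratio well above that of cohorts with similar global TMB is the pattern the text attributes to positive selection acting on top of background mutagenesis.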
`We originally hypothesized that somatic loss of MHC class II
`should mirror that of MHC class I given that both have been
`shown to promote anti-tumor immune responses. However,
`there was no association at the cohort level between MHC class
`I mutations and MHC class II mutations after controlling for TMB
`(Figures S4E–S4G). While MHC class I mutations appeared to be
`most prevalent in cancer types with high TMB, MHC class II mu-
`tations were frequently increased in low-TMB cancers with few
`MHC class I mutations.
`
`Functional consequences of MHC class I and class II
`mutations
`We next characterized the distributions of mutation functional
`consequences in cohorts with and without evidence of positive
`selection. We constructed an approximately neutral model by
`looking at the distribution of functional consequences across
`2.6 million mutations called from the entirety of the TCGA, the
`overwhelming majority of which are known to be passengers51
`(Figure 3F, ‘‘TCGA’’). MHC class I mutations within cohorts
`showing no evidence of positive selection showed a conse-
`quence distribution nearly identical to that of the neutral model
`(Figure 3F, ‘‘unselected’’), suggesting that these mutations are
`primarily passengers. However, in each of the 8 cancer types
`that did show positive selection, there was a significant difference
in consequence distributions when compared to the TCGA-derived
neutral model (chi-squared tests, p < 1e-3 to 1e-16,
`BH-adj.). Consistent with the MHC’s role as a tumor suppressor,
`this deviation was caused by an increase in truncating mutations,
`which accounted for more than 40% of mutations in most co-
`horts, as compared with the expected neutral rate of 12%.
`The DLBC, CESC, and HNSC cohorts all have a high proportion
`of stop gains (46%, 32%, and 28%, respectively) within the
`MHC class I genes, accounting for 56% (60/108) of all observed
`stop gains despite only comprising 8% (792/10,001) of TCGA pa-
`tients (Figure S4H). Notably, frameshift mutations in MHC class II
`were rare even in MSI tumors but were unexpectedly common in
`some MSS tumors including glioblastoma (GBM), ovarian cancer
`(OV), and liver hepatocellular carcinoma (LIHC). These cohorts
`were also depleted of synonymous mutations (Figure 3G).
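The concentration of stop gains in the DLBC, CESC, and HNSC cohorts can be checked from the counts given above. The sketch below computes the fold enrichment and, as a rough illustration, a binomial upper-tail probability under the simplifying assumption that each stop gain falls in these cohorts with probability equal to their share of patients. That null ignores differences in mutation burden between cohorts and is not an analysis from the paper.

```python
from math import comb


def binomial_upper_tail(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1.0 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts reported in the text above.
STOP_GAINS_IN_COHORTS = 60   # MHC class I stop gains in DLBC, CESC, and HNSC
STOP_GAINS_TOTAL = 108       # all observed MHC class I stop gains
PATIENTS_IN_COHORTS = 792    # patients in DLBC, CESC, and HNSC
PATIENTS_TOTAL = 10_001      # all TCGA patients analyzed

share_of_stop_gains = STOP_GAINS_IN_COHORTS / STOP_GAINS_TOTAL
share_of_patients = PATIENTS_IN_COHORTS / PATIENTS_TOTAL
fold_enrichment = share_of_stop_gains / share_of_patients

if __name__ == "__main__":
    print(f"share of stop gains: {share_of_stop_gains:.3f}")
    print(f"share of patients:   {share_of_patients:.3f}")
    print(f"fold enrichment:     {fold_enrichment:.2f}")
    # Illustrative null only; see the assumption stated above.
    print(binomial_upper_tail(STOP_GAINS_IN_COHORTS, STOP_GAINS_TOTAL, share_of_patients))
```

The shares reproduce the 56% and 8% quoted in the text, which corresponds to roughly a sevenfold over-representation of stop gains in these three cohorts.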
`Similar to the TCGA DLBC cohort, the refractory M-DLBC
`cohort showed both a high mutation rate and a bias toward trun-
`cating mutations in the MHC class I genes (35%). Other non-
`DLBC refractory lymphomas (M-LYM) had a lower MHC class I
`mutation rate but still had a bias toward truncating mutations
`(65%) (Figures 3F and S4I). The lymphomas alone account for
52% of stop gains observed across all MI-ONCOSEQ cohorts
(11/21) despite containing only 7% of patients. The HNSC,
CESC, and LUSC cohorts in TCGA are all squamous cell
carcinomas that correspond to a single cohort M-SQCC within
MI-ONCOSEQ. Similar to what was observed across the
primary squamous cancers, the pan-squamous M-SQCC (squamous
cell carcinoma) cohort showed an overall elevated mutation
rate and a high rate of LOF mutations when considering
frameshifts, stop gains, and splice region variants (35%;
Figure S4J). Metastatic MSI tumors are underrepresented in
MI-ONCOSEQ, preventing any comparison with primary MSI
tumors. Altogether, these data reveal striking differences in
mutation frequency and deleteriousness not only across cancer
types but also between MHC class I and class II genes.

Figure 3. Evidence for strong positive selection and deleteriousness of MHC somatic mutations
(A and B) Top 30 genes showing evidence of positive selection in primary (A) or metastatic (B) cancers by CBaSE by number of cohorts with significant evidence.
(C and D) Comparison of the number of cohorts significantly mutated vs. pan-cancer metastatic Fpos for protein-coding genes in primary (C) or metastatic
(D) cancers as measured by CBaSE. Vertical dashed lines show the cutoff for the top 0.5% of genes by Fpos.
(E) Cancer cell fraction (CCF) of MHC class I variants in TCGA cohorts showing significant evidence of positive selection compared with all other cohorts. Vertical
line shows 70% CCF, above which mutations are considered clonal. ***p < 0.001, Wilcoxon rank sum test after BH correction.
(F) Proportion of functional consequences observed in various groups: ‘‘TCGA,’’ 2,600,654 pan-cancer mutations from TCGA, an approx. neutral model;
‘‘unselected,’’ cohorts showing no evidence of positive selection; others, cohorts showing evidence of positive selection (n = 21–96 mutations within positively
selected cohorts). ‘‘TCGA’’ and ‘‘unselected’’ are average frequencies across cohorts, with error bars showing SEM.
(G) Functional consequences of MHC class II mutations in select primary cohorts. Mutational consequence distribution of known OGs (KRAS, PIK3CA, IDH1,
CTNNB1, FOXA1, BRAF, AKT1, EGFR) and TSs (TP53, RB1, PTEN, APC, BRCA2, VHL) are shown.
(H) Sample-level co-occurrence of mutations in either the MHC class I or APM genes within positively selected cohorts. Percentage values show percentage of
mutated samples containing a hit in both the MHC class I and APM.
`
`Patterns of mutual exclusivity and independence of
`MHC mutations
`We next looked at the relationships between deleterious mutations
`in the MHC class I genes and the APM (antigen presentation ma-
`chinery)52 (Figure 3H). Other genes linked to MHC class I have
`been identified as cancer driver genes (e.g., beta-2 microglobulin
`[B2M]53), and it has been shown that driver gene



