`http://www.biotechnologyforbiofuels.com/content/7/1/66
`
RESEARCH
`Open Access
`The complete genome of Blastobotrys (Arxula)
`adeninivorans LS3 - a yeast of biotechnological
`interest
`Gotthard Kunze1,25*†, Claude Gaillardin2,3†, Małgorzata Czernicka4, Pascal Durrens5, Tiphaine Martin5, Erik Böer1,
`Toni Gabaldón6,7, Jose A Cruz8, Emmanuel Talla9, Christian Marck10, André Goffeau11, Valérie Barbe12,
`Philippe Baret13, Keith Baronian14, Sebastian Beier1, Claudine Bleykasten15, Rüdiger Bode16, Serge Casaregola2,3,
`Laurence Despons15, Cécile Fairhead17, Martin Giersberg1, Przemysław Piotr Gierski18, Urs Hähnel1, Anja Hartmann1,
`Dagmara Jankowska1, Claire Jubin12,19,20, Paul Jung15, Ingrid Lafontaine21, Véronique Leh-Louis15, Marc Lemaire22,
`Marina Marcet-Houben6,7, Martin Mascher1, Guillaume Morel2,3, Guy-Franck Richard21, Jan Riechen1,
`Christine Sacerdot21,23, Anasua Sarkar5, Guilhem Savel5, Joseph Schacherer15, David J Sherman5, Nils Stein1,
`Marie-Laure Straub15, Agnès Thierry21, Anke Trautwein-Schult1, Benoit Vacherie12, Eric Westhof8, Sebastian Worch1,
`Bernard Dujon21, Jean-Luc Souciet15, Patrick Wincker12,19,20, Uwe Scholz1 and Cécile Neuvéglise2,3,24*
`
`Abstract
`
`Background: The industrially important yeast Blastobotrys (Arxula) adeninivorans is an asexual hemiascomycete
`phylogenetically very distant from Saccharomyces cerevisiae. Its unusual metabolic flexibility allows it to use a wide
`range of carbon and nitrogen sources, while being thermotolerant, xerotolerant and osmotolerant.
`Results: The sequencing of strain LS3 revealed that the nuclear genome of A. adeninivorans is 11.8 Mb long and
`consists of four chromosomes with regional centromeres. Its closest sequenced relative is Yarrowia lipolytica,
`although mean conservation of orthologs is low. With 914 introns within 6116 genes, A. adeninivorans is one of the
`most intron-rich hemiascomycetes sequenced to date. Several large species-specific families appear to result from
`multiple rounds of segmental duplications of tandem gene arrays, a novel mechanism not yet described in yeasts.
`An analysis of the genome and its transcriptome revealed enzymes with biotechnological potential, such as two
`extracellular tannases (Atan1p and Atan2p) of the tannic-acid catabolic route, and a new pathway for the
`assimilation of n-butanol via butyric aldehyde and butyric acid.
`Conclusions: The high-quality genome of this species that diverged early in Saccharomycotina will allow further
fundamental studies on comparative genomics, evolution and phylogenetics. Protein components of different
pathways for carbon and nitrogen source utilization, which have so far remained unexplored in yeast, were identified,
offering clues for further biotechnological developments. In the search for alternative microorganisms of
biotechnological interest, A. adeninivorans has already proved its competitiveness as a promising cell
factory for many more applications.
`Keywords: Yeast, Genome, Biotechnology, Tannic acid, n-butanol, Metabolism
`
`* Correspondence: kunzeg@ipk-gatersleben.de; ncecile@grignon.inra.fr
`†Equal contributors
`1Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Corrensstr.
`3, Gatersleben D-06466, Germany
`2AgroParisTech, Micalis UMR 1319, CBAI, Thiverval-Grignon F-78850, France
`Full list of author information is available at the end of the article
`
`© 2014 Kunze et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
`Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
`reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain
`Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article,
`unless otherwise stated.
`
`LCY Biotechnology Holding, Inc.
`Ex. 1023
`Page 1 of 20
`
`
`
`Kunze et al. Biotechnology for Biofuels 2014, 7:66
`
`Background
`This paper discusses the sequencing of the genome of
`Arxula adeninivorans, a yeast of biotechnological interest.
This species is currently exploited as a biocatalyst for the synthesis
of various biotechnological products such as tannases
`[1], 1-(S)-phenylethanol [2] or β-D-galactopyranoside [3],
`for the production of food with low purine content [4], and
`for the detection of estrogenic activity in various aqueous
`media [5,6]. It is also used as a host for the production of
`recombinant proteins, and as a donor for genes encoding
valuable products [7,8]. When used in a microbial fuel
cell, A. adeninivorans has been shown to give a higher power output
than Saccharomyces cerevisiae owing to the production of
an extracellular redox molecule [9].
`This species was first described by Middelhoven
`et al. [10] who isolated a yeast strain from soil and des-
`ignated it as Trichosporon adeninovorans CBS 8244T.
`This strain was found to exhibit unusual biochemical
`activities,
`including the ability to assimilate a wide
`range of amines, adenine and several other purine
`compounds as a sole energy and carbon source. A sec-
`ond wild-type isolate (strain LS3 (PAR-4)) with charac-
`teristics similar to CBS 8244T was selected from wood
`hydrolysates in Siberia, and additional strains were
later isolated from chopped maize silage or humus-rich
soil. A new genus name, Arxula Van der Walt,
Smith & Yamada (Candidaceae), was proposed for all of
these strains [11,12]. No sexual reproduction has been
observed in any of these strains, indicating that they are
all anamorphic ascomycetes. They also share common
`properties, such as nitrate assimilation and xerotoler-
`ance [13].
Kurtzman and Robnett [14] revisited the phylogeny
`of yeasts and deduced that Arxula is a member of the
`Blastobotrys genus that contains both anamorphic and
`ascosporic species. Recent classifications consider this
`taxon as basal to the hemiascomycete tree in a region
`where genomic data are available for few other species
[15]. This sequencing bias remains despite the number
of recent publications of yeast genome sequences. For
instance, the genome analyses of Ogataea angusta (Hansenula
polymorpha), Komagataella (Pichia) pastoris, Dekkera bruxellensis
and, more recently, Kuraishia capsulata all used the basal
yeast species Yarrowia lipolytica, the closest sequenced relative
of A. adeninivorans, as a single outgroup [16-19].
`Thus, sequencing of the Blastobotrys (Arxula) adeni-
`nivorans genome was of interest in order to generate
`an additional
`landmark in the basal portion of the
`hemiascomycete tree and possibly resolve phylogenetic
`relationships among basal species. In addition, the se-
`quence provides biotechnologists with complete infor-
`mation on the gene content of this species for which
`only 40 different protein entries are currently recorded
`in databases.
`
`Results
`Genome architecture and main non-coding
`genetic elements
`The A. adeninivorans strain LS3 was selected because of
`its established biotechnological use [20]. Both mitochon-
`drial and nuclear genomes were sequenced using the
`Sanger and 454 pyrosequencing approaches with different
`shotgun, plasmid and BAC libraries (Additional file 1).
`The circular mapping mitochondrial genome has a final
size of 31,662 bp. It contains 24 tRNA genes, 15 protein-coding
genes, including the seven NADH:ubiquinone dehydrogenase
subunits of complex I, and the genes encoding
the RNA component of RNase P and the two subunits of
the mitochondrial ribosomal RNA, as expected from the
phylogenetic position of this species. All of these genes are
`transcribed from the same DNA strand except for the
`tRNA-Cys gene (Additional file 2).
`After directed finishing phases, the 11.8 Mb final assem-
`bly of the nuclear genome resulted in four contigs corre-
`sponding to the four chromosomes Arad1A, Arad1B,
`Arad1C and Arad1D, of 1,659,397, 2,016,785, 3,827,910,
`and 4,300,524 nt, respectively, as predicted from previous
`pulsed-field gel electrophoresis analyses [21] (Figure 1).
K. pastoris and Y. lipolytica have four and six chromosomes,
respectively, while averages of eight and sixteen
chromosomes are observed in protoploid and post-whole
genome duplication species, respectively [22]. This may suggest a whole
genome duplication event during early hemiascomycete
evolution, although there is presently no other evidence to
support this hypothesis [23]. The four contigs contain
`no internal gaps and lack only terminal repeats at the telo-
`meres. There is a single rDNA cluster, located approxi-
`mately 75 kb upstream of
`the chromosome D right
`subtelomere. Based on 454 read counts, there are about 35
`to 40 tandem repeats at this locus of the 18S, 5.8S and 26S
`rRNA genes, the latter housing a 411-bp group-IC, self-
`splicing intron [24]. We have left two copies of the rDNA
`units in Arad1D flanked by two artificial gaps of 874 “N”.
`As in Y. lipolytica, the 5S rRNA genes are not included
`in the rDNA repeat, but occur as 30 copies dispersed
throughout the genome (Table 1). A set of 46 tRNA species
encoded by 147 tRNA genes was identified, confirming
that A. adeninivorans follows the regular eukaryotic-type
sparing rules to read CTY Leu and CGA Arg codons [25].
Forty-seven genes encoding snRNAs or snoRNAs were
identified: the small nuclear RNAs (U1, U2, U4, U5 and
U6), the RNA components of RNase P and of the
signal recognition particle, as well as 14 H/ACA and 33
C/D snoRNAs (Additional file 3). Additionally, three
`thiamine pyrophosphate (TPP) riboswitch sequence candi-
`dates were found in the 5′ region from homologs of S. cere-
`visiae THI4 (YGR144W), UGA4 (YDL210W) and DUR3
`(YHL016C), namely ARAD1R43560g, ARAD1D08074g
`and ARAD1B12386g; they show a remarkable conservation
`
`
`Figure 1 Circos map of the complete nuclear genome of A. adeninivorans LS3. Chromosome structure (the outermost circle - circle 1):
`presumed centromeric positions are indicated by black bands and black triangles outside the circle, and tRNA and rRNA genes by green and
`orange bands, respectively. Genes (circle 2): density of genes in the filtered gene set across the genome, from a gene count per 15 kb sliding
window at 5 kb intervals. Repeat content (circle 3): to create the k-mer density ring, k-mers of length 20 were counted across the whole genome using the Jellyfish
program v. 1.1.1 (http://www.cbcb.umd.edu), a position map of k-mer counts was created, the k-mer counts in blocks of 3 kb were
divided by 3,000 and the data were plotted using the Circos heatmap. 454 reads mapped to chromosomes (circle 4): density of 454 reads mapped to
`chromosomes, from a 454 read count per 9 kb sliding window at 3 kb intervals. Underlined blocks indicate alignment in the reverse strand. In the
`centre of the Circos map the phylogenetic relationship of A. adeninivorans is presented as inferred by gene tree parsimony analysis of the complete
`A. adeninivorans phylome. k-mer, tuple of length k.
`
`of known structural domains and sequence motifs [26].
`A single transposable element was identified on chromo-
`some B (Taa3, ARAD1B13860t) that belongs to the
`Gypsy superfamily of Long Terminal Repeat (LTR) ret-
`rotransposons with the two gag and pol open reading
`frames separated by a minus 1 frameshift as seen in the
homologous element Tyl6 of Y. lipolytica [27]. The
single copy of Taa3 was found 13 bp upstream of a
`tRNA gene, suggesting a possible specificity of insertion
`[28]. Only three relics of solo LTRs were identified in
`the genome, which implies that Taa3 has low activity.
`Putative centromeres were identified within one region
per chromosome with a conspicuous G + C (guanine +
cytosine) bias defining approximately 6 kb G + C troughs,
`
Table 1 General features of A. adeninivorans LS3 nuclear genome

Chromosome  Size (nt)  G + C %  Coding %  CDS #  CDS G + C %  CDS mean size (nt)  Pseudo-genes  i-genes  Introns  tRNA  5S rRNA  ncRNA
Arad1A      1659397    48.2     73.2      871    49.6         1395                3             84       106      13    4        9
Arad1B      2016785    48.4     72.6      1051   49.8         1394                5             109      135      31    8        5
Arad1C      3827910    48.0     75.8      1991   49.2         1457                11            260      343      54    6        16
Arad1D      4300524    48.1     73.6      2203   49.3         1437                14            250      330      49    12       15
Total       11804616   48.1     74.1      6116   49.4         1430                33            703      914      147   30       45

CDS, coding DNA sequence; G, guanine, C, cytosine; ncRNA, non coding RNA.
`
`
`with a G + C content of 31 to 33% as compared to 48% for
`the whole genome. Like those of Y. lipolytica, they share
`features of both regional centromeres found in yeasts of
`the CTG group, and of point centromeres characteristic of
`Saccharomycetaceae [29] (Additional file 4).
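
The G + C troughs used above to locate putative centromeres can be illustrated with a sliding-window scan. The following is a minimal sketch, not the authors' procedure: the function names, the window and step sizes, the 35% threshold and the toy sequence are all assumptions made for illustration.

```python
def gc_content(seq):
    """Return the G + C fraction of a DNA string (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window, step):
    """Yield (start, G + C fraction) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_content(seq[start:start + window])


def find_troughs(seq, window, step, threshold):
    """Return start positions of windows whose G + C fraction is below threshold."""
    return [start for start, gc in gc_windows(seq, window, step) if gc < threshold]


# Toy sequence (hypothetical): GC-balanced flanks around an AT-rich core.
toy = "GCAT" * 25 + "AATT" * 25 + "GCAT" * 25
troughs = find_troughs(toy, window=40, step=20, threshold=0.35)
print(troughs)
```

On a real chromosome the window would be on the order of kilobases, and a trough would be reported as a merged interval rather than as individual window starts.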
`
`Protein-coding genes, pseudogenes, introns
`A total of 6,116 protein-coding genes and 33 pseudogenes
were identified. This is slightly fewer than reported for Y.
lipolytica or Debaryomyces hansenii, but significantly more
than for the Saccharomycetaceae species (Table 2). The frequency
of pseudogenes is one of the lowest reported in
`hemiascomycetes, while gene density is one of the highest.
`A total of 4,815 (78.7%) genes were assigned to gene
`ontology (GO) terms: 3,853 genes to molecular functions,
`2,626 to cellular components and 3,308 to biological pro-
`cesses. In the biological processes, the largest fraction of
`genes, 1,351 (22.1%), was assigned to metabolism, while in
`the molecular functions the largest category was repre-
`sented by genes encoding catalytic activities. The GO slim
`categories of A. adeninivorans are presented in Additional
`file 5. InterPro domains were detected in 5,147 (84.2%)
proteins corresponding to 459 distinct Pfam domains. A
secretion signal peptide of type I or type II was predicted
in 957 (15.6%) gene products, including Atan1p and
Alip1p that were previously characterized experimentally
by N-terminal sequencing and mass spectrometry (MS)
analysis [32,33]. Transmembrane helices were found in
`1,271 (20.8%) proteins. An Enzyme Commission (EC) num-
`ber was assigned to 676 (11.1%) genes. We assigned 884
(14.4%) genes to 98 metabolic pathways in the
Kyoto Encyclopedia of Genes and Genomes (KEGG),
with the highest number of genes related to purine metabolism.
Blast2GO BLASTx alignments using the NCBI
`NRPEP database confirmed that the closest matches to
`A. adeninivorans genes were very often found in Y. lipo-
`lytica (Additional file 5).
`Spliceosomal introns are more frequent than in Saccharo-
`mycetaceae or in Debaryomycetaceae, but in the same
`range as reported for Y. lipolytica (914 versus 1119, Table 2).
`In this latter species, introns are characterized by a very
`short distance between the 3′ splicing site and the branch
`point, but have in contrast retained the ancestral consensus
`hemiascomycete 5′ splicing site (GTAAGT). Finally, multi-
`intronic genes tend to be more frequent in A. adeninivor-
`ans than in Y. lipolytica (21.5% vs. 11.5%). For additional
`information, see Additional file 6 and Genosplicing [31].
`
`Phylogeny and synteny conservation
`A phylogenetic tree was reconstructed for each A. adenini-
`vorans protein-coding gene, the so-called phylome, and
`used to identify orthology and paralogy relationships
`among related species [34]. This comprehensive collection
`of evolutionary histories is publicly available at PhylomeDB
`
Table 2 Annotated features of A. adeninivorans when compared to other representative Hemiascomycetes

Species                         S. cerevisiae  L. thermotolerans  D. hansenii   Y. lipolytica     A. adeninivorans
Strain                          S288c          CBS 6340           CBS 767       E150              LS3
Chromosome number               16             8                  7             6                 4
Genome
  Ploidy                        n              2n                 n             n                 n
  Size (Mb)                     12.1           10.4               12.2          20.5              11.8
  Average G + C content (%)     38.3           47.3               36.3          49.0              48.1
  Genome coding coverage (%)    70.0           72.3               74.2          46                74.1
CDS
  Total CDS (pseudo)            5769           5094 (46)          6272 (129)    6449 (137)        6116 (33)
  Average G + C (%)             39.6           49.2               38.0          52.9              49.4
  Average size (aa)             485            492                479           476               477
  i-genes                       287            278                420           984               703
  Introns                       296            285                467           1119              914
Total tRNA genes                274            229                205           510               147
Total snRNA                     6              5                  5             6                 5
Total snoRNA                    77             43                 ND            ND                37
rDNA clusters                   1 (internal)   1 (internal)       3 (internal)  6 (subtelomeric)  1 (internal)
Total dispersed 5S rRNA genes   0              0                  0             116               30

snRNA, small nuclear ribonucleic acid; snoRNA, small nucleolar ribonucleic acid; CDS, coding DNA sequence; G + C, guanine and cytosine; aa, amino acids; i-genes,
intron-containing genes; ND, not-determined. Data from S. cerevisiae, Lachancea thermotolerans, D. hansenii and Y. lipolytica were taken from [30]; data on
intron-containing genes from [31].
`
`
`[35]. Species phylogenies were computed on a set of
`concatenated orthologs and using a super tree approach
`combining all
`individual gene phylogenies. The two
`methods gave the same topology, in which A. adeninivorans
`groups with Y. lipolytica (Additional file 7), although the
`two species have greatly diverged. For instance, our analyses
`identified 2,520 A. adeninivorans proteins that lack an
`ortholog in Y. lipolytica, 591 of which do not even have a
`homolog in that species. For 121 proteins we could only
`detect homologs in Pezizomycotina genomes (Additional
`file 8). Horizontal gene transfer between prokaryotes and
`fungi was detected using a published pipeline [36], which
`pinpointed six candidates with putative enzymatic function
`that are likely to have been transferred from prokaryotes to
`Arxula (Additional file 8). Few genes of bacterial origin
`have been reported in Saccharomycotina so far, but most of
`them encode metabolic enzymes with important physio-
`logical roles that may facilitate host adaptation to biotope
`variations (see [36,37] for large-scale trans-kingdom trans-
`fer in fungi).
`The number of conserved gene blocks between A. ade-
`ninivorans and other genomes ranged from 300 with S.
`cerevisiae to >800 with Y. lipolytica, and was roughly pro-
`portional to the mean percentage of protein similarity, as
`is expected when species have greatly diverged. Indeed, in
the comparison between Y. lipolytica and A. adeninivorans,
92% of the blocks contained fewer than four genes,
`showing that there is no large-scale conservation of syn-
`teny (Additional file 7).
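
To make the notion of a conserved gene block concrete, the sketch below groups ortholog pairs into blocks and reports the share of small blocks. It is an illustration only: the adjacency rule (neighbours in one genome whose orthologs are neighbours on the same chromosome of the other), the counting of singletons as blocks, the function names and the input data are all assumptions, and the paper's own block definition may differ.

```python
def conserved_blocks(pairs):
    """Group ortholog pairs into blocks of strictly adjacent genes.

    pairs: list of (pos_a, chrom_b, pos_b) tuples, where pos_a and pos_b are
    gene ranks along a chromosome. A pair extends the current block when it
    is the next gene in genome A and its ortholog is a direct neighbour, on
    the same chromosome, of the previous ortholog in genome B.
    """
    blocks = []
    for pair in sorted(pairs):
        if blocks:
            prev_a, prev_chrom, prev_b = blocks[-1][-1]
            pos_a, chrom_b, pos_b = pair
            if (pos_a - prev_a == 1 and chrom_b == prev_chrom
                    and abs(pos_b - prev_b) == 1):
                blocks[-1].append(pair)
                continue
        blocks.append([pair])
    return blocks


def fraction_small(blocks, max_size=4):
    """Fraction of blocks containing fewer than max_size genes."""
    if not blocks:
        return 0.0
    return sum(len(b) < max_size for b in blocks) / len(blocks)


# Hypothetical ortholog pairs for one chromosome of genome A.
pairs = [(1, "B1", 10), (2, "B1", 11), (3, "B1", 12), (4, "B1", 13),
         (5, "B2", 40), (6, "B2", 39), (8, "B3", 7), (9, "B1", 90)]
blocks = conserved_blocks(pairs)
sizes = [len(b) for b in blocks]
print(sizes, fraction_small(blocks))
```

Practical synteny tools usually tolerate a few intervening genes and local inversions; strict adjacency is used here only to keep the example short.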
`
`Gene families: expansion and contraction
`The gene trees in the phylome were scanned to detect and
`date duplication events [38]. With an average of 0.253 du-
`plications per gene in the specific lineage leading to
Arxula, this genome does not seem to contain a large
number of duplications. This is nevertheless greater than
the 0.015 value found in the common ancestor of Y. lipolytica
and A. adeninivorans (Additional file 9). Most Arxula-specific
expansions are not very large (between three and
nine sequences) and correspond to peptidases, transporters,
dehydrogenases and some proteins related to nitrogen
metabolism (Additional file 9). One expansion,
however, contains over 100 members of unknown function
with no homologs in any database, and is to our knowledge
the largest gene family described in yeast (Figure 2
`and Additional file 9).
There are fewer transporters in A. adeninivorans than in,
for example, D. hansenii or Y. lipolytica, but some have
`undergone strong amplification. Remarkably, sugar trans-
`porters appear overrepresented in this species: there are
`60 members of the Sugar Porter family, which is three
`times as many as in Kluyveromyces lactis or K. pastoris,
`and 1.8 times more than in S. cerevisiae (Additional file 10).
These include 15 glycerol:H+ symporters, paralogs of
the S. cerevisiae singleton STL1, compared to eight in the
`osmotolerant yeast D. hansenii, which may reflect the salt
`tolerance of A. adeninivorans. The ability to use various
carbon sources is highlighted by the abundance of high-affinity
glucose:H+ symporters (10 members), maltose:H+
`symporters (10 members), lactose permeases (four mem-
`bers versus one in K. lactis and D. hansenii), allantoate
`permeases (six members), and of facilitators for the uptake
`of xylose (six members), quinate (four members), fructose
`(four members) and myo-inositol (three members). Sur-
`prisingly, there are few glucose uniporters (two members,
`versus eighteen and four in S. cerevisiae and D. hansenii,
`respectively) and few sugar sensors. High-affinity nicotinic
`acid transporters (six members), polyamine transporters
`(15 members) and nitrate/nitrite permeases (three mem-
`bers) are also amplified (Additional file 10).
`About 10% of the duplicated genes (213/2285) are orga-
`nized in tandem gene arrays (TGAs), mostly as arrays of
`two genes. These arrays are sometimes entirely duplicated
`on the same or on different chromosome(s), a situation
`that so far remains unusual. The mechanism involved has
`given rise to the largest protein family in yeasts as men-
`tioned above. BLASTn searches indicated that coding and
`intergenic regions of duplicated TGAs are highly con-
`served at the nucleotide level, suggesting propagation of
`ancestral tandems by segmental duplication at ectopic po-
`sitions (Figure 2 and Additional file 11).
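
A tandem gene array can be pictured as a run of neighbouring genes that belong to the same family. The sketch below detects such runs in an ordered gene list. It is a simplified illustration under stated assumptions: the gene names and family labels are invented, strict adjacency is required (no intervening genes allowed), and the function name is hypothetical. The last line reuses the counts quoted in the text (213 of 2285 duplicated genes).

```python
from itertools import groupby


def tandem_arrays(genes):
    """Return runs of two or more adjacent genes that share a family label.

    genes: list of (gene_name, family) tuples in chromosomal order; family is
    None for genes that belong to no multigene family.
    """
    arrays = []
    for family, run in groupby(genes, key=lambda g: g[1]):
        names = [name for name, _ in run]
        if family is not None and len(names) >= 2:
            arrays.append((family, names))
    return arrays


# Hypothetical gene order on one chromosome (names and families are made up).
chromosome = [("g1", "famA"), ("g2", "famA"), ("g3", None), ("g4", "famB"),
              ("g5", "famC"), ("g6", "famC"), ("g7", "famC"), ("g8", None),
              ("g9", None)]
arrays = tandem_arrays(chromosome)
n_in_arrays = sum(len(names) for _, names in arrays)

# Share of duplicated genes located in tandem arrays, from the counts above.
share = 213 / 2285
print(arrays, n_in_arrays, round(100 * share, 1))
```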
`
`Mating genes
`A. adeninivorans LS3 is only known to reproduce asexu-
`ally [20], yet a MAT locus was identified on chromosome
`D as is the case in many asexual species [40]. The region
`around the mating type locus is conserved between Y. lipo-
`lytica and A. adeninivorans, while it is rearranged in basal
`species such as Lipomyces starkeyi, filamentous fungi, and
`in species that emerged later, such as K. pastoris or K. lactis
(Figure 3). The MAT locus encodes a homolog of the transcription
factor Matα1 present in other yeast species
`(ARAD1D19294g, MTAL1), with a canonical DNA binding
`domain and a C-terminal extension partially conserved in
`Y. lipolytica, but absent from other species (Additional
`file 12). There is no Matα2 coding sequence (MTAL2) con-
`trary to the situation reported in other heterothallic yeast
`species such as S. cerevisiae and Y. lipolytica. The presence
`of only MTAL1 at the MATalpha locus is, however, found
`in several sexually competent filamentous fungi and
`yeasts such as Aspergillus nidulans, Clavispora lusitaniae,
`Meyerozyma guillermondii, Scheffersomyces stipitis or D.
`hansenii [40]. Whether A. adeninivorans is asexual or
`not is still an open question. Either A. adeninivorans is
`truly asexual and the loss of MTAL2 may be the cause,
`or alternatively, A. adeninivorans is sexual but strains of
`the opposite mating type have not yet been identified,
`thus preventing successful mating.
`
`
`Figure 2 Tandem gene arrays in A. adeninivorans. (a) Intermingled families. A. adeninivorans chromosomes are indicated on the left. Gene
`members of TGAs are depicted by boxed arrows colored according to their family. Family numbers refer to the Génolevures classification as
`shown in the legend in the box on the right. Pseudogenes are indicated by dotted lines. The GL3C4705 family is the largest one. Most of its
members are tail-to-tail inverted tandem repeats, numbered from one to nine in black disks. (b) Neighbor-joining tree based on the MUSCLE [39]
`alignment of positions one and two of the codons. Robustness of the tree is indicated by 100 bootstrap values calculated with a maximum
`composite likelihood model with uniform rates. Thin blue lines indicate pairs in inverted repeats of GL3C4705 family; heavy blue lines indicate
`relative orientation of genes in inverted repeats (see Additional file 11 for additional information).
`
`A search for genes involved in mating, meiosis and
`sporulation in S. cerevisiae identified the presence of
`most genes conserved in the sexual species D. hansenii,
`K. pastoris and Y. lipolytica (Additional file 13). For ex-
`ample, out of 368 genes tested, 292 were conserved in
`Y. lipolytica and 288 in A. adeninivorans. Candidates
for the mating pheromones MFa and MFα and for their
cognate receptors, as well as for the signaling cascade,
were identified, suggesting that A. adeninivorans is
either still sexually active or has lost this ability only recently
(Additional file 14).
`
`Metabolic pathways
`A. adeninivorans is described as having a wide substrate
`spectrum that includes the assimilation of many nitrogen-
`ous and aromatic compounds such as nitrate and nitrite,
`purines, tannins and benzoic acid derivatives [13,41,42].
`The ability to degrade purine compounds is reported in all
`
`
`Figure 3 MAT locus of A. adeninivorans in comparison to other ascomycetes. Conserved genes are depicted by boxed arrows with the
`same colour, MATα genes are in red and MATa genes in green. Genes without any homologs at the locus are represented by black boxed
arrows. Thin lines (black or white) in genes correspond to the relative position of introns; note that the scheme is not to scale. The mating-type
loci of CTG species are strongly rearranged and are thus not represented here (see [40] for additional information on this clade).
`
`kingdoms and can occur either aerobically or anaerobic-
`ally in separate pathways. In the aerobic pathway, the
`critical step in the degradation of purine bases is the oxi-
`dation of hypoxanthine and xanthine to uric acid, cata-
`lyzed by xanthine oxidase and/or dehydrogenase. The
`various purine-degradative pathways are unique and differ
`from other metabolic pathways because they may serve
`quite different purposes, depending on the organism or
`tissue. While some organisms degrade the naturally occur-
`ring purines to CO2 and ammonia, others contain only
`some of the steps of the purine degradation pathways,
`resulting in partial degradation of purines or certain inter-
`mediary catabolites [43].
`Purine catabolism is a characteristic feature of A.
`adeninivorans [13]. The purine nucleosides (adenosine,
`inosine, xanthosine and guanosine) are transported
`across the membrane and into the cytoplasm by a pur-
`ine permease. They are then converted to adenine,
`hypoxanthine, xanthine and guanine, further degraded
`to uric acid and, after transport into the peroxisomes,
`to urea. All corresponding genes of this pathway are lo-
`calized on different chromosomes and are induced by
`adenine and other pathway intermediates [4,44]. Inter-
`estingly, an adenosine deaminase, needed to transform
adenosine to inosine in animals and humans, is absent
`(Figure 4). This pathway allows A. adeninivorans to use
`all of these purine derivatives as nitrogen and carbon
`sources [4,44].
`
As in Ogataea (Hansenula) polymorpha [45] and in
`Kuraishia capsulata [18], a cluster of genes encoding a
`nitrate transporter, a nitrate reductase and a nitrite re-
`ductase has been previously identified in A. adeninivor-
`ans [46]. Genome data indicate that nitrate transporter
`encoding genes form a three member family, two of
`which are part of the nitrate cluster.
`Microarrays were designed based on the complete
`genome data of A. adeninivorans to analyze gene ex-
`pression changes before and after a shift from yeast
`minimal medium (YMM) + 2% glucose with NaNO3 to
`YMM medium with adenine as the nitrogen source. A
`significant down regulation of the genes involved in ni-
`trate metabolism was observed two hours after the
shift. Key components of the purine degradation pathway,
on the other hand, clearly showed increased expression
(Figure 4). This provides further insight into the
`regulation of purine degradation by A. adeninivorans
`and emphasizes the possibility of using transcriptomic
`approaches to identify candidate genes for new bio-
`technological applications. Arxula specificities of the
`purine degradation pathway include the regulation of the
`respective genes. Activity tests, qRT-PCR experiments and
`microarray assays with xanthine dehydrogenase inducers
`demonstrated strong gene inducibility when cells were
`cultured on hypoxanthine and adenine and a lower level
`of induction with uric acid as the sole nitrogen source.
`However, enzyme induction by purines stops after
`
`
`Figure 4 Scheme of the key components of the purine degradation pathway. The image shows the reversible (double headed arrow) and
irreversible (single headed arrow) reactions catalyzed by the corresponding enzymes (rectangles) for purine degradation. The colors represent
`up regulation (red) and down regulation (blue) of genes in cells shifted to medium containing adenine as the sole nitrogen source compared to cells
`grown with nitrate. Black marked symbols indicate intermediates occurring several times in the pathway. Fold change (FC) values of gene expression
`are given within the colored boxes.
`
supplementing the medium with NH4+ or NO3− as nitrogen
sources, which is in contrast to the situation in
N. crassa where the enzyme is induced in the presence of
NO3−, but not with NH4+. It is known that in A. nidulans,
NH4+ inactivates the GATA factor AreA, which is responsible
for expression of the urate-xanthine transporter [47].
`
`
It is not clear which mechanism triggers the repression
with NH4+ and NO3− in A. adeninivorans [4].
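
The medium-shift experiment described above reports expression changes as fold change (FC) values between nitrate-grown and adenine-shifted cells. The sketch below shows one common way such values can be computed and classified. It is not the authors' analysis pipeline: the pseudocount, the twofold cutoff, the gene labels and the signal intensities are assumptions chosen for illustration.

```python
import math


def log2_fold_change(before, after, pseudocount=1.0):
    """log2 ratio of expression after versus before the shift.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for genes with no detectable signal.
    """
    return math.log2((after + pseudocount) / (before + pseudocount))


def classify(before, after, cutoff=1.0):
    """Label a gene 'up', 'down' or 'unchanged' using a |log2 FC| cutoff."""
    lfc = log2_fold_change(before, after)
    if lfc >= cutoff:
        return "up"
    if lfc <= -cutoff:
        return "down"
    return "unchanged"


# Hypothetical signal intensities: (nitrate medium, adenine medium).
signals = {
    "purine_gene": (99.0, 799.0),
    "nitrate_gene": (799.0, 99.0),
    "housekeeping_gene": (499.0, 524.0),
}
calls = {gene: classify(b, a) for gene, (b, a) in signals.items()}
print(calls)
```

A cutoff of 1.0 on the log2 scale corresponds to a twofold change; real microarray analyses would also normalize the arrays and test for statistical significance.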
`Tannin, a plant polyphenol molecule, is widely distrib-
`uted in the plant kingdom where it protects plants
`against attack by parasites and herbivores. It inhibits the
`activity of enzymes by binding and precipitation and is
`to a greater or lesser extent recalcitrant to biodegrad-
`ation [48]. While tannins are growth inhibitors for most
`microorganisms, a few bacteria, fungi and yeast such as
`D. hansenii, Mycotorula japonica or Candida sp. are
`capable of exploiting tannins as a carbon and/or energy
`source for growth [49-51]. A. adeninivorans is one of
`these yeasts that use tannic acid and gallic acid as car-
`bon sources [52]. Genes encoding tannases (ATAN1 -
`ARAD1A06094g, ATAN2 - ARAD1A19822g), gallate
`decarboxylase (AGDC - ARAD1C45804g) and catechol
`1,2 dioxygenase (ACDO - ARAD1D18458g) have been
`identified and His-tagged recombinant enzymes and
`corresponding gene mutants were used to confirm the
`activity of these enzymes (data not shown). This demon-
`strated that the tannic acid catabolism pathway enables
`this yeast to assimilate tannic acid and other hydroxylated
derivatives of benzoic acid by non-oxidative decarboxylation.
All suitable derivatives require a hydroxyl
group at the meta or para position relative to the carboxyl group
(Additional file 15). Interestingly, A. adeninivorans is
`thus the first eukaryote known to synthesize two tan-
`nases, one extracellular (Atan1p) [32] and one cell-wall
`localized (Atan2p - data not shown) which permits effective
`degradation of extracellular tannic acid. Both enzymes are
`able to remove gallic acid from both condensed and hydro-
`lysable tannins. Substrate specificity, biochemical parame-
`ters (temperature optimum 35 to 40°C, pH optimum at ca.
`6.0) and nearly complete extracellular localization (≥97%)
distinguish Atan1p as an important industrial enzyme. The first
transgenic tannase producer strains were constructed with
`a constitutively expressed ATAN1 module integrated into
`a chromosome. In fed-batch fermentation experiments,
`the transgenic strain produced 51,900 U/L of tannase ac-
`tivity after 42 h with a dry cell weight of 162 g/L [1].
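
The fed-batch figures quoted above can be turned into two derived quantities that are often used to compare production strains: average volumetric productivity and activity per gram of biomass. The short calculation below is a sketch of that arithmetic only; the function names are hypothetical, and the derived values are not reported in the source.

```python
def volumetric_productivity(activity_u_per_l, hours):
    """Average enzyme activity produced per litre per hour (U/L/h)."""
    return activity_u_per_l / hours


def specific_yield(activity_u_per_l, dry_cell_weight_g_per_l):
    """Enzyme activity per gram of dry cell weight (U/g)."""
    return activity_u_per_l / dry_cell_weight_g_per_l


# Values quoted in the text for the fed-batch fermentation [1].
activity = 51_900.0   # U/L of tannase activity
duration = 42.0       # h
biomass = 162.0       # g/L dry cell weight

print(round(volumetric_productivity(activity, duration)))  # about 1236 U/L/h
print(round(specific_yield(activity, biomass)))             # about 320 U/g
```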
`Another uncommon substrate used by this yeast is n-
`butanol. The n-butanol degradation pathway has not
`previously been reported to exist in eukaryotes. Genome
`mining suggests that n-butanol is oxidized to butyralde-
`hyde by an alcohol dehydrogenase (Aadh1p, AADH1 -
`ARAD1B16786g) that has a high substrate specificity, and
`then to butyric acid by two aldehyde dehydrogenases
`(Aald2p, AALD2 - ARAD1B17094g; Aald5p, AALD5 -
`ARAD1C17776g). The last steps involve an acyl-CoA lig-
`ase, a cytoplasmic acyl-CoA carnitine acyltransferase and
`a peroxisomal acyl-CoA carnitine acyltransferase for
`butyryl-carnitine synthesis via a butyryl-CoA inter