Specialized Metabolism: Flavonoid Biosynthesis

Plants produce a plethora of specialized metabolites to cope with numerous environmental challenges. Examples include terpenoids, flavonoids, and betalains. We are particularly interested in flavonoids, which are responsible for the pigmentation of many blue or red flowers. Flavonoids can be divided into several subgroups, including flavonols, anthocyanins, proanthocyanidins, and flavones. There are multiple reasons why research on flavonoid biosynthesis is promising:


1) The core of flavonoid biosynthesis is a well-understood model system for the regulation of biosynthetic pathways in plants. Decades of work have revealed insights not only into the individual enzymes but also into the transcriptional regulation of the genes encoding them. Research connected to flavonoid biosynthesis has even contributed to Nobel Prize-winning discoveries.

2) Flavonoids have enormous potential for biotechnological applications, both as natural colorants and because of their high nutritional value. While the general biosynthesis pathway is well conserved across distantly related plant species, there are species-specific differences in how flavonoids are modified. Detailed knowledge of flavonoid biosynthesis in (orphan) crop species can pave the way to healthier nutrition, because genome editing or breeding can then be used to improve the nutritional value of these crops.

3) Visible phenotypes resulting from the knock-out or increased activation of flavonoid biosynthesis genes can help to identify promising mutants. Such visible phenotypes were one reason why flavonoid biosynthesis was established as a model system. However, coloration is not only important for basic research: plants with altered expression of certain flavonoid biosynthesis genes often show fascinating colorations, which are valued in horticultural plant species.

4) Our work investigates the Caryophyllales, which represent an outstanding system to study flavonoid biosynthesis. This plant order is characterized by a complex pigment evolution. Briefly, specific families in the Caryophyllales have replaced anthocyanins (a group of flavonoids) with betalains. Betalains do not occur in plants outside the Caryophyllales, and anthocyanins and betalains appear to be mutually exclusive. The Caryophyllales therefore offer a unique opportunity to investigate the interplay of different branches of flavonoid biosynthesis. Comparing anthocyanin-pigmented species with species lacking anthocyanins reveals new insights into the biosynthesis and its regulation.

Generally, we are interested in discovering promising biosynthesis pathways for biotechnological applications. This interest is not restricted to flavonoid biosynthesis; rather, we use this model system to develop new genome mining approaches. We combine different methods for the identification of biosynthetic pathways, including screens and comparisons of plant genome sequences.

Plant Genomics: Long Read Sequencing

Plant genome sequences contain the blueprint for all proteins, including enzymes. Sequencing and investigating genomes is therefore an effective approach to reveal the biochemical potential of plants. In particular, correlating genomic (DNA) data with transcriptomic (RNA) and metabolomic (chemical compound) data sets allows the identification of biosynthesis pathways. Rapid developments in long read sequencing technologies enable the cost-effective analysis of large plant genomes. Sequencers distributed by Oxford Nanopore Technologies (ONT) are portable and can even be operated in the field. This so-called nanopore sequencing approach analyzes individual DNA strands. We use this technology to resolve the genome sequences of important plant species. This is also a great opportunity for students to contribute to a genome sequencing project.
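The sketch below illustrates how correlating transcriptomic and metabolomic data can point to pathway genes. It ranks candidate genes by the Pearson correlation between their expression and the abundance of one metabolite, measured across the same samples. This is a common co-expression heuristic and not necessarily the approach used in our projects. The gene names and all numbers are hypothetical toy values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def rank_candidates(expression, metabolite):
    """Rank genes by how strongly their expression tracks a metabolite profile."""
    scores = {gene: pearson(values, metabolite) for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: expression of three genes and the abundance of one
# metabolite, both measured in the same five samples.
expression = {
    "gene_A": [1.0, 4.0, 9.0, 16.0, 25.0],
    "gene_B": [5.0, 5.1, 4.9, 5.0, 5.2],
    "gene_C": [20.0, 15.0, 10.0, 5.0, 1.0],
}
metabolite = [0.5, 2.0, 4.4, 8.1, 12.3]

for gene, r in rank_candidates(expression, metabolite):
    print(f"{gene}\t{r:+.2f}")
```

A high positive correlation only flags a gene for closer inspection. Correlation alone does not prove that the gene is part of the pathway.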




Specific biological questions require the development of dedicated tools. We write such tools mostly in Python and R. The tools are freely available on GitHub (bpucker). The following tools are examples of active development:


  • KIPEs (A): Knowledge-based Identification of Pathway Enzymes allows the automatic annotation of the proteins involved in the core steps of flavonoid biosynthesis. This supports the identification of molecular mechanisms underlying color differences between cultivars of the same species. In addition, it enables the rapid and convenient annotation of flavonoid biosynthesis genes in novel or uncharacterized species.
  • MGSE (B): Mapping-based Genome Size Estimation is a novel approach to infer the true genome size of a species from sequence reads. This approach exploits the fact that all genomic regions, including repeats, are equally represented in the read set. The average sequencing depth (coverage) is estimated from single-copy regions in a reference genome sequence. Dividing the combined coverage of all positions in the assembly by this average depth yields the genome size estimate.
  • NAVIP (C): Neighborhood-Aware Variant Impact Predictor predicts the functional consequences of sequence variants between a sequenced sample and a reference. In contrast to many established tools, NAVIP considers all variants in a gene simultaneously when predicting their potential effect.
  • MYB_annotator (D): This tool enables the automatic identification and annotation of MYBs in a novel transcriptome/genome sequence assembly of a plant species. The identified candidates are functionally annotated based on orthology to previously characterized sequences.
  • bHLH_annotator (E): This tool enables the automatic identification and annotation of bHLHs in a novel transcriptome/genome sequence assembly of a plant species. The identified candidates are functionally annotated based on orthology to previously characterized sequences.
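The MGSE calculation described above can be summarized as: genome size ≈ (combined read coverage over all assembly positions) / (average coverage in single-copy regions). The following minimal sketch illustrates that formula and is not the MGSE implementation. It assumes that per-position coverage has already been computed from a read mapping, for example with samtools depth. It also assumes the single-copy regions are given as hypothetical (contig, start, end) tuples.

```python
from statistics import mean


def estimate_genome_size(coverage, single_copy_regions):
    """Mapping-based genome size estimate (illustrative sketch, not MGSE itself).

    coverage: dict mapping contig name -> list of per-position read depths
    single_copy_regions: list of (contig, start, end) tuples, 0-based, end-exclusive
    """
    # 1) Average sequencing depth at positions assumed to occur once per genome.
    single_copy_depths = [
        depth
        for contig, start, end in single_copy_regions
        for depth in coverage[contig][start:end]
    ]
    if not single_copy_depths:
        raise ValueError("no positions found in the single-copy regions")
    average_depth = mean(single_copy_depths)
    if average_depth == 0:
        raise ValueError("average depth in single-copy regions is zero")

    # 2) Combined coverage of all assembly positions. Reads from collapsed repeats
    #    pile up on the single assembled copy, so their extra depth is counted here.
    combined_coverage = sum(sum(depths) for depths in coverage.values())

    # 3) Dividing by the per-copy depth converts mapped bases into genome length.
    return combined_coverage / average_depth


# Hypothetical toy data: contig1 is single-copy (depth 30); contig2 is a 4 bp
# repeat that occurs three times in the genome but is assembled only once.
coverage = {"contig1": [30] * 10, "contig2": [90] * 4}
size = estimate_genome_size(coverage, [("contig1", 0, 10)])
print(f"assembly size: {sum(len(d) for d in coverage.values())} bp")
print(f"estimated genome size: {size:.0f} bp")
```

In this toy example, the 14 bp assembly contains the repeat only once, but that copy receives three times the single-copy depth. The estimate of 22 bp (10 + 3 × 4) therefore recovers the repeat copies missing from the assembly. Using a median instead of the mean would make the reference depth more robust to outlier positions.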
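The motivation behind NAVIP's neighborhood-aware approach can be illustrated by two SNVs that hit the same codon. Evaluated one at a time, they can suggest a different effect than when both are applied together. The following minimal sketch shows only this idea and is not NAVIP's code. It is restricted to SNVs in a single coding sequence and ignores indels and splicing. The sequence and variants are hypothetical.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(cds):
    """Translate a coding sequence codon by codon (stop codons shown as '*')."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def apply_snvs(cds, snvs):
    """Return the CDS with all (position, ref_base, alt_base) SNVs applied."""
    bases = list(cds)
    for position, ref, alt in snvs:
        if bases[position] != ref:
            raise ValueError(f"reference mismatch at position {position}")
        bases[position] = alt
    return "".join(bases)


def protein_changes(ref_cds, alt_cds):
    """List (codon_index, ref_aa, alt_aa) for every codon whose amino acid changed."""
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    return [
        (i, ref_aa, alt_aa)
        for i, (ref_aa, alt_aa) in enumerate(zip(ref_protein, alt_protein))
        if ref_aa != alt_aa
    ]


# Hypothetical example: two SNVs hit the same codon (TTG, leucine) of a short CDS.
reference_cds = "ATGTTGTAA"
snvs = [(4, "T", "A"), (5, "G", "C")]

isolated = [protein_changes(reference_cds, apply_snvs(reference_cds, [snv])) for snv in snvs]
joint = protein_changes(reference_cds, apply_snvs(reference_cds, snvs))
print("isolated:", isolated)
print("joint:   ", joint)
```

Evaluated in isolation, the first SNV turns TTG into the stop codon TAG and would be reported as a premature stop. Considered together with its neighbor, the codon becomes TAC, and the actual consequence is a leucine-to-tyrosine substitution.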
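The orthology-based annotation mentioned for MYB_annotator and bHLH_annotator can be illustrated with reciprocal best hits (RBH), a simple and widely used proxy for orthology. This is a swapped-in simplification and not necessarily how these tools assign orthologs. The sequence names and alignment scores below are hypothetical and stand in for, e.g., BLAST bit scores.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (ties keep the first hit seen).

    scores: dict mapping (query, subject) -> alignment score (higher is better)
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _score) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (candidate, reference) pairs that are each other's best hit."""
    candidate_best = best_hits(forward)  # candidate -> best reference
    reference_best = best_hits(reverse)  # reference -> best candidate
    return sorted(
        (candidate, reference)
        for candidate, reference in candidate_best.items()
        if reference_best.get(reference) == candidate
    )


# Hypothetical scores: candidates from a new assembly searched against
# previously characterized reference sequences, and vice versa.
forward = {("cand1", "refA"): 250, ("cand1", "refB"): 120,
           ("cand2", "refA"): 240, ("cand2", "refB"): 90}
reverse = {("refA", "cand1"): 251, ("refA", "cand2"): 238,
           ("refB", "cand1"): 119, ("refB", "cand2"): 88}

for candidate, reference in reciprocal_best_hits(forward, reverse):
    print(candidate, "inherits the annotation of", reference)
```

Here cand2 also hits refA best, but refA prefers cand1, so cand2 remains unannotated. Cases like this, such as recent paralogs, are where tree-based orthology inference can be more informative than a simple best-hit rule.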