Big Data Analytics for Life Scientists

Slide and teaching material availability

Teaching materials are available through the Big Data Analytics for Life Scientists GitHub repository.

Summer School ‘Big Data Analytics for Life Scientists’

This summer school is intended for doctoral students in the life sciences who are interested in
learning fundamental skills for big data projects. No previous experience is required, but high
motivation is expected. The one-week program is listed below. Each day starts with lectures covering
the theoretical background (about 2h), followed by practical exercises to gain hands-on
experience (3h-6h).
This summer school will take place from June 17th to 21st in R021B (Mendelssohnstraße 4).
Lectures are planned to start at 9am. Exercises will be conducted before and after a lunch
break. Participation in this summer school is free of charge, but an application is required. There
are 12 slots available per cohort, and priority will be given to students working on projects related
to plant-microbe interactions. Depending on the number of applications, a second week for
another cohort might be offered.
Please send your application with the following details via email to Boas Pucker:

Name
Affiliation
Current situation (progress in doctoral studies)
Summary of your research / research interests
Expectations / motivation for participating in this summer school

We are looking forward to a great data analytics week!

Tentative program

Day 1

Lecture

Exponential growth of databases
Systems Biology & Omics (Genomics, Transcriptomics, Proteomics, Metabolomics)
Potential for data upcycling & Data Life Cycle
Recent progress and current challenges in big data analytics
How to find the right software
Open science principles
Introduction to Linux (Ubuntu)
Data management (File and folder structures for big data projects)
Backup and archiving strategies
Documentation
Important scripting languages in bioinformatics: Python, R

Practical course

How to get help from an AI (and the limitations)
Working in a virtual machine
Installing computational tools
Finding usage of computational tools
Transferring files (FileZilla, scp)
Jupyter Notebook

Day 2

Lecture

Introduction to genome biology
Sequencing technologies (Sanger, Illumina, ONT, PacBio)
Long read sequencing workflow (ONT)
Genome sequence assembly
Structural and functional annotation
Comparative genomics
Read mapping, variant calling, variant annotation
Genome-Wide Association Studies (GWAS) / Mapping-By-Sequencing (MBS)
File formats: FASTA, FASTQ, GFF, SAM/BAM, VCF

Practical course

QC of long reads
Trimming/filtering of reads
Assembly (Shasta)
Gene prediction (BRAKER3)
Functional annotation (Mercator)
Biosynthesis pathway annotation (KIPEs3)
Long read mapping (minimap2)
Variant calling and annotation (SnpEff, NAVIP)

Day 3

Lecture

Introduction to transcriptomics
History of transcriptomics (microarrays, RT-qPCR)
Concept of RNA-seq & workflow
Experimental design considerations
Quality control (RNA & data)
Read mapping & quantification
PCA, Heatmaps, DEG identification
Direct RNA sequencing & full cDNA sequencing
scRNA-seq
Re-using public datasets

Practical course

QC and trimming of reads (FastQC, Trimmomatic)
De novo transcriptome assembly (Trinity)
Split read mapping (STAR, HISAT2)
Quantification (kallisto)
Identification of DEGs (DESeq2)
Co-expression analysis (ppb-tools.de)

Day 4

Lecture

How to reduce complexity?
Considerations for designing scientific figures
Types of figures
Matplotlib, plotly, ggplot2
Examples: circos plots, synteny figures, PCA, phylogenetic trees
File formats: PNG, JPEG, TIFF, PDF, SVG
Manual editing (Inkscape)
Visualizing complex networks (Cytoscape)
Web examples: eFP browser
Designing figures with BioRender

Practical course

How to generate figures in Python, R (with AI support)
Circos plots & synteny figures
DEG plots & enrichment analysis
Visualizing a co-expression network with Cytoscape
Phylogenetic tree construction (FastTree2) + visualization (iTOL)

Day 5

Lecture

Introduction to the scientific publishing business
How to publish your data (enable reuse)
Importance of metadata
FAIR data
Details about methods (#OpenMethods)
Sharing protocols through protocols.io
Publishing data sets through LeoPARD
Submission of sequencing data to ENA
Depositing scripts in GitHub and Zenodo

Practical course

Sharing protocols through protocols.io
Create a GitHub repository
Prepare data for submission to ENA
Complete the LeoPARD template

Q & A

Opportunity to ask remaining questions
Collection of feedback about content / structure
