Machine Learning for Life Sciences

The course provides an introduction to machine learning methods and workflows for life science research. It introduces the full end-to-end machine learning (ML) workflow, from data preprocessing and feature engineering to model training, evaluation, interpretation, and reproducible reporting, with a focus on the analysis of complex, high-dimensional biological data. Participants explore biological datasets using unsupervised methods such as dimensionality reduction and clustering, and build predictive models using supervised approaches including linear and tree-based models. Methods for multi-omics integration, including partial least squares (PLS), are introduced together with specialized modeling settings relevant to life sciences, such as mixed-effects models and survival analysis.

Course content

Overview of the machine learning workflow
Dimensionality reduction methods such as PCA and UMAP
Unsupervised learning and clustering methods
Supervised learning models, including tree-based models
Partial least squares (PLS) for multi-omics integration
Mixed-effects models for analysis of repeated-measures and longitudinal data
Survival analysis methods for time-to-event data
Model training, evaluation and validation strategies
Model interpretation and explainable machine learning methods

Learning outcomes

After completing the course, participants will be able to:

Explain the main components of the machine learning workflow and their role in life science research.
Perform data preprocessing and exploratory analysis of high-dimensional biological datasets.
Apply unsupervised learning methods to discover structure and generate biological hypotheses.
Train, evaluate, and compare supervised learning models commonly used in life sciences.
Apply specialized modeling approaches, including mixed-effects models for repeated measures and survival analysis for time-to-event data.
Assess model performance using appropriate evaluation metrics and validation strategies.
Interpret and communicate model results using explainable machine learning techniques.
Apply basic principles of reproducible and FAIR machine learning workflows.
Collaborate in interdisciplinary teams to design, implement, and present an ML-based data analysis.

Education

The course takes an active learning approach, alternating between lectures, live coding sessions, exercises and group discussions.
Although the focus is on biostatistics and machine learning rather than programming, some coding is required, and the examples are written in R and Python. See the entry requirements below.

Entry requirements

Prior exposure to basic statistical concepts (e.g. descriptive statistics, linear regression), or completion of the Statistical Methods for Life Sciences course or a similar course.
Basic programming skills in R or Python, including working with data frames and running scripts.
Familiarity with data analysis environments such as RStudio or Jupyter Notebooks.

No prior experience with machine learning is required.

More on R and Python skills

Basic syntax and arithmetic (using the language as a calculator) (R: 1 + 2; Python: 1 + 2)
Core data structures: vectors/arrays, matrices, and data frames, including subsetting and basic matrix operations (R: vectors, matrices, data frames; Python: NumPy arrays, pandas DataFrames)
Reading data and managing files (R: read_csv(), relative paths; Python: pandas.read_csv(), relative paths)
Inspecting and summarising data (R: head(), tail(), sum(), min(), max(); Python: head(), tail(), sum(), min(), max())
Handling missing values (R: NA, na.rm = TRUE; Python: NaN, isna())
Writing simple control flow and functions (R: if/else, loops, functions; Python: if/else, loops, functions)
Finding and using documentation (R: help(), ?; Python: help(), docstrings)
Installing and loading/importing external packages (R: install.packages(), library(); Python: pip / conda, import)
Data transformation and manipulation (filtering rows, selecting columns, creating new variables) (R: tidyverse; Python: pandas)
Creating and interpreting basic plots, including simple customisation (labels, titles) (R: plot(), ggplot2; Python: matplotlib, seaborn)
Basic familiarity with reproducible documents (R: R Markdown / Quarto; Python: Quarto / Jupyter)
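
As a rough self-check, the Python side of the skills listed above might look like the short sketch below. It is only an illustration, not official course material: the toy table, the column names (sample, group, gene_a, gene_b) and the helper label_expression are hypothetical, and the data are read from an in-memory string so the example runs without any files.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a small expression table.
# With a real file you would pass a relative path to pd.read_csv().
csv_text = """sample,group,gene_a,gene_b
s1,control,2.0,4.0
s2,control,3.0,
s3,treated,8.0,7.0
s4,treated,10.0,10.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the first rows and count missing values per column.
print(df.head())
n_missing = df.isna().sum()

# pandas summaries skip NaN by default (comparable to na.rm = TRUE in R).
mean_gene_b = df["gene_b"].mean()

# Filter rows, select columns, and create a new variable.
treated = df.loc[df["group"] == "treated", ["sample", "gene_a"]]
df["log2_gene_a"] = np.log2(df["gene_a"])


def label_expression(value, threshold=5.0):
    """Return 'high' or 'low' relative to a threshold, 'missing' for NaN."""
    if pd.isna(value):
        return "missing"
    elif value >= threshold:
        return "high"
    else:
        return "low"


# Apply the function to every value in a column.
df["gene_b_level"] = df["gene_b"].apply(label_expression)
print(df)
```

If each step here reads comfortably (or its R equivalent does), the programming prerequisites are most likely met.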

Selection criteria

Due to limited space, the course can accommodate a maximum of 24 participants. If we receive more applications, participants will be selected based on fulfilment of the entry requirements, motivation to attend the course, and gender and geographical balance.
We prioritize academic participants (students, staff, affiliated researchers) in Sweden. We welcome participants from industry and/or outside Sweden if seats are available and the entry requirements are met.

Fees

3 000 SEK for academic participants
15 000 SEK for non-academic participants
The fee includes lunches and coffee.

Course credits

Upon successful completion of the course, assessed based on active participation in all course sessions, we will issue a course certificate.
Please note that we are not able to provide any formal university credits (högskolepoäng, HP). Many universities, however, recognize attendance at our courses and award 1.5 HP, corresponding to 40 hours of study. It is up to participants to clarify and arrange credit transfer with the relevant university department.