Reproducible Deep Learning

PhD Course in Data Science

Timetable: May 5, 7, 12, and 14, 9:00 to 13:00.
Attendance: Zoom link (contact me if you do not receive the passcode).

Visual overview of the course

Overview

Building a deep learning model is a complex task, full of interacting design decisions, data engineering, parameter tweaking, and experimentation. Having access to powerful tools for versioning, storing, and analyzing every step of the process (MLOps) is essential.

The aim of this practical course is to start from a simple deep learning model implemented in a notebook and port it to a ‘reproducible’ world by adding code versioning (Git), data versioning (DVC), experiment logging (Weights & Biases), hyperparameter tuning, configuration (Hydra), and ‘Dockerization’. While the focus is on vertical, well-established tools that each solve one problem, we will also discuss more advanced integrated frameworks (e.g., MLflow) and techniques (e.g., CI/CD pipelines).

Set up your machine

We will install most libraries as we go along. For the initial setup, install Anaconda on your machine and create an environment:

conda create -n reprodl; conda activate reprodl

Then, install a few generic prerequisites (notebook handling, Pandas, …):

conda install -y -c conda-forge notebook matplotlib pandas ipywidgets pathlib

Finally, install PyTorch and PyTorch Lightning. The exact command depends on your operating system and on whether you have a CUDA-enabled GPU; if in doubt, follow the instructions on the PyTorch website. For example:

conda install -y pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch -c conda-forge
conda install -y pytorch-lightning -c conda-forge
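After installing, a quick way to confirm that the environment is complete is to ask Python which of the expected packages it can find. The sketch below is not part of the course material: the helper name `check_env` is hypothetical, and it assumes `python3` resolves to the interpreter of the activated `reprodl` environment. It only locates each module with `importlib.util.find_spec` and does not import it, so a package that is installed but fails at import time still reports OK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which Python modules can be found in the
# active environment. Exits with status 1 if at least one is missing.
# Note: find_spec locates a module without importing it.
check_env() {
  python3 - "$@" <<'EOF'
import importlib.util
import sys

missing = 0
for name in sys.argv[1:]:
    if importlib.util.find_spec(name) is None:
        print(f"MISSING {name}")
        missing += 1
    else:
        print(f"OK {name}")
sys.exit(1 if missing else 0)
EOF
}

# Import names of the packages installed above.
check_env torch torchvision torchaudio pytorch_lightning pandas matplotlib notebook ipywidgets \
  || echo "Some packages are missing; see the MISSING lines above."
```

To check GPU support specifically, `python -c "import torch; print(torch.cuda.is_available())"` prints True only if PyTorch can see a CUDA device.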

Organization of the material

Slides are linked in the table below, and the code is hosted in a GitHub repository. The course is split into exercises (Git, DVC, …). To start an exercise, switch to the corresponding Git branch and follow the instructions in the video or in the branch's README file. To see the completed exercise, switch to the corresponding completed branch, as shown below.

An example

To follow the DVC exercise, look up its branch name in the table below (exercise3_dvc) and check it out:

git checkout exercise3_dvc

To see the completed exercise, add _completed to the branch name:

git checkout exercise3_dvc_completed

You can inspect the commit history to see how the code changes step by step:

git log --graph --abbrev-commit --decorate

To inspect a specific change, check out the corresponding commit using its ID. This puts Git in a "detached HEAD" state; check out the branch again to return to it.
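To try this workflow without touching the course repository, the sketch below replays it on a throwaway repository. It is an illustration under stated assumptions: the branch names follow the DVC row of the table, while the file `train.py` and the commit contents are hypothetical. It also uses `git diff --name-only` to list every file that differs between an exercise and its solution, a quick way to see the whole change at once.

```shell
#!/usr/bin/env bash
# Toy replay of the exercise workflow in a temporary directory.
# Branch names mirror the course convention; train.py is hypothetical.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Student"              # local identity, needed to commit
git config user.email "student@example.com"

# Starting point of the exercise.
git checkout -q -b exercise3_dvc
printf "print('train')\n" > train.py
git add train.py
git commit -q -m "Starting point"

# The solution lives on a branch with the _completed suffix.
git checkout -q -b exercise3_dvc_completed
printf "print('train on versioned data')\n" > train.py
git commit -q -am "Complete the exercise"

# Same history view as above, one line per commit.
git --no-pager log --graph --abbrev-commit --decorate --oneline

# Files that differ between the exercise and its solution.
changed=$(git diff --name-only exercise3_dvc exercise3_dvc_completed)
echo "$changed"

# Check out a specific commit by ID (detached HEAD), then return.
first=$(git rev-list --max-parents=0 HEAD)
git checkout -q "$first"
git checkout -q exercise3_dvc_completed
```

Run it with bash; it only touches a temporary directory. In the course repository, `git diff --name-only exercise3_dvc exercise3_dvc_completed` gives the same summary once both branches exist locally.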

Material

#	Topic	Branch name	Material
0	Introduction	-	Slides, Bare repository, Video
1	Deep learning recap	-	Notebook, Video
2	Git & Scripting	exercise1_git	Slides, Video, Code
3	Hydra configuration	exercise2_hydra	Hydra repository, Video, Code
4	Data versioning with DVC	exercise3_dvc	DVC Website, Slides, Video (part 1), Video (part 2), Code
5	Docker	exercise4_docker	Docker Website, Slides, Video, Code
6	Weights & Biases	exercise5_wandb	Weights & Biases Website, Video, Code
7	Continuous integration	exercise6_hooks	Video, Code
-	Exam	-	Instructions

Advanced reading material

The new edition of Full Stack Deep Learning (UC Berkeley CS194-080) covers a larger set of material than this course.
