This repo contains the material for the course Feature selection in GWAS, given on day 4 of the Machine Learning in Genomics intensive week. The slides are by Chloé-Agathe Azencott, and the practicals are adapted from https://github.com/chagaz/ds3-2018-genetics.
The repo includes the lecture slides and the Jupyter notebooks for the practical sessions. The notebooks cover the same tools as the lecture:
- practical1:
  - t-test and Manhattan plots
  - Linear regression
  - Lasso
- practical2:
  - Elastic-net
  - Multi-task lasso
  - Network-constrained lasso
The practicals require writing very little code: most questions ask you to comment on the results. Corrected versions of the practicals are provided.
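For readers who want a feel for how the Lasso (one of the tools listed above) performs feature selection before opening the notebooks, here is a minimal sketch. It is not the code used in the practicals: it is a pure-Python cyclic coordinate descent on synthetic data, and the data, the causal SNP indices, the value of `alpha`, and all function names are assumptions made for illustration.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if |value| <= threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows (centred features), y a list of centred targets.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw, since w starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: nothing to fit
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic genotypes (assumption): 200 samples, 20 SNPs coded 0/1/2,
# with only SNPs 3 and 7 affecting the phenotype.
rng = random.Random(0)
n_samples, n_snps = 200, 20
causal = {3: 1.5, 7: -2.0}
X = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n_samples)]
y = [sum(beta * row[j] for j, beta in causal.items()) + rng.gauss(0, 0.5) for row in X]

# Centre features and phenotype so that no intercept is needed.
col_means = [sum(row[j] for row in X) / n_samples for j in range(n_snps)]
Xc = [[row[j] - col_means[j] for j in range(n_snps)] for row in X]
y_mean = sum(y) / n_samples
yc = [value - y_mean for value in y]

weights = lasso_coordinate_descent(Xc, yc, alpha=0.3)
selected = [j for j, weight in enumerate(weights) if weight != 0.0]
print("SNPs with non-zero weight:", selected)
```

The L1 penalty sets the weights of uninformative SNPs exactly to zero, which is what makes the Lasso usable as a feature selection method. The selected set depends on `alpha`: larger values select fewer SNPs.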
- Clone the repository: `git clone https://github.com/goepp/ml-in-genomics-2021/`
- Download the large files `athaliana_small.X.txt` and `athaliana_small.W.txt` here and place them in `practical/data/`. Alternatively, run the notebook cells that download these two files.
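To confirm that both data files ended up in the right place, a small check along the following lines may help. This is a sketch that assumes it is run from the repository root; the function name `missing_data_files` is hypothetical and not part of the repo.

```python
from pathlib import Path

# File names and target directory as given in this README.
EXPECTED_FILES = ("athaliana_small.X.txt", "athaliana_small.W.txt")


def missing_data_files(data_dir="practical/data"):
    """Return the expected data files that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (data_path / name).is_file()]


missing = missing_data_files()
if missing:
    print("Missing data files:", ", ".join(missing))
else:
    print("All data files found.")
```

If files are reported missing, either download them manually or run the notebook cells that fetch them.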
You need Python 3, conda, and Jupyter Notebook. An easy way to set things up from scratch is:
- Create a conda environment: `conda env create --file=environment.yml`
- Activate the conda environment: `conda activate mlgen`
- Run Jupyter Notebook from within the conda environment: `jupyter notebook`. The notebook should open in a web browser. You're good to go!
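If one of the setup steps fails, it can help to check which of the required tools are visible on your `PATH`. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical, and the check only tests that the executables can be found, not that the `mlgen` environment is correctly built.

```python
import shutil
import sys


def check_prerequisites(tools=("conda", "jupyter")):
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


print("Python 3 interpreter:", sys.version_info.major >= 3)
for tool, found in check_prerequisites().items():
    print(f"{tool}: {'found' if found else 'not found'}")
```

Run it once outside and once inside the activated conda environment: `jupyter` should be found in the latter if the environment was created successfully.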