[DRAFT] A Senseable Python Template

Motivation

This guide recommends how to structure Python projects developed within the MIT Senseable City Lab. It has the following goals:

  1. Reduce the friction of getting started with Python projects
  2. Improve internal code reuse and workflow reproducibility
  3. Centralize access to shared resources
  4. Make it easier to publish project code publicly

Overview

We strongly recommend using the cookiecutter-data-science template to create the skeleton of your project. The template generates an initial project structure that can be shared across projects in the lab and modified where needed.

The template is implemented with Cookiecutter, a command-line utility that generates a project structure from a template. Many Cookiecutter templates exist, each suited to a different project setup with its own focus and strengths. We recommend cookiecutter-data-science specifically because it is widely adopted in research environments and provides a flexible, easy-to-share starting point. As its documentation states:

"A logical, flexible, and reasonably standardized project structure for doing and sharing data science work."

We recommend reading about the opinions reflected in the template, and using them both to guide your own work and to inform any decision to deviate from the recommended structure.

Resulting Directory Structure

This is the directory structure created after completing setup with the template.

├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default mkdocs project; see www.mkdocs.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── pyproject.toml     <- Project configuration file with package metadata for 
│                         {{ cookiecutter.module_name }} and configuration for tools like black
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.cfg          <- Configuration file for flake8
│
└── {{ cookiecutter.module_name }}   <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes {{ cookiecutter.module_name }} a Python module
    │
    ├── config.py               <- Store useful variables and configuration
    │
    ├── dataset.py              <- Scripts to download or generate data
    │
    ├── features.py             <- Code to create features for modeling
    │
    ├── modeling                
    │   ├── __init__.py 
    │   ├── predict.py          <- Code to run model inference with trained models          
    │   └── train.py            <- Code to train models
    │
    └── plots.py                <- Code to create visualizations
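
To illustrate how code in the module might refer to this layout, here is a minimal sketch of the kind of helpers `config.py` could hold. It is not the template's actual `config.py`: the names `project_paths`, `NOTEBOOK_NAME`, and `is_valid_notebook_name` are hypothetical. The regular expression is one possible reading of the notebook naming convention described in the tree, and it assumes lowercase initials and descriptions.

```python
import re
from pathlib import Path


def project_paths(project_root: Path) -> dict:
    """Map short names to directories from the tree above.

    Hypothetical helper: the keys are illustrative and are not defined
    by the template itself.
    """
    data_dir = project_root / "data"
    return {
        "raw": data_dir / "raw",              # original, immutable data dump
        "interim": data_dir / "interim",      # intermediate, transformed data
        "processed": data_dir / "processed",  # final data sets for modeling
        "external": data_dir / "external",    # data from third-party sources
        "models": project_root / "models",
        "figures": project_root / "reports" / "figures",
    }


# Assumed pattern for notebook names: an ordering number, the creator's
# initials, and a `-` delimited description, all in lowercase.
NOTEBOOK_NAME = re.compile(r"\d+(\.\d+)?-[a-z]+-[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_notebook_name(stem: str) -> bool:
    """Return True if a notebook file stem (no extension) fits the pattern."""
    return NOTEBOOK_NAME.fullmatch(stem) is not None
```

Inside the generated module, the project root could be derived with `Path(__file__).resolve().parents[1]` and passed to `project_paths`, so scripts and notebooks read from `data/raw` and write to `data/processed` without hard-coding absolute paths. The name check accepts the tree's own example, `1.0-jqp-initial-data-exploration`, and rejects a stem such as `Untitled`.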