mfhepp/cookiecutter-pydev

Python data analysis cookiecutter

Bodenmiller Lab template for Python data analysis projects using Jupyter notebooks

Requirements

cookiecutter

Usage

To create a new project from this template:

cookiecutter https://github.com/BodenmillerGroup/cookiecutter-jupyter

After project creation, it is recommended to initialize git and add the origin:

cd <package_name>
git init
git remote add origin <origin_url>

Project requirements

The created project contains both a pip-style requirements.txt file and a conda-style environment.yml file:

  • The requirements.txt file should contain all Python packages required for executing the code in the project repository. This file allows the user to install the most recent version of all packages and should therefore not be version-pinned, unless specific package versions are required.
  • The environment.yml file should contain a conda environment for which the correct execution of the code in the project is guaranteed, including binary packages and the Python runtime. This file allows the user to reproduce the analysis results for which the project was created and should therefore be version-pinned.

Both the requirements.txt file and the environment.yml file are prepopulated with essential dependencies. Initially, the environment.yml file is not version-pinned, to allow for the creation of a fresh conda environment after project initialization. Following conda environment creation, this file should be replaced by an all-pinned environment file as follows:

conda env export > environment.yml

Also, for sharing your analysis environments, consider containerization tools such as Singularity or Docker.

This topic was discussed in more detail in issue #2.

Versioning Jupyter notebooks

By default, Jupyter notebook files ending with .ipynb are not versioned. This can be changed anytime by removing the corresponding line from the project's generated .gitignore file.
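
The removal can also be scripted. The following is a minimal sketch under one loud assumption: that the generated .gitignore excludes notebooks with a line reading exactly `*.ipynb`. Check the generated file first, since the actual pattern may differ. The function name `enable_notebook_versioning` is hypothetical.

```python
def enable_notebook_versioning(gitignore_text: str, pattern: str = "*.ipynb") -> str:
    """Return the .gitignore content without lines that equal `pattern`."""
    kept = [
        line
        for line in gitignore_text.splitlines(keepends=True)
        # Compare the stripped line so trailing whitespace does not matter;
        # the pattern "*.ipynb" is an assumption about the generated file.
        if line.strip() != pattern
    ]
    return "".join(kept)


# Usage, run from the project root:
# from pathlib import Path
# path = Path(".gitignore")
# path.write_text(enable_notebook_versioning(path.read_text()))
```

Only exact matches are dropped, so related entries such as a checkpoint directory stay ignored.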

Advantages of versioning Jupyter notebooks:

  • Static output of Jupyter notebook files is supported & rendered by GitHub.
  • "Lab journal-style" Jupyter notebooks: Not only the code, but also the output embedded in the notebooks is versioned. This makes it possible to track analysis results, as long as they are embedded in the Jupyter notebook.
  • Pure code changes can still be tracked at the level of individual files by simultaneously version-controlling the .py files autogenerated by jupytext (enabled by default, see example.ipynb).
  • Per-commit code changes can be viewed using third-party tools such as nbdime.
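
To illustrate why a code-only companion file makes pure code changes easier to follow, the sketch below pulls just the code cells out of a notebook. It is not how jupytext or nbdime work internally; it is a simplified illustration that relies only on the nbformat 4 JSON layout, where each cell has a `cell_type` and a `source` stored either as a string or as a list of lines. The function name `code_only` is hypothetical.

```python
import json


def code_only(notebook_json: str) -> str:
    """Concatenate the source of all code cells, ignoring outputs and markdown."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        # nbformat stores source either as one string or as a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source)
    return "\n\n".join(chunks)
```

Diffing the result of `code_only` between two revisions shows code edits without the noise of changed outputs or execution counts, which is the same benefit the jupytext-generated .py files provide.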

Disadvantages of versioning Jupyter notebooks:

  • Processed data should be easily reproducible by simply rerunning the code for the respective revision. Storing such data is therefore often neither required nor desirable.
  • In principle, large and/or binary data should not be stored in git repositories, but tracked using appropriate data storage systems (file system, dolt, dvc, ...) instead. Storing (large) binary files in git repositories increases the physical disk space and data transfer requirements and makes it harder to understand changes on a per-commit level without third-party tooling (see above).
  • Code & version history duplication: both the .ipynb and the autogenerated .py files contain the same code.
  • Processed data not embedded in Jupyter notebooks has to be tracked separately from the output embedded in Jupyter notebooks. Also, not all data embedded in Jupyter notebooks can be stored (e.g. interactive visualization results).
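
To get a feeling for how much embedded output contributes to repository size, a rough estimate like the following may help when weighing the points above. It is a sketch under stated assumptions: the function name `output_share` is hypothetical, sizes are measured as the length of the re-serialized JSON and not as bytes on disk, and only the nbformat 4 `outputs` field of each cell is considered.

```python
import json


def output_share(notebook_json: str) -> float:
    """Return the fraction of serialized cell content taken up by outputs."""
    notebook = json.loads(notebook_json)
    output_size = 0
    total_size = 0
    for cell in notebook.get("cells", []):
        total_size += len(json.dumps(cell))
        outputs = cell.get("outputs")
        if outputs:
            # Embedded images are stored as base64 strings inside outputs,
            # so they dominate this number for plot-heavy notebooks.
            output_size += len(json.dumps(outputs))
    return output_size / total_size if total_size else 0.0


# Usage:
# from pathlib import Path
# print(output_share(Path("example.ipynb").read_text()))
```

A value close to 1 indicates that most of the notebook consists of embedded output, in which case every rerun that changes a figure adds a comparably large change to the git history.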

Whether to version-control Jupyter notebooks or not is a design choice. See issue #3 for details.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Authors

Jonas Windhager

Contributors

  • Vito Zanotelli

Acknowledgements

License

MIT License
