Bodenmiller Lab template for Python data analysis projects using Jupyter notebooks
To create a new project from this template:
```
cookiecutter https://github.com/BodenmillerGroup/cookiecutter-jupyter
```
After project creation, it is recommended to initialize git and add the origin:
```
cd <package_name>
git init
git remote add origin <origin_url>
```
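As a concrete sketch, the same sequence with hypothetical values filled in (`my_analysis` and the origin URL are placeholders standing in for `<package_name>` and `<origin_url>`), followed by an optional first commit of the generated template files:

```shell
# Hypothetical example values; replace with your own project name and remote.
cd my_analysis
git init
git remote add origin git@github.com:user/my_analysis.git

# Optionally commit the generated template files right away:
git add .
git commit -m "Initialize project from cookiecutter-jupyter template"
```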
The created project contains both a pip-style requirements.txt file and a conda-style environment.yml file:
- The requirements.txt file should contain all Python packages required for executing the code in the project repository. This file allows the user to install the most recent versions of all packages and should therefore not be version-pinned, unless specific package versions are required.
- The environment.yml file should contain a conda environment for which the correct execution of the code in the project is guaranteed, including binary packages and the Python runtime. This file allows the user to reproduce the analysis results for which the project was created and should therefore be version-pinned.
Both the requirements.txt file and the environment.yml file are prepopulated with essential dependencies. Initially, the environment.yml file is not version-pinned, to allow for the creation of a fresh conda environment after project initialization. Following conda environment creation, this file should be replaced by an all-pinned environment file as follows:
```
conda env export > environment.yml
```
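Put together, the post-creation workflow could look like the following sketch (the environment name `my_analysis` is an assumption; the actual name is defined in the generated environment.yml):

```shell
# Create a fresh environment from the initially unpinned environment.yml ...
conda env create -f environment.yml

# ... activate it ("my_analysis" is a hypothetical environment name) ...
conda activate my_analysis

# ... and replace environment.yml with a fully version-pinned export:
conda env export > environment.yml
```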
Also, for sharing your analysis environments, consider containerization tools such as Singularity or Docker.
This topic was discussed in more detail in issue #2.
By default, Jupyter notebook files ending with .ipynb are not versioned. This can be changed anytime by removing the corresponding line from the project's generated .gitignore file.
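For example, assuming the generated .gitignore excludes notebooks with a line reading exactly `*.ipynb` (an assumption about the template's output), the following sketch removes that line so notebooks become versioned:

```shell
# Assumption: the generated .gitignore contains a line "*.ipynb".
# Remove that line so Jupyter notebooks are picked up by git again.
sed -i '/^\*\.ipynb$/d' .gitignore
git add .gitignore
```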
Advantages of versioning Jupyter notebooks:
- GitHub supports and renders the static output of Jupyter notebook files.
- "Lab journal-style" Jupyter notebooks: not only the code, but also the output embedded in the notebooks is versioned. This makes it possible to track analysis results, as long as they are embedded in the Jupyter notebook.
- Pure code changes can still be tracked on an individual file level by simultaneously version-controlling the `.py` files autogenerated by jupytext (enabled by default, see example.ipynb).
- Per-commit code changes can be viewed using third-party tools such as nbdime.
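The jupytext pairing mentioned above can also be managed from the command line. A sketch, assuming jupytext is installed in the project environment and that the percent script format is the desired pairing (both are assumptions, not prescribed by this template):

```shell
# Pair example.ipynb with an autogenerated percent-format .py script
# ("py:percent" is an assumed format choice):
jupytext --set-formats ipynb,py:percent example.ipynb

# Keep the paired files in sync after editing either one:
jupytext --sync example.ipynb
```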
Disadvantages of versioning Jupyter notebooks:
- Processed data should be easily reproducible by simply rerunning the code at the respective revision; storing such data is therefore often neither required nor desirable.
- In principle, large and/or binary data should not be stored in git repositories, but tracked using appropriate data storage systems (file system, dolt, dvc, ...) instead. Storing (large) binary files in git repositories increases the physical disk space and data transfer requirements and makes it harder to understand changes on a per-commit level without third-party tooling (see above).
- Code & version history duplication: both the `.ipynb` and the autogenerated `.py` files contain the same code.
- Processed data not embedded in Jupyter notebooks has to be tracked separately from the output embedded in Jupyter notebooks. Also, not all data embedded in Jupyter notebooks can be stored (e.g. interactive visualization results).
Whether to version-control Jupyter notebooks or not is a design choice. See issue #3 for details.
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Jonas Windhager
- Vito Zanotelli