This is a template for a Data Science project using Python, uv for environment management and Quarto for documentation.
To adapt it to your own project, replace sample with your project name in the commands below.
Adapt the LICENSE as required.
Provide a brief description of the project here.
According to Is It Ops That Make Data Science Scientific? Archives of Data Science, Series A, vol 8, p. 12, 2022.
Code and configurations used in the different project phases are stored in the subfolders
- data_acquisition
- eda
- modelling
- deployment
Templates for the documentation artefacts from the different project phases are provided in the subfolder docs in the form of a Quarto project:
- Project charter
- Data report
- Modelling report
- Evaluation decision log
See the section Quarto Setup and Usage for instructions on how to build and serve the documentation website from the individual reports using Quarto.
Make sure to have uv installed: https://docs.astral.sh/uv/getting-started/installation/
After cloning the repository, create the Python environment with all dependencies based on the .python-version, pyproject.toml and uv.lock files by running
uv sync
To add new dependencies, use
uv add <package>
which will add the package to pyproject.toml and update the uv.lock file. You can also specify a version, e.g. uv add pandas==2.0.3.
Remove packages with
uv remove <package>
Commit changes to the pyproject.toml and uv.lock files into version control.
Run uv sync after pulling changes to update the local environment.
Whenever the Python environment is used, make sure to prefix every command that uses Python with uv run, e.g.
uv run python script.py
The environment variables are specified in a .env file, which is never committed to version control, as it may contain secrets. The repo only contains the file .env.template to demonstrate how environment variables are specified.
You have to create a local copy of .env.template in the project root folder; the easiest way is to rename it to .env.
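The step above can also be scripted. The following is a minimal sketch using only the standard library; the function name ensure_env_file is hypothetical and not part of this template. It copies the template instead of renaming it, on the assumption that .env.template should stay in the repository, and it never overwrites an existing .env.

```python
from pathlib import Path
import shutil


def ensure_env_file(project_root: Path) -> Path:
    """Create .env from .env.template unless .env already exists.

    Hypothetical helper for illustration; not part of the template.
    """
    template = project_root / ".env.template"
    target = project_root / ".env"
    if target.exists():
        # Never overwrite a local .env, it may already hold secrets.
        return target
    if not template.exists():
        raise FileNotFoundError(f"missing template: {template}")
    # Copy rather than rename so the tracked template stays in place.
    shutil.copyfile(template, target)
    return target
```

Calling ensure_env_file(Path(".")) from the project root would then create .env on first use and leave it untouched afterwards.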
The content of the .env file is then read by the PyPI dependency python-dotenv. Usage:
import os
from dotenv import load_dotenv
load_dotenv reads the .env file and sets the environment variables:
load_dotenv()
which can then be accessed (assuming the file contains a line SAMPLE_VAR=<some value>):
os.environ['SAMPLE_VAR']
If Quarto is used to build a documentation website as described in the Project Organisation section, you need to
- Install Quarto
- Optional: quarto-extension for VS Code
- If working with svg files and pdf output you will need to install rsvg-convert:
  - On macOS, this can be done via
    brew install librsvg
  - On Windows, using chocolatey:
    - Install chocolatey
    - Install rsvg-convert by running in a terminal:
      choco install rsvg-convert
Source *.qmd and configuration files are in the docs folder. The Quarto project configuration is set up as follows:
Base config: docs/_quarto.yml
Two profiles (https://quarto.org/docs/projects/profiles.html) allow you to generate two different outputs from the same source files, with the benefit of reusing as much as possible from the ongoing project documentation in the final thesis:
- Website documentation:
  - docs/_quarto-website.yml
  - quarto render --profile website renders to docs/build
- Book project - thesis document:
  - docs/_quarto-thesis.yml
  - quarto render --profile thesis renders to docs/thesis
With embedded Python code chunks that perform computations, you need to make sure that the Python environment is activated when rendering. This can be done by prefixing the render command with uv run, e.g.:
uv run quarto render
The default profile, which is used when the quarto render command is run without the --profile argument, can be set in docs/_quarto.yml according to the current need when working with local previews etc.
Further adaptations to the configuration files as needed.
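If rendering is scripted, the command variants described above can be assembled in one place. The following is a minimal sketch; the helper name quarto_render_command is hypothetical, and only the commands already shown in this README (uv run, quarto render, --profile) are used.

```python
from typing import List, Optional


def quarto_render_command(profile: Optional[str] = None, use_uv: bool = True) -> List[str]:
    """Build the argument list for a Quarto render call.

    Hypothetical helper for illustration. With profile=None the default
    profile from docs/_quarto.yml applies.
    """
    # Prefix with "uv run" so embedded Python chunks use the project environment.
    cmd = ["uv", "run"] if use_uv else []
    cmd += ["quarto", "render"]
    if profile is not None:
        cmd += ["--profile", profile]
    return cmd
```

The resulting list could then be passed to subprocess.run(cmd, cwd="docs", check=True), assuming the script is started from the project root.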
- Make changes to the .qmd source files
- Preview: quarto preview (the default profile in docs/_quarto.yml is set to website). The preview in VS Code therefore also loads the website profile automatically.
- Build the documentation website by running quarto render --profile website from the docs subfolder. This will write all files into the docs/build subfolder.
- You can check the website locally by opening docs/build/index.html in a browser
- docs/build is excluded from git versioning. The workflow file in .github/workflows/ configures automatic remote build and deployment as an Azure static web app
A GitHub Actions workflow file (.github/workflows/azure-static-web-apps-ashy-pond-07dfc0003.yml) ensures that every push/merge to the main branch triggers a build and deployment as an Azure Static Web App: https://spectraltuning.manuel-doemer.ch.
The web app configuration is in staticwebapp.config.json.
The setting
execute:
  freeze: auto
in the _quarto.yml file ensures that all Python computations are only performed locally (with the last render command before the push) and checked into the repository under docs/_freeze, so that no Python code is executed by the GitHub runners and the pre-computed results are used in the remote build and deployment.
If you would like to use GitHub Pages to serve the documentation website and avoid pushing the rendered files into the repo (which makes very ugly diffs), while executing embedded code blocks only locally, the initial setup of the GitHub action (only needed once) follows https://quarto.org/docs/publishing/github-pages.html#github-action:
- Add
  execute:
    freeze: auto
  to the _quarto.yml file
- Execute quarto render from the docs folder
- Run quarto publish gh-pages (generates and pushes a branch called gh-pages)
- Make sure that GitHub Pages is configured to serve the root of the gh-pages branch
- Add the definition of the GitHub action workflow .github/workflows/publish.yml (see below)
- Check all of the newly created files (including the _freeze directory) into the main branch of the repository; docs/build is excluded by the .gitignore
- Then push to main
GitHub action workflow configuration file to add in .github/workflows/publish.yml:
on:
  workflow_dispatch:
  push:
    branches: main

name: Quarto Publish

jobs:
  build-deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: write
    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Install librsvg
        run: sudo apt-get install librsvg2-bin

      - name: Set up Quarto
        uses: quarto-dev/quarto-actions/setup@v2
        with:
          tinytex: true

      - name: Render and Publish
        uses: quarto-dev/quarto-actions/publish@v2
        with:
          target: gh-pages
          path: docs
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

After setup of the corresponding GitHub action, every update just needs:
- Build the website by running quarto render from the docs subfolder. This will write the rendered files into docs/build (excluded from the repository via .gitignore) and the computations into the docs/_freeze subfolder (checked in so that the GitHub action runners do not need Python).
- Check the website locally by opening docs/build/index.html
- Push all updated files into the main branch. This will trigger a GitHub action that
  - pushes an update to the gh-pages branch
  - renders and publishes the site to https://.github.io/sample/
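Before pushing, the first two steps above can be checked with a short script. The following is only a sketch under the assumptions stated in this README (rendered site in docs/build, frozen computations in docs/_freeze); the function name pre_push_problems is hypothetical, and the check does not verify that the frozen results are up to date.

```python
from pathlib import Path
from typing import List


def pre_push_problems(docs_dir: Path) -> List[str]:
    """Return a list of problems found before pushing to main.

    Hypothetical helper for illustration; an empty list means both checks passed.
    """
    problems: List[str] = []
    freeze_dir = docs_dir / "_freeze"
    # The runners do not execute Python, so frozen results must exist locally.
    if not freeze_dir.is_dir() or not any(freeze_dir.iterdir()):
        problems.append("_freeze is missing or empty; run quarto render first")
    # A local render should have produced the website entry page.
    if not (docs_dir / "build" / "index.html").is_file():
        problems.append("build/index.html not found; the site was not rendered")
    return problems
```

Running pre_push_problems(Path("docs")) from the project root and printing the result would list anything that still needs a local render.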
Additional notes:
- Rendering svg files requires the librsvg package. The GitHub action (Linux Ubuntu) installs it via sudo apt-get install librsvg2-bin. To render locally, you need to install it on your system as well. On macOS, this can be done via brew install librsvg. On Windows you can use chocolatey to install it: choco install rsvg-convert (https://community.chocolatey.org/packages?&tags=librsvg).
Files that come in addition to the website documentation:
- bibliography.qmd for the bibliography
- Appendix (appendices)
- format pdf template-partials:
  - before-body.tex specifies the pages before the table of contents, containing the preface, abstract, a placeholder for the declaration of authorship etc. These contents must therefore be adapted directly in this .tex file.
- Make changes to the .qmd source files
- For preview: quarto preview --profile thesis. If the VS Code preview is to be used for the pdf, "default: website" must be changed to "thesis" in the file _quarto.yml.
- Build the report by running quarto render --profile thesis from the docs subfolder. This will write all files into the docs/thesis subfolder.
- You can check the generated pdf
- Manually modify the thesis pdf as needed in the docs/thesis_final subfolder. This is the final version of the thesis document that will be submitted. The docs/thesis subfolder is only for the intermediate files generated by Quarto.
  - copy the thesis pdf document into docs/thesis_final
  - fill out the cover page templates in docs/thesis_final
  - generate pdf pages from the templates
  - replace the placeholders in the thesis pdf with the separately generated pdf pages
