(Image created using ChatGPT)
The word GaiaFlow is a combination of Gaia (the Greek goddess of Earth, symbolizing our planet)
and Flow (representing seamless workflows in MLOps). It is an MLOps
framework tailored for efficient Earth Observation projects. GaiaFlow is built
to provide you with a framework for the entire pipeline of remote sensing applications, from data
ingestion through machine learning modeling to model deployment.
It is a comprehensive template for machine learning projects,
providing an MLOps framework with tools like Airflow, MLflow,
JupyterLab, MinIO and Minikube that lets users create ML projects,
experiments, model deployments and more in a standardized way. The documentation
is available here.
The architecture below describes what we want to achieve with our MLOps framework. It is taken from the Google Cloud Architecture Centre.
Please note: This framework has only been tested on Linux (Ubuntu) and on Windows 11 using WSL2, where it works as expected. It has not yet been tested on macOS or natively on Windows, so we cannot confirm that it works there.
This template provides a standardized project structure for ML initiatives at BC.
A Python package, Gaiaflow, has also been developed to integrate the following essential MLOps tools:
- Apache Airflow: For orchestrating ML pipelines and workflows
- MLflow: For experiment tracking and model registry
- JupyterLab: For interactive development and experimentation
- MinIO: For local object storage of ML artifacts
- Minikube: For running a lightweight local Kubernetes cluster
When you use this template to start your ML project, you get the following project structure:
├── .github/ # GitHub Actions workflows (you are provided with a starter CI)
├── dags/ # Airflow DAG definitions
│ (you can define DAGs either with a config file (dag-factory)
│ or with Python scripts.)
├── notebooks/ # JupyterLab notebooks
├── your_package/ (If you chose pixi as your environment manager, this will be placed under `src/`)
│ │ (For new projects, we recommend following this standardized folder structure.
│ │ You are of course free to add anything you like to it.)
│ ├── dataloader/ # Your data loading scripts
│ ├── train/ # Your model training scripts
│ ├── preprocess/ # Your feature engineering/preprocessing scripts
│ ├── postprocess/ # Your model output postprocessing scripts
│ ├── model/ # Your model definition
│ ├── model_pipeline/ # Your model pipeline to be used for inference
│ └── utils/ # Utility functions
├── tests/ # Unit and integration tests
├── data/ # If you have data locally, move it here so that Airflow can access it.
├── README.md # It's a README. Feel free to change it!
├── CHANGES.md # Your changelog for every version goes here.
├── pyproject.toml # Config file containing your package's build information and its metadata
├── .env # Your environment variables that docker compose and python scripts can use (already added to .gitignore)
├── .gitignore # Files to ignore when pushing to git.
└── environment.yml # Libraries required for local mlops and your project (if pixi is used, this will not be present)
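As a rough illustration of the layout above, the following Python sketch checks whether a generated project contains the expected directories. It is not part of Gaiaflow or the template: the function name `missing_paths`, the hard-coded directory lists and the assumption that the pixi variant places the package under `src/<package_name>` are ours, derived only from the tree shown here.

```python
from pathlib import Path

# Directories taken from the project tree above. Assumption: this list is
# illustrative, not an official Gaiaflow specification.
EXPECTED_TOP_LEVEL = ("dags", "notebooks", "tests", "data")
EXPECTED_SUBPACKAGES = (
    "dataloader",
    "train",
    "preprocess",
    "postprocess",
    "model",
    "model_pipeline",
    "utils",
)


def missing_paths(project_root: Path, package_name: str, use_pixi: bool = False) -> list[str]:
    """Return the expected directories that do not exist under project_root.

    With pixi the package is assumed to live in src/<package_name>;
    otherwise it sits directly in the project root.
    """
    if use_pixi:
        package_dir = project_root / "src" / package_name
    else:
        package_dir = project_root / package_name
    expected = [project_root / name for name in EXPECTED_TOP_LEVEL]
    expected += [package_dir / name for name in EXPECTED_SUBPACKAGES]
    # Report missing paths relative to the project root, with forward slashes.
    return [p.relative_to(project_root).as_posix() for p in expected if not p.is_dir()]
```

Run from the project root, `missing_paths(Path("."), "your_package")` would list any standard folders that do not exist yet; an empty list means the layout matches the tree.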
Please install the following from the links provided, as these have been tried and tested.
If you face any issues, please let us know.
- Mamba – please make sure you install Python 3.12, as this repository has been tested with that version.
- or Pixi (we recommend using this)
Inside your terminal (Linux or WSL2), check that the tool is installed:
mamba # should print the Mamba help page
or
pixi # should print the pixi help page
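If you prefer to check the prerequisites from Python, a minimal sketch using only the standard library could look like the following. `shutil.which` looks a command up on your `PATH`, much like typing it in the terminal; the helper names are hypothetical, and the version check only reflects the Python 3.12 recommendation above, so run it with the interpreter from your Mamba environment.

```python
import shutil
import sys


def available_env_managers() -> list[str]:
    """Return the supported environment managers found on PATH."""
    # shutil.which returns None when the command cannot be found.
    return [tool for tool in ("mamba", "pixi") if shutil.which(tool) is not None]


def is_tested_python(version_info: tuple = sys.version_info) -> bool:
    """True if the given version is Python 3.12, the version this README mentions."""
    return tuple(version_info[:2]) == (3, 12)


if __name__ == "__main__":
    managers = available_env_managers()
    print("Environment managers on PATH:", ", ".join(managers) or "none found")
    print("Running Python 3.12:", is_tested_python())
```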
Once the prerequisites are installed, you can go ahead and create the project:
- Create a separate environment for cookiecutter:
mamba create -n cc cookiecutter ruamel.yaml
mamba activate cc
- Generate the project from the template:
cookiecutter https://github.com/bcdev/gaiaflow-cookiecutter
When prompted for input, enter the details requested. If you don't provide any input for a given choice, the first choice in the list is taken as the default.
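To make the default-choice rule concrete, here is a simplified Python sketch of how unanswered prompts resolve. It assumes a cookiecutter.json-style dictionary in which a list marks a choice variable; the keys `project_name` and `env_manager` and the order of their choices are hypothetical and may not match the real gaiaflow-cookiecutter template. It also ignores cookiecutter's Jinja rendering of default values.

```python
# Hypothetical cookiecutter.json content; the real template's keys and
# choice order may differ.
EXAMPLE_CONTEXT = {
    "project_name": "my_ml_project",
    "env_manager": ["mamba", "pixi"],
}


def resolve_defaults(context: dict) -> dict:
    """Simplified model of pressing Enter at every prompt.

    A list value is treated as a choice variable whose default is its first
    item; any other value is its own default. Jinja rendering is ignored.
    """
    return {
        key: value[0] if isinstance(value, list) else value
        for key, value in context.items()
    }


if __name__ == "__main__":
    print(resolve_defaults(EXAMPLE_CONTEXT))
```

cookiecutter also accepts a `--no-input` flag that skips the prompts and uses these defaults directly.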
- (Optional) If you wish to use the Gaiaflow dockerized MLOps services (Airflow, MLflow, MinIO), please follow the steps here. Once Gaiaflow is installed, please read the user guide.
NOTE: The Python package currently works only with the conda version of this template; a pixi version will be released soon.
