
113 kubernetes integration #130 (Open)

SimoneBendazzoli93 wants to merge 39 commits into main from 113-kubernetes-integration

Conversation

SimoneBendazzoli93 (Collaborator) commented Nov 3, 2025

fixes #113

- Introduced a new backend option for Kubernetes in the Backends enum.
- Updated segmentation algorithms to accept optional keyword arguments for Kubernetes backend, enhancing flexibility for users.
…thm execution

- Added a new section outlining how to configure and use the Kubernetes backend for running algorithms.
- Provided examples for setting the KUBECONFIG environment variable and specifying backend options in the inference method.
- Integrated Kubernetes backend option into the BraTSAlgorithm class.
- Updated methods to accept optional keyword arguments for Kubernetes, allowing for enhanced configuration during inference.
- Added error handling to ensure Kubernetes kwargs are only used with the Kubernetes backend.
- Updated pyproject.toml to include the Kubernetes package with a minimum version of 34.1.0, enabling support for Kubernetes features in the project.
…Kubernetes configuration

- Changed the parameter name from `mount_path` to `data_mount_path` in the documentation to accurately describe its function in the `infer_single` method for Kubernetes backend usage.
- Implemented functions for creating and managing Kubernetes jobs, including PVC creation, job execution, and output handling.
- Added methods for downloading additional files from Zenodo and checking file presence in pods.
- Enhanced logging for better traceability during job execution and output verification.
- Integrated command execution within pods to facilitate file uploads and downloads, ensuring smooth operation of the Kubernetes backend for algorithm inference.
…rnetes.py`. Tests cover command argument building, file handling, job creation, and PVC management for different algorithm configurations.
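Based on the description above, end-user usage could look roughly like the following sketch. The class name `AdultGliomaSegmenter` and the exact input-file keywords are assumptions for illustration, not a confirmed API; the `kubernetes_kwargs` keys mirror the ones discussed in this PR (namespace, PVC sizing, data mount path). The actual inference call is left as a comment since it requires a live cluster:

```python
import os

# Point the Kubernetes client at the target cluster, as the README section describes.
os.environ["KUBECONFIG"] = os.path.expanduser("~/.kube/config")

# Backend-specific options; key names follow the kwargs discussed in this PR.
kubernetes_kwargs = {
    "namespace": "brats",
    "pvc_storage_size": "10Gi",
    "data_mount_path": "/data",
}

# Hypothetical call shape -- `AdultGliomaSegmenter` stands in for one of the
# segmentation classes; check the package for the actual class and argument names:
#
# from brats import AdultGliomaSegmenter
# from brats.constants import Backends
#
# segmenter = AdultGliomaSegmenter()
# segmenter.infer_single(
#     t1c="t1c.nii.gz", t1n="t1n.nii.gz", t2f="t2f.nii.gz", t2w="t2w.nii.gz",
#     output_file="segmentation.nii.gz",
#     backend=Backends.KUBERNETES,
#     kubernetes_kwargs=kubernetes_kwargs,
# )
```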
@SimoneBendazzoli93 SimoneBendazzoli93 requested review from Copilot and removed request for Copilot November 3, 2025 19:33
brainless-bot (bot) commented Nov 3, 2025

🤖 Code Formatting Reminder

Hello there! 👋 It looks like the code in this pull request might benefit from some formatting improvements.
Fix the issues locally or use our auto format action by commenting /format on this PR!

Code style: black

@SimoneBendazzoli93 SimoneBendazzoli93 marked this pull request as draft November 3, 2025 19:33
SimoneBendazzoli93 (Collaborator, Author) commented:

/format

brainless-bot (bot) commented Nov 3, 2025

🤖 I will now format your code with black. Check the status here.

@SimoneBendazzoli93 SimoneBendazzoli93 marked this pull request as ready for review November 3, 2025 19:34
Copilot AI review requested due to automatic review settings November 3, 2025 19:34
Copilot AI left a comment

Pull Request Overview

This PR adds Kubernetes backend support to the BraTS orchestrator, enabling remote algorithm execution via Kubernetes Jobs as an alternative to local Docker/Singularity containers. Key changes include:

  • New Kubernetes backend implementation with job orchestration, file transfer, and PVC management
  • Integration of Kubernetes backend into the existing algorithm inference pipeline
  • Support for configurable Kubernetes resources via kubernetes_kwargs parameter

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 14 comments.

Show a summary per file
File Description
brats/constants.py Added KUBERNETES enum value to Backends
brats/core/kubernetes.py New module implementing Kubernetes job execution with PVC management, file transfers, and pod lifecycle management
brats/core/brats_algorithm.py Updated _infer_single and _infer_batch to support kubernetes_kwargs parameter and dispatch to Kubernetes backend
brats/core/segmentation_algorithms.py Added kubernetes_kwargs parameter to infer_single and infer_batch methods for both Adult and Pediatric classes
tests/core/test_kubernetes.py Comprehensive test suite for Kubernetes backend functionality
README.md Added documentation for Kubernetes backend usage with configuration examples


…new packages: `cachetools`, `durationpy`, `google-auth`, `kubernetes`, `oauthlib`, `pyasn1`, `pyasn1-modules`, `requests-oauthlib`, `rsa`, and `websocket-client`. Adjust version constraints and add optional dependencies for improved functionality.
… path handling based on algorithm year, and streamline command execution logging.
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 7 comments.



… and modify type hints in `kubernetes.py` for clarity. Enhance logging in `_download_folder_from_pod` and add TODO comments for future security context implementation. Remove commented-out code in `run_job` and adjust test cases in `test_kubernetes.py` to reflect changes.
brainless-bot (bot) commented Nov 3, 2025

🤖 Code Formatting Reminder

Hello there! 👋 It looks like the code in this pull request might benefit from some formatting improvements.
Fix the issues locally or use our auto format action by commenting /format on this PR!

Code style: black

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 11 comments.



…ity function

Updated the return type of _create_finalizer_job and _create_namespaced_job functions to return job names for better usability. Introduced a new utility function _check_pod_terminal_or_running to streamline pod status checks, enhancing code readability and maintainability.
…tional checks

Updated the _build_command_args function to ensure additional file paths are only appended if they exist. Enhanced the documentation for _observe_job_output and _create_namespaced_job functions to clarify return values and error handling, improving overall code clarity and usability.
brainless-bot (bot) commented Nov 10, 2025

🤖 Code Formatting Reminder

Hello there! 👋 It looks like the code in this pull request might benefit from some formatting improvements.
Fix the issues locally or use our auto format action by commenting /format on this PR!

Code style: black

SimoneBendazzoli93 (Collaborator, Author) commented:

/format

brainless-bot (bot) commented Nov 10, 2025

🤖 I will now format your code with black. Check the status here.

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated no new comments.



"numpy>=1.21.0; python_version<='3.9'",
"numpy>=1.26.0; python_version>='3.10'",
"kubernetes>=34.1.0,<35.0.0",
]
Collaborator commented:

[Blocker] should be an optional dependency

kubernetes>=34.1.0,<35.0.0 is added as a hard (unconditional) dependency. This forces ~10 new transitive packages on every user of this library, including those who never use Kubernetes.

It should be moved to [project.optional-dependencies] (or [tool.poetry.extras] in Poetry), e.g.:

[project.optional-dependencies]
kubernetes = ["kubernetes>=34.1.0,<35.0.0"]

Users would then install it via pip install brats[kubernetes]. The top-level import in brats_algorithm.py also needs to be made lazy (see related comment there) to avoid an ImportError for users without the extra.


from brats.core.docker import run_container as run_docker_container
from brats.core.singularity import run_container as run_singularity_container
from brats.core.kubernetes import run_job as run_kubernetes_job
Collaborator commented:

[Blocker] Top-level import causes ImportError for users without the kubernetes package

This import runs unconditionally at module load time. Anyone who does from brats.core.segmentation_algorithms import ... without having kubernetes installed will get an ImportError, even if they never use the Kubernetes backend.

The import should be deferred to where it is actually needed:

# In _get_backend_runner or inside run_job dispatch:
if backend == Backends.KUBERNETES:
    from brats.core.kubernetes import run_job as run_kubernetes_job
    return run_kubernetes_job

This is consistent with how optional heavy dependencies are typically guarded and pairs with making kubernetes an optional install extra in pyproject.toml.

import random
import string
import base64
import docker
Collaborator commented:

Unconditional import docker in a module that may not require Docker

docker is only used here for the docker.types.DeviceRequest type annotation in _create_namespaced_job. Importing the entire docker SDK at module level adds an unnecessary hard dependency inside kubernetes.py.

Consider using a TYPE_CHECKING guard or switching to a string annotation:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    import docker

Or replace the type hint with the actual list structure (List[Any]) since DeviceRequest objects are passed in from outside and are already instantiated by the caller (_handle_device_requests in docker.py).

Backends.DOCKER: run_docker_container,
Backends.SINGULARITY: run_singularity_container,
Backends.KUBERNETES: run_kubernetes_job,
}
Collaborator commented:

kubernetes_kwargs is an untyped Dict — typos will surface as confusing TypeError deep in run_job

Any misspelled key (e.g. "namespacee") will be silently passed through to run_job and produce a hard-to-diagnose error. Consider:

  1. Defining a TypedDict (or dataclass) for the known kwargs and annotating accordingly.
  2. If a TypedDict feels heavy, at least validate the keys against the known set before merging:
VALID_KUBERNETES_KWARGS = frozenset({
    "namespace", "pvc_name", "pvc_storage_size",
    "pvc_storage_class", "job_name", "data_mount_path",
})
unknown = set(kubernetes_kwargs) - VALID_KUBERNETES_KWARGS
if unknown:
    raise ValueError(f"Unknown kubernetes_kwargs keys: {unknown}")
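A minimal sketch of option 1, assuming the key set from option 2 is complete; `total=False` makes every key optional, and the validator rejects typos early instead of letting them surface deep in `run_job`:

```python
from typing import TypedDict


class KubernetesKwargs(TypedDict, total=False):
    """Known keys for the kubernetes_kwargs parameter (all optional)."""
    namespace: str
    pvc_name: str
    pvc_storage_size: str
    pvc_storage_class: str
    job_name: str
    data_mount_path: str


def validate_kubernetes_kwargs(kwargs: dict) -> None:
    """Reject unknown keys before they reach run_job."""
    unknown = set(kwargs) - set(KubernetesKwargs.__annotations__)
    if unknown:
        raise ValueError(f"Unknown kubernetes_kwargs keys: {sorted(unknown)}")
```

The TypedDict also gives static checkers (mypy/pyright) a chance to flag bad keys at type-check time, which the frozenset alone cannot do.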

@@ -199,13 +203,22 @@ def _infer_single(
runner = self._get_backend_runner(backend)
if runner is None:
Collaborator commented:

Validation order: the backend != KUBERNETES guard fires too late

_get_backend_runner(backend) is called on line 203 (before this check). If an unsupported backend was passed, the runner is None guard on line 204 already handles it. But more importantly: the intent here is to reject kubernetes_kwargs when the backend isn't Kubernetes — this should be checked before any backend-specific setup happens, not after the runner is already retrieved.

Suggested reorder:

if kubernetes_kwargs is not None and backend != Backends.KUBERNETES:
    raise ValueError("kubernetes_kwargs can only be used with the Kubernetes backend.")
runner = self._get_backend_runner(backend)
if runner is None:
    raise ValueError(f"Unsupported backend: {backend}")

The same issue is present in _infer_batch.

@@ -246,14 +261,23 @@ def _infer_batch(
if runner is None:
raise ValueError(f"Unsupported backend: {backend}")
Collaborator commented:

Duplicated kubernetes_kwargs merging block — extract to a helper

The block that validates and merges kubernetes_kwargs into runner_kwargs is copy-pasted identically in both _infer_single and _infer_batch. This violates DRY and means any future change (e.g. adding a new allowed key) must be made in two places.

Extract to a small helper:

def _build_runner_kwargs(self, base_kwargs: dict, backend: Backends, kubernetes_kwargs: Optional[Dict]) -> dict:
    if kubernetes_kwargs is not None:
        if backend != Backends.KUBERNETES:
            raise ValueError("kubernetes_kwargs can only be used with the Kubernetes backend.")
        return {**base_kwargs, **kubernetes_kwargs}
    return base_kwargs

name="job-container",
image=image,
volume_mounts=volume_mounts,
# security_context=client.V1SecurityContext(run_as_user=user_id, run_as_group=group_id)
Collaborator commented:

[Design] No cleanup of Kubernetes resources — PVCs and Jobs accumulate indefinitely

run_job creates PVCs and Jobs but never deletes them, even on success. Over time (or after repeated runs) this will leave orphaned resources in the cluster. The function should clean up after itself unless the caller explicitly opts out.

Wrap the core logic in a try/finally:

try:
    # ... main body ...
finally:
    if not keep_resources:  # add a keep_resources=False param
        batch_v1_api.delete_namespaced_job(job_name, namespace, ...)
        batch_v1_api.delete_namespaced_job(job_name + "-finalizer", namespace, ...)
        core_v1_api.delete_namespaced_persistent_volume_claim(pvc_name, namespace)
        if algorithm.meta.year > 2024:
            core_v1_api.delete_namespaced_persistent_volume_claim(pvc_name + "-output", namespace)

The finalizer job also currently requires a manual touch /etc/content_verified to terminate — that sentinel mechanism means the pod runs forever if run_job crashes before reaching that point.

pvc_name (str): Name of the PVC to create
namespace (str): The Kubernetes namespace to create the PVC in
storage_size (str): The size of the storage to request
storage_class (str): The storage class to use for the PVC. If None, the default storage class will be used.
Collaborator commented:

alpine:latest is not reproducible — pin to a specific version

alpine:latest resolves to a different image digest across runs (and will change when Alpine releases a new version). This breaks reproducibility and can introduce subtle runtime differences.

Pin to a specific version, e.g.:

image="alpine:3.20"

Or make it a module-level constant so it can be updated in one place:

_ALPINE_IMAGE = "alpine:3.20"

The same hardcoded tag also appears in the init-container inside _create_namespaced_job.

commands = [
"ls",
"-la",
str(Path(mount_path).joinpath("input", file.relative_to(path))),
Collaborator commented:

Locale-dependent string matching for file existence check

Detecting file absence by checking for "No such file or directory" in ls output is fragile — this string is locale-dependent and will silently fail on any non-English k8s node image.

Use test -f and check the exit code instead:

commands = ["sh", "-c", f"test -f {remote_path} && echo EXISTS || echo MISSING"]
output = _execute_command_in_pod(...)
if "MISSING" in output:
    # upload

Or use _execute_command_in_pod with _preload_content=False and inspect the return code via the websocket channel.

namespace (str): The namespace of the pod to observe the output of
Returns:
str: The output of the job
"""
Collaborator commented:

Hardcoded 10-minute timeout — should be configurable

range(300) with time.sleep(2) gives a fixed 10-minute ceiling. Some BraTS algorithms run much longer; others are fast and don't need the wait. These values should be exposed as parameters with sensible defaults:

def _observe_job_output(
    pod_name: str,
    namespace: str,
    timeout_seconds: int = 600,
    poll_interval: float = 2.0,
) -> str:
    for _ in range(int(timeout_seconds / poll_interval)):
        ...
        time.sleep(poll_interval)

The same hardcoded range(300) pattern appears twice more in run_job (waiting for pod ready and waiting for job completion). All three should be unified under a single configurable timeout parameter passed through from run_job.



def _create_namespaced_job(
job_name: str,
Collaborator commented:

Missing activeDeadlineSeconds — sentinel-loop pods can run forever on crash

Both the init-container and the finalizer-container use while [ ! -f /etc/content_verified ]; do sleep 1; done as their entry command. If run_job crashes before executing touch /etc/content_verified, the pod will spin indefinitely and consume cluster resources.

Add activeDeadlineSeconds to the V1JobSpec to bound the maximum pod lifetime:

spec=client.V1JobSpec(
    active_deadline_seconds=3600,  # or make configurable
    template=client.V1PodTemplateSpec(...)
)

This applies to both _create_namespaced_job and _create_finalizer_job.

force_cpu=force_cpu,
)
logger.debug(f"GPU Device requests: {device_requests}")
user = _get_container_user(algorithm=algorithm)
Collaborator commented:

Redundant Zenodo metadata fetch — pre-fetched result should be passed in

run_job already calls _get_zenodo_metadata_and_archive_url (around line 648) to compute additional_files_path. Then _download_additional_files calls the same function again internally (line 157), making two network round-trips to Zenodo for the same data.

Pass the already-fetched metadata into _download_additional_files (or restructure it to accept an optional pre-fetched result) to avoid the redundant call:

def _download_additional_files(
    algorithm: AlgorithmData,
    pod_name: str,
    namespace: str,
    mount_path: str = "/data",
    zenodo_response: Optional[tuple] = None,  # pass pre-fetched result
) -> Path:

)
try:
core_v1_api.delete_namespaced_pod(
name=pod_name_to_delete, namespace=namespace
Collaborator commented:

run_job is 130+ lines and does too many things — consider decomposing

run_job currently handles: name generation, PVC creation, device resolution, command building, job creation, pod readiness polling, file upload, additional file download, parameter upload, sentinel touch, log observation, completion polling, finalizer job creation, output download, and output sanity check — all inline.

Consider extracting into focused helpers, e.g.:

  • _prepare_job_resources (PVC + name generation)
  • _upload_input_data (file upload + additional files + parameters + sentinel)
  • _wait_for_job_completion (pod polling + log collection)
  • _retrieve_output (finalizer job + download)

This would make each step independently testable and the overall flow much easier to follow.

]
if shm_size is not None:
shm_size_formatted = shm_size.replace("gb", "Gi")
volumes.append(
Collaborator commented:

Use uuid.uuid4() instead of random.choices for name generation

random.choices(string.ascii_lowercase + string.digits, k=12) draws from a 36-character alphabet (36^12 ≈ 5×10^18 combinations) and provides no uniqueness guarantee across concurrent runs. uuid.uuid4() is the idiomatic choice, has better collision properties, and removes the random and string imports:

import uuid
pvc_name = f"brats-{uuid.uuid4().hex[:12]}-pvc"
job_name = f"brats-{uuid.uuid4().hex[:12]}-job"


while resp.is_open():
resp.update(timeout=1)
if resp.peek_stdout():
Collaborator commented:

Incorrect type annotation: relative_to: Path = None

Path = None is not a valid type annotation for an optional parameter — the type and default are inconsistent. Mypy/pyright will flag this.

Change to:

relative_to: Optional[Path] = None,
parent_dir: Optional[Path] = None,

MarcelRosier (Collaborator) left a comment:

Sorry for the super late review (and then it's even AI-driven 🤖). However, I also manually checked and think all the comments are valid, especially the hard dependency. Do you have time to address the issue? @SimoneBendazzoli93


Labels: enhancement (New feature or request)
Projects: None yet
Development: Successfully merging this pull request may close: [FEATURE] Kubernetes support
4 participants