
Conversation


@yuhuan130 yuhuan130 commented Dec 14, 2025

What This Does

Optimizes Docker build cache by separating heavy ML dependencies (torch, tensorflow) into their own layers and moving Flyte Python wheels to the bottom, so adding lightweight packages doesn't trigger full rebuilds.

Performance

Heavy Benchmark (torch, tensorflow, transformers)

Metric       Without Optimization   With Optimization   Result
Build Time   310.3s (5.2 min)       8.7s (0.14 min)     35.9× faster

Standard Benchmark (torch, numpy, pandas)

Metric       Without Optimization   With Optimization   Result
Build Time   249.7s (4.2 min)       7.7s                32.3× faster

How It Works

BEFORE:  Any change rebuilds everything
┌─────────────────────────────────────┐
│ ALL packages together               │
└─────────────────────────────────────┘

AFTER:  Optimized layer ordering  
┌─────────────────────────────────────┐
│ Heavy:   torch, tensorflow (cached) │ ← Rarely changes
├─────────────────────────────────────┤
│ Other                               │ ← Changes more often
├─────────────────────────────────────┤
│ Flyte PythonWheels (bottom layer)   │ ← Compiled once, fully cached
└─────────────────────────────────────┘

Key Changes

  • heavy_deps.py: New centralized configuration file for heavy dependencies (tensorflow, torch, scikit-learn, etc.)
  • image_builder.py: Enhanced _optimize_image_layers() method with:
    • Heavy dependency extraction to top layer
    • Flyte wheel layer placement at bottom for maximum caching
    • UV script dependency parsing support
  • New flag: optimize_layers=True (enabled by default, can be disabled; see the usage sketch after this list)
  • 3 benchmark examples showing 32-36× speedup
  • Smarter layer ordering: Heavy deps cached separately, Flyte wheels at bottom
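
A minimal usage sketch (not taken from the PR itself; it assumes the flag is accepted by with_pip_packages, as the diff context later in this thread suggests, that Image is importable from the top-level flyte package, and review discussion below also considers moving the flag to clone()):

from flyte import Image

image = (
    Image.from_debian_base(name="ml-image")
    # With optimize_layers on (the default in this PR), heavy deps such as
    # torch and tensorflow are extracted into their own cached layer, and the
    # Flyte wheels are placed in the bottom layer.
    .with_pip_packages("torch", "tensorflow", "transformers", optimize_layers=True)
)

# Adding a lightweight package later should only rebuild the small "Other"
# layer; the heavy layer and the Flyte wheel layer stay cached.
image_v2 = image.with_pip_packages("requests")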

Files Changed

  • src/flyte/_internal/imagebuild/heavy_deps.py (new)
  • src/flyte/_internal/imagebuild/image_builder.py
  • src/flyte/_image.py
  • tests/flyte/test_image.py
  • examples/image_layer_optimize/ (3 benchmark files)

Total: 7 files (1 new, 6 modified)

Screenshots

(Screenshots attached: 2025-12-30 at 17:57:18, 2025-12-30 at 18:13:42, 2026-01-06 at 16:42:45)

Replaces #422 with cleaner git history

@yuhuan130 yuhuan130 marked this pull request as ready for review December 14, 2025 11:31
@yuhuan130 yuhuan130 marked this pull request as draft December 14, 2025 17:33
@yuhuan130 yuhuan130 marked this pull request as ready for review December 24, 2025 06:25
categorized[category].append(pkg)

# Helper function to create a layer
def create_pip_layer(pkgs):
Member commented:

Image.from_debian_base(name="test").with_pip_packages("tensorflow").with_pip_packages("pytorch")

In this case, TensorFlow and PyTorch will be installed in different layers, right? Let's install them in the same layer.

Also, I think it might be better to add an _optimize method to ImageBuildEngine, so we can do

image = ImageBuildEngine._optimize(image)
result = await img_builder.build_image(image, dry_run=dry_run)

It would be cleaner, wdyt?

Author (@yuhuan130) commented Dec 30, 2025:

Great idea. I've applied your suggestion in my latest commit!

pre: bool = False,
extra_args: Optional[str] = None,
secret_mounts: Optional[SecretRequest] = None,
optimize_layers: bool = True,
Member commented:

Could we move it to clone()?

Image.from_debian_base().clone(optimize_layers=True)

"heavy": ("tensorflow", "torch", "torchaudio", "torchvision", "scikit-learn"),
# -----------------[ MIDDLE ]----------------- #
# Layer 1: ~200MB | Rebuild cost: Med | Freq: Low
"core": ("numpy", "pandas", "pydantic", "requests", "httpx", "boto3", "fastapi", "uvicorn"),
Member commented:

I just tried to build these two images; the build times are almost the same. Do we really need this core layer?

image = (
    Image.from_debian_base(install_flyte=False)
    .with_apt_packages("vim", "wget")
    .with_pip_packages("pandas", "numpy")
)
image = (
    Image.from_debian_base(install_flyte=False)
    .with_apt_packages("vim", "wget")
    .with_pip_packages("pandas", "numpy", "ty")
)

Author (@yuhuan130) commented:

I think the similar build times make sense here even without setting optimize_layers=False: both images reuse the cached apt + pandas/numpy layers, so the only real install is ty, which is small. Technically it'll save around 45 seconds to 1 minute.

Author (@yuhuan130) commented:

Hmm, interesting. I tried running it locally once with and once without optimize, and found that the python:3.12-slim-bookworm base actually pre-installs the wheels for pandas, numpy, etc., so it'll be fast for these two packages either way.



@env.task
async def main():
Member commented:

👍

# Step 1: Collect heavy packages and build new layer list
all_heavy_packages: list[str] = []
template_layer: PipPackages | None = None
optimized_layers: list[Layer] = []
Member commented:

nit: could we call it original_layers or other_layers?

Comment on lines +210 to +212
template_layer = PipPackages(
packages=(),
index_url=layer.index_url,
Member commented:

I think we should have heavy_layers: List; each layer can have its own settings.
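
For instance, the suggestion could look roughly like this (a hypothetical sketch, not code from the PR; is_heavy and any PipPackages fields beyond packages/index_url are assumptions):

heavy_layers: list[PipPackages] = []
for layer in original_layers:
    # One heavy layer per source layer, so per-layer settings such as
    # index_url are preserved instead of collapsed into a single template.
    heavy = tuple(pkg for pkg in layer.packages if is_heavy(pkg))
    if heavy:
        heavy_layers.append(PipPackages(packages=heavy, index_url=layer.index_url))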

ImageBuilderType = typing.Literal["local", "remote"]

@staticmethod
def _optimize_image_layers(image: Image) -> Image:
Member commented:

Could we also add some unit tests for it? Thanks!
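
A minimal sketch of what such a test might look like (assumptions: ImageBuildEngine lives in flyte._internal.imagebuild.image_builder as the Files Changed list suggests, _optimize_image_layers is the entry point, and the layer-inspection assertions are left out because that API isn't shown here):

from flyte import Image
from flyte._internal.imagebuild.image_builder import ImageBuildEngine

def test_heavy_deps_split_into_separate_layer():
    # Mix a heavy package ("tensorflow") with a light one ("requests") in one call.
    image = Image.from_debian_base(name="test").with_pip_packages("tensorflow", "requests")
    optimized = ImageBuildEngine._optimize_image_layers(image)
    # Expected: "tensorflow" ends up isolated from "requests", so changing the
    # light package does not invalidate the heavy layer.
    # (Assertions over optimized's layers are omitted; the exact API is internal.)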
