[ML][Python] Optimize eager loading setup by removing redundant RDF triggers #21202

siliataider · 2026-02-09T11:48:40Z

This Pull request:

Changes or fixes:

This PR reduces the setup time of the ROOT dataloader for eager loading with undersampling.

Previously, the Python BaseGenerator validated the sampling ratio in the Python layer by calling RDataFrame::Count() on both datasets, which triggered the RDataFrame computation graph twice during generator construction.

With this change, the sampling ratio validation is deferred to ROOT::ML::RSampler, where the required entry counts are already known.
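
As a rough illustration of why moving the check helps, here is a minimal sketch in plain Python. It does not use ROOT at all: `FakeDataset`, `check_ratio`, `setup_validate_first` and `setup_validate_late` are hypothetical names, and the validity rule inside `check_ratio` is an assumption made for the example, not necessarily the check that RSampler performs. The only point is that validating up front costs one extra pass per dataset, while validating after loading reuses counts that are already available.

```python
class FakeDataset:
    """Stand-in for a lazily evaluated dataset (hypothetical, not a ROOT class).

    Every call to count() or load() models one run of the computation graph.
    """

    def __init__(self, rows):
        self._rows = list(rows)
        self.passes = 0  # number of simulated graph runs

    def count(self):
        self.passes += 1
        return len(self._rows)

    def load(self):
        self.passes += 1
        return list(self._rows)


def check_ratio(n_minor, n_major, ratio):
    # Assumed rule for illustration only: undersampling the majority class
    # without replacement needs n_minor / ratio majority entries.
    if not 0 < ratio <= 1:
        raise ValueError(f"sampling ratio must be in (0, 1], got {ratio}")
    if n_minor / ratio > n_major:
        raise ValueError(
            f"sampling ratio {ratio} needs more majority entries than the {n_major} available"
        )


def setup_validate_first(minor, major, ratio):
    # Old pattern: count both datasets up front, then load them.
    check_ratio(minor.count(), major.count(), ratio)
    return minor.load(), major.load()


def setup_validate_late(minor, major, ratio):
    # New pattern: load first, then validate with the counts already in hand.
    minor_rows, major_rows = minor.load(), major.load()
    check_ratio(len(minor_rows), len(major_rows), ratio)
    return minor_rows, major_rows
```

In this toy model, `setup_validate_first` leaves `passes == 2` on each dataset and `setup_validate_late` leaves `passes == 1`. An invalid ratio still raises `ValueError` in both variants, only later in the second one.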

Micro-benchmarks measuring the generator creation time in eager mode show a ~3–6% reduction in setup time, depending on the dataset size and graph complexity.

Checklist:

tested changes locally

vepadulano

Thank you!

github-actions · 2026-02-09T15:14:04Z

Test Results

22 files 22 suites 3d 19h 45m 31s ⏱️
3 789 tests 3 789 ✅ 0 💤 0 ❌
75 296 runs 75 296 ✅ 0 💤 0 ❌

Results for commit 45f183f.

♻️ This comment has been updated with latest results.

siliataider added 2 commits February 9, 2026 12:33

[ML][Python] Optimize eager laoding setup by removing redundant RDF t…

b2328f6

…riggers

[ML][Python] Add test for invalid sampling ratio

45f183f

siliataider requested review from guitargeek and vepadulano as code owners February 9, 2026 11:48

siliataider self-assigned this Feb 9, 2026

siliataider added in:Python Interface in:ML Everything under ROOT/ML labels Feb 9, 2026

vepadulano approved these changes Feb 9, 2026

siliataider changed the title ~~[ML][Python] Optimize eager laoding setup by removing redundant RDF triggers~~ [ML][Python] Optimize eager loading setup by removing redundant RDF triggers Feb 9, 2026

siliataider merged commit ded0336 into root-project:master Feb 9, 2026
79 of 82 checks passed
