Conversation

@siliataider
Contributor

@siliataider siliataider commented Feb 9, 2026

This Pull request:

Changes or fixes:

This PR reduces the setup time of the ROOT dataloader when eager loading is combined with undersampling.

Previously, the Python BaseGenerator validated the sampling ratio in the Python layer by calling RDataFrame::Count() on both datasets, which triggered the RDataFrame computation graph twice during generator construction.

With this change, the sampling ratio validation is deferred to ROOT::ML::RSampler, where the required entry counts are already known.
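The idea can be illustrated with a minimal pure-Python sketch. It does not use ROOT: `LazyDataset`, `validate_ratio`, `setup_validate_first`, and `setup_validate_in_sampler` are hypothetical names, and the ratio rule shown is only an assumed stand-in for the real check. The sketch just counts how often each setup pattern triggers the (expensive) evaluation.

```python
class LazyDataset:
    """Hypothetical stand-in for a lazily evaluated dataset (not a ROOT class)."""

    def __init__(self, entries):
        self._entries = list(entries)
        self.triggers = 0  # number of times the expensive evaluation ran

    def count(self):
        # Counting entries requires a full pass over the data.
        self.triggers += 1
        return len(self._entries)

    def materialize(self):
        # Eager loading also requires a full pass over the data.
        self.triggers += 1
        return list(self._entries)


def validate_ratio(n_major, n_minor, ratio):
    # Assumed illustrative rule, not necessarily the check ROOT performs:
    # undersampling keeps n_minor / ratio majority entries, which must exist.
    if not 0 < ratio <= 1:
        raise ValueError("sampling ratio must be in (0, 1]")
    if n_minor / ratio > n_major:
        raise ValueError("not enough majority entries for this sampling ratio")


def setup_validate_first(major, minor, ratio):
    # Old pattern: an extra counting pass per dataset, only for validation.
    validate_ratio(major.count(), minor.count(), ratio)
    return major.materialize(), minor.materialize()


def setup_validate_in_sampler(major, minor, ratio):
    # New pattern: load first, then validate with the sizes already at hand.
    major_data, minor_data = major.materialize(), minor.materialize()
    validate_ratio(len(major_data), len(minor_data), ratio)
    return major_data, minor_data
```

In this sketch, `setup_validate_first` evaluates each dataset twice, while `setup_validate_in_sampler` evaluates each one once and still rejects an invalid ratio.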

Micro-benchmarks measuring the generator creation time in eager mode show a ~3–6% reduction in setup time, depending on the dataset size and graph complexity.

Checklist:

  • tested changes locally

@siliataider siliataider self-assigned this Feb 9, 2026
@siliataider siliataider added the in:Python Interface and in:ML (Everything under ROOT/ML) labels Feb 9, 2026
Member

@vepadulano vepadulano left a comment

Thank you!

@siliataider siliataider changed the title from "[ML][Python] Optimize eager laoding setup by removing redundant RDF triggers" to "[ML][Python] Optimize eager loading setup by removing redundant RDF triggers" Feb 9, 2026
@github-actions
github-actions bot commented Feb 9, 2026

Test Results

    22 files      22 suites   3d 19h 45m 31s ⏱️
 3 789 tests  3 789 ✅ 0 💤 0 ❌
75 296 runs  75 296 ✅ 0 💤 0 ❌

Results for commit 45f183f.

♻️ This comment has been updated with latest results.

@siliataider siliataider merged commit ded0336 into root-project:master Feb 9, 2026
79 of 82 checks passed

2 participants