Introducing eager loading of dataframe(s) in RBatchGenerator #21035
Merged
martinfoell merged 3 commits into root-project:master on Jan 30, 2026
Test Results: 22 files, 22 suites, 3d 12h 55m 43s ⏱️. Results for commit 2d3b47a.
vepadulano (Member) requested changes on Jan 27, 2026:
Nice work! First round of review.
Five review threads on bindings/pyroot/pythonizations/python/ROOT/_pythonization/_tmva/_batchgenerator.py (all resolved).
Author (Contributor):
Thanks for the review @vepadulano! I addressed your comments and left some replies explaining the points you had questions about.
vepadulano (Member) requested changes on Jan 29, 2026:
The changes look good to me! Before merging, I believe the commit history should be cleaned up. Ideally, there should be 3 commits in total:
- One introducing all the changes in the C++ code
- One with the changes in _batchgenerator.py that rely on the additions from the previous one
- One with the new tests
This commit introduces the RDatasetLoader class, which takes a vector of dataframes as input, loads each of them into memory, and splits each one into training and validation datasets; the datasets from each dataframe are collected in a vector. The RSampler class is introduced to concatenate the training and validation datasets from the vector produced by RDatasetLoader and then shuffle them before the dataset is passed to RBatchLoader. Some changes are made to the existing classes to integrate eager loading alongside the existing chunk loading:
- Remove numEntries and rdf_entries as input parameters to the RChunkLoader class
- Replace numColumns with cols and vecSizes as input parameters to the RBatchLoader class
- Add slice and concatenate methods for Flat2DMatrix in Flat2DMatrixOperator
In the RBatchGenerator class, the changes above are integrated to enable eager loading from dataframe(s).
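The eager-loading flow described in this commit (split each dataframe into training and validation, then concatenate across dataframes and shuffle) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ROOT implementation: the Flat2DMatrix dataclass only borrows its name from the commit message, and slice_rows, take_rows, concatenate, split_dataset, shuffled and eager_load are hypothetical helpers. It assumes row-major storage and that the leading rows of each dataframe go to training and the trailing rows to validation; the real split and sampling strategies may differ.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Flat2DMatrix:
    """Hypothetical stand-in: a row-major matrix stored as one flat list."""
    data: List[float]
    rows: int
    cols: int

    def slice_rows(self, start: int, stop: int) -> "Flat2DMatrix":
        # Rows [start, stop) are contiguous in row-major storage.
        return Flat2DMatrix(self.data[start * self.cols:stop * self.cols],
                            stop - start, self.cols)

    def take_rows(self, order: List[int]) -> "Flat2DMatrix":
        # Gather whole rows in the given order (used for shuffling).
        out: List[float] = []
        for i in order:
            out.extend(self.data[i * self.cols:(i + 1) * self.cols])
        return Flat2DMatrix(out, len(order), self.cols)


def concatenate(mats: List[Flat2DMatrix]) -> Flat2DMatrix:
    # Stack matrices row-wise; all inputs must have the same column count.
    cols = mats[0].cols
    if any(m.cols != cols for m in mats):
        raise ValueError("column counts must match")
    data: List[float] = []
    for m in mats:
        data.extend(m.data)
    return Flat2DMatrix(data, sum(m.rows for m in mats), cols)


def split_dataset(m: Flat2DMatrix,
                  validation_split: float) -> Tuple[Flat2DMatrix, Flat2DMatrix]:
    # Loader step (assumed): leading rows -> training, trailing rows -> validation.
    n_train = m.rows - int(m.rows * validation_split)
    return m.slice_rows(0, n_train), m.slice_rows(n_train, m.rows)


def shuffled(m: Flat2DMatrix, rng: random.Random) -> Flat2DMatrix:
    # Permute whole rows; values inside a row stay together.
    order = list(range(m.rows))
    rng.shuffle(order)
    return m.take_rows(order)


def eager_load(datasets: List[Flat2DMatrix], validation_split: float,
               seed: int = 0) -> Tuple[Flat2DMatrix, Flat2DMatrix]:
    rng = random.Random(seed)
    # One (train, validation) pair per dataframe, kept in a list.
    splits = [split_dataset(m, validation_split) for m in datasets]
    # Sampler step: concatenate across dataframes, then shuffle rows.
    train = concatenate([t for t, _ in splits])
    validation = concatenate([v for _, v in splits])
    return shuffled(train, rng), shuffled(validation, rng)


a = Flat2DMatrix(list(range(1, 11)), rows=5, cols=2)
b = Flat2DMatrix(list(range(11, 21)), rows=5, cols=2)
train, validation = eager_load([a, b], validation_split=0.2)
```

Because the sketch splits each dataframe before concatenating, every input contributes the same fraction of its rows to validation, rather than the validation set being drawn from whichever dataframe happens to come last.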
…g from dataframe(s)
This commit adjusts the Python bindings of RBatchGenerator so that eager loading is available in the batch loading for NumPy, PyTorch and TensorFlow. The load_eager (bool) parameter is added to choose between eager loading (True) and chunk loading (False). The sampling_type (str) parameter is added to select which sampling strategy is used for eager loading. Further, the rdataframes input parameter is changed so that it can now be either a single dataframe or a list of dataframes.
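Accepting either a single dataframe or a list usually comes down to normalizing the argument once at the entry point. A minimal sketch, assuming plain Python objects stand in for RDataFrame instances; as_dataframe_list and check_loading_options are hypothetical names, not functions from ROOT's _batchgenerator.py. The sketch deliberately does not validate sampling_type values, because the accepted strategies are not listed in this text.

```python
from typing import Any, List, Sequence, Union


def as_dataframe_list(rdataframes: Union[Any, Sequence[Any]]) -> List[Any]:
    """Hypothetical helper: accept one dataframe or a list/tuple of them."""
    if isinstance(rdataframes, (list, tuple)):
        dfs = list(rdataframes)  # already a collection: copy it
    else:
        dfs = [rdataframes]  # a single dataframe: wrap it in a list
    if not dfs:
        raise ValueError("rdataframes must contain at least one dataframe")
    return dfs


def check_loading_options(load_eager: bool, sampling_type: str) -> None:
    """Hypothetical type checks for the two new parameters."""
    if not isinstance(load_eager, bool):
        raise TypeError("load_eager must be a bool (True: eager, False: chunked)")
    if not isinstance(sampling_type, str):
        raise TypeError("sampling_type must be a str")
    # Valid sampling_type values are not given in the PR text,
    # so only the type is checked here.
```

Normalizing up front lets the rest of the code always iterate over a list, so the single-dataframe case needs no separate code path.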
Branch updated from 523de65 to 2d3b47a.
This pull request:
- Removes numEntries and rdf_entries as input parameters to the RChunkLoader class
- Replaces numColumns with cols and vecSizes as input parameters to the RBatchLoader class
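One plausible reading of passing cols and vecSizes instead of a single numColumns is that the batch loader can derive the flattened row width and the position of each column itself. A sketch of that bookkeeping, assuming (not confirmed by the text) that vecSizes gives a fixed length for each vector-valued column and that every scalar column occupies one slot; flattened_width and column_offsets are hypothetical, and a dict is used for vec_sizes purely for readability.

```python
from typing import Dict, List


def flattened_width(cols: List[str], vec_sizes: Dict[str, int]) -> int:
    # Scalar columns take one slot per row; a vector column takes
    # vec_sizes[name] slots (assumed fixed, e.g. after padding).
    return sum(vec_sizes.get(c, 1) for c in cols)


def column_offsets(cols: List[str], vec_sizes: Dict[str, int]) -> Dict[str, int]:
    # Start position of each column inside one flattened row.
    offsets: Dict[str, int] = {}
    pos = 0
    for c in cols:
        offsets[c] = pos
        pos += vec_sizes.get(c, 1)
    return offsets


cols = ["pt", "eta", "jet_pt"]
vec_sizes = {"jet_pt": 3}
print(flattened_width(cols, vec_sizes))  # 5
print(column_offsets(cols, vec_sizes))   # {'pt': 0, 'eta': 1, 'jet_pt': 2}
```

A single numColumns count cannot tell which slots belong to which column, whereas the names plus sizes carry enough information to recover both the total width and the per-column offsets.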