Skip to content

Conversation

@mzabaluev
Copy link

Rationale for this change

The PartitionEvaluator implementation for NthValue in DataFusion has a few shortcomings:

  • When nulls are ignored (meaning the count should skip over them), the evaluation collects an array of all valid indices, to select at most one index accordingly to the First/Last/Nth case.
  • The memoize implementation gives up in the same condition, even after performing part of the logic!

What changes are included in this PR?

Use only as much iteration over the valid indices as needed for the function case, without collecting all indices.
The memoize implementation does the right thing for FirstValue with ignore_nulls set to true, or returns early for other function cases.

Are these changes tested?

All existing tests pass for FirstValue/LastValue/NthValue.

Are there any user-facing changes?

No.

Instead of collecting all valid indices per batch in PartitionEvaluator
for NthValue, use the iterator as appropriate for the case.
Even tn the worst case of negative index larger than 1, only a sliding
window of N last valid indices is needed.
Handle the case when FirstValue is called with ignore_nulls set to true,
can prune the partition on the first non-null value.
Also return early for the other function cases in the same condition,
rather than grinding some logic only to discard the results.
@github-actions github-actions bot added the functions Changes to functions implementation label Dec 25, 2025
Copy link
Contributor

@Jefffrey Jefffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good if there were some benchmarks to check this is indeed a performance boost

Comment on lines +475 to +476
let slice = array.slice(range.start, n_range);
if let Some(nulls) = slice.nulls() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also be more strict by checking if the null_count instead of simply checking for the presence of the null buffer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will be another branch, and the iterator returned by nulls.valid_indices() being empty will do the right thing anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can chain it like

            if let Some(nulls) = slice.nulls()
                && nulls.null_count() > 0
            {

and the iterator returned by nulls.valid_indices() being empty will do the right thing anyway?

But I think it would still iterate through the whole null buffer anyway? So if we're looking via a performance lens perhaps this approach is worth considering.

// for the sliding window that will be discarded in the end.
return None;
}
let mut window = VecDeque::with_capacity(reverse_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we reverse the iterator of valid_indices() to avoid need of this queue?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, BitIndexIterator is not bidirectional.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queue is a bad solution, indeed. I think a simple ring buffer will do much better here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, BitIndexIterator is not bidirectional.

Ah you're right, I was mistakenly thinking of BitIterator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we do something like this it can work without needing a separate container?

Ordering::Less => {
    let reverse_index = (-self.n) as usize;
    let total_len = nulls.len();
    let null_count = nulls.null_count();
    let valid_indices_len = total_len - null_count;
    if reverse_index > valid_indices_len {
        return None;
    }
    nulls
        .valid_indices()
        .nth(valid_indices_len - reverse_index)
        .map(|idx| idx + offset)
}

impl NthValueEvaluator {
fn valid_index(&self, array: &ArrayRef, range: &Range<usize>) -> Option<usize> {
let n_range = range.end - range.start;
if self.ignore_nulls {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole null logic block is quite indented; would be good to see if we can refactor it

Comment on lines +374 to +397
if self.ignore_nulls {
match self.state.kind {
// Prune on first non-null output in case of FIRST_VALUE
NthValueKind::First => {
if let Some(nulls) = out.nulls() {
if self.state.finalized_result.is_none() {
if let Some(valid_index) = nulls.valid_indices().next() {
let result =
ScalarValue::try_from_array(out, valid_index)?;
self.state.finalized_result = Some(result);
} else {
// The output is empty or all nulls, ignore
}
}
if state.window_frame_range.start < state.window_frame_range.end {
state.window_frame_range.start =
state.window_frame_range.end - 1;
}
return Ok(());
} else {
// Fall through to the main case because there are no nulls
}
}
// Do not memoize for other kinds when nulls are ignored
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic here is really hard to follow with all the nesting present

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Between early returns and falling through to the main case, I don't know if I can make it more readable.

@mzabaluev
Copy link
Author

Would be good if there were some benchmarks to check this is indeed a performance boost

I have added a benchmark. It shows significant improvement in many cases, but it also shows that the VecDeque is problematic. I will try to replace it with a ring buffer.

@Jefffrey
Copy link
Contributor

Jefffrey commented Jan 4, 2026

Would be good if there were some benchmarks to check this is indeed a performance boost

I have added a benchmark. It shows significant improvement in many cases, but it also shows that the VecDeque is problematic. I will try to replace it with a ring buffer.

Would you be able to post the benchmark results for us to see?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants