@adriangb commented Dec 30, 2025

Needed to unblock apache/datafusion#19556

This was mostly written by Claude, with guidance, to fix the following MRE (reproducible on the branch for apache/datafusion#19556):

COPY (
  SELECT *
  FROM VALUES ({field: [{nested: 1}]}, 1), ({field: [{nested: 2}]}, 2) AS t(struct_col, id)
)
TO 'test.parquet';

set datafusion.execution.parquet.pushdown_filters = true;

CREATE EXTERNAL TABLE t1 STORED AS PARQUET LOCATION 'test.parquet';

-- Works, no issues
select struct_col['field'][1]['nested']
from t1
where id = 1;

-- Works, no issues
select id, struct_col
from t1
where struct_col['field'][1]['nested']  = 1;

-- Error
select id, struct_col['field'][1]['nested']
from t1
where struct_col['field'][1]['nested'] = 1;

@github-actions bot added the parquet label (Changes to the parquet crate) Dec 30, 2025
@Weijun-H left a comment

LGTM, Thanks @adriangb

It would be good to add this test as well:

    #[test]
    fn test_level_propagation_empty_after_skip() {
        let metrics = ArrowReaderMetrics::disabled();
        let cache = Arc::new(Mutex::new(RowGroupCache::new(4, usize::MAX)));

        // Producer populates cache with levels
        let data = vec![1, 2, 3, 4];
        let def_levels = vec![1, 0, 1, 1];
        let rep_levels = vec![0, 1, 1, 0];
        let mock_reader =
            MockArrayReaderWithLevels::new(data, def_levels.clone(), rep_levels.clone());
        let mut producer = CachedArrayReader::new(
            Box::new(mock_reader),
            cache.clone(),
            0,
            CacheRole::Producer,
            metrics.clone(),
        );

        producer.read_records(4).unwrap();
        producer.consume_batch().unwrap();

        // Consumer skips all rows, resulting in an empty output batch
        let mock_reader2 = MockArrayReaderWithLevels::new(
            vec![10, 20, 30, 40],
            vec![0, 0, 0, 0],
            vec![0, 0, 0, 0],
        );
        let mut consumer = CachedArrayReader::new(
            Box::new(mock_reader2),
            cache,
            0,
            CacheRole::Consumer,
            metrics,
        );

        let skipped = consumer.skip_records(4).unwrap();
        assert_eq!(skipped, 4);

        let array = consumer.consume_batch().unwrap();
        assert_eq!(array.len(), 0);

        assert_eq!(consumer.get_def_levels().unwrap(), &[]);
        assert_eq!(consumer.get_rep_levels().unwrap(), &[]);
    }
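
For readers unfamiliar with what the test above checks, here is a minimal, self-contained sketch of the invariant: whatever rows a consumer skips, the definition and repetition levels it reports should cover the same rows as the values it returns. `ToyBatch` and `ToyConsumer` are hypothetical names invented for illustration and are not the parquet crate's `CachedArrayReader` or `RowGroupCache`. The sketch also assumes exactly one level per value, which matches the mock data in the test but does not hold for nested columns in general.

```rust
use std::ops::Range;

/// Toy stand-in for a cached batch (hypothetical, not a parquet crate type).
/// Assumes one definition level and one repetition level per value.
#[derive(Debug, Clone, PartialEq, Default)]
struct ToyBatch {
    values: Vec<i32>,
    def_levels: Vec<i16>,
    rep_levels: Vec<i16>,
}

/// Toy consumer that serves rows out of an already populated batch.
struct ToyConsumer {
    cached: ToyBatch,
    position: usize,
    /// Row ranges chosen by `read_records` since the last `consume_batch`.
    selected: Vec<Range<usize>>,
}

impl ToyConsumer {
    fn new(cached: ToyBatch) -> Self {
        Self { cached, position: 0, selected: Vec::new() }
    }

    /// Moves the cursor forward by up to `n` rows and returns the range covered.
    fn advance(&mut self, n: usize) -> Range<usize> {
        let start = self.position;
        let end = (start + n).min(self.cached.values.len());
        self.position = end;
        start..end
    }

    /// Selects up to `n` rows for the next output batch.
    fn read_records(&mut self, n: usize) -> usize {
        let range = self.advance(n);
        let count = range.len();
        if count > 0 {
            self.selected.push(range);
        }
        count
    }

    /// Advances past up to `n` rows without selecting them.
    fn skip_records(&mut self, n: usize) -> usize {
        self.advance(n).len()
    }

    /// Builds the output from the selected ranges only, slicing the levels
    /// with the same ranges as the values so all three stay aligned.
    fn consume_batch(&mut self) -> ToyBatch {
        let mut out = ToyBatch::default();
        for range in self.selected.drain(..) {
            out.values.extend_from_slice(&self.cached.values[range.clone()]);
            out.def_levels.extend_from_slice(&self.cached.def_levels[range.clone()]);
            out.rep_levels.extend_from_slice(&self.cached.rep_levels[range]);
        }
        out
    }
}
```

In this toy model, skipping all four rows and then calling `consume_batch` yields empty values and empty levels, which is the same shape of assertion the suggested test makes against the real reader.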
