Skip to content

Preserve recursive CTE declared schema when aligning physical children #22034

@kosiew

Description

@kosiew

Summary

#21912 introduced a SQL behavior change: an existing recursive CTE SLT had to be amended from 0 AS level to SUM(0) AS level to pass.

A follow-up fix is required because recursive CTE physical alignment must preserve the CTE's declared/static schema, including nullability, rather than widening the recursive CTE output schema and forcing SQL rewrites.

Background

Issue #21910 is about preventing recursive CTE batches from leaking recursive-term field names when the recursive term is planned independently from the anchor/static term.

#21912 adds a reusable execution-layer schema helper and wires recursive CTE execution to avoid ad hoc schema rebinding. During review/testing, another problem appeared:

  • The anchor/static term can declare a non-null field, e.g. 0 AS level.
  • The recursive term can compute the same column through an expression that is nullable, e.g. MIN(rs.level) + 1.
  • If RecursiveQueryExec declares a widened schema using static_nullable || recursive_nullable, downstream planning sees a physical schema that no longer matches the logical/static CTE schema.
  • Changing SQL from 0 AS level to SUM(0) AS level makes the logical anchor nullable too, but only hides the mismatch.

Evidence

The SQL behavior change is visible in the amended SLT test:

diff --git a/datafusion/sqllogictest/test_files/cte.slt b/datafusion/sqllogictest/test_files/cte.slt
index e9c1c0245..d13e0d4f0 100644
--- a/datafusion/sqllogictest/test_files/cte.slt
+++ b/datafusion/sqllogictest/test_files/cte.slt
@@ -699,7 +699,7 @@ WITH RECURSIVE region_sales AS (
     SELECT
         s.salesperson_id AS salesperson_id,
         SUM(s.sale_amount) AS amount,
-        0 as level
+        SUM(0) as level
     FROM
         sales s
     GROUP BY

Reverting this SLT query change from:

SUM(0) AS level

to the original:

0 AS level

fails with:

Physical input schema should be the same as the one converted from logical input schema.
Differences:
  - field nullability at index 2 [level]: (physical) true vs (logical) false.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions