antiguru commented Feb 8, 2026

Summary

  • Split LazyUnaryFunc::eval into eval + eval_input so unary functions can be called with pre-evaluated datums, avoiding re-evaluation in a stack machine context
  • Add embedded_exprs() / embedded_exprs_mut() to LazyUnaryFunc for the 9 compound-cast functions that contain embedded MirScalarExpr sub-expressions
  • Extract VariadicFunc::eval_eager() for calling eager variadic functions with pre-evaluated datum slices
  • Implement CompiledMirScalarExpr — a flat instruction sequence compiled from MirScalarExpr trees that replaces recursive AST descent with a linear instruction stream using a value stack and program counter

Details

The CompiledMirScalarExpr handles all MirScalarExpr variants:

  • Column/Literal: push directly onto the stack
  • CallUnary: inline operand evaluation, pop + apply via eval_input
  • CallBinary: inline operand evaluation, pop 2 + apply via static Column(0)/Column(1) references over a temporary datum slice
  • CallVariadic: sub-program compilation for each operand (supports short-circuiting for And/Or/Coalesce/etc.)
  • If: compiled to SkipIfNotTrue with two offsets (false→else, error→end) for correct error propagation
  • Compound casts (CastList1ToList2, CastArrayToArray, CastRecord1ToRecord2, CastListToJsonb, CastArrayToJsonb, CastStringToArray/List/Map/Range): dedicated instructions with compiled sub-programs for inner cast expressions
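
As a rough illustration of the `If` lowering listed above, here is a minimal, self-contained sketch of a stack machine with a two-target `SkipIfNotTrue`. The `Datum`, `Instr`, `eval`, and `if_program` names are simplified stand-ins invented for this example: errors are plain `String`s, and jump targets are absolute indices rather than offsets. They are not the types in `static_eval.rs`.

```rust
/// Simplified stand-in for a datum (hypothetical, not the real `Datum`).
#[derive(Clone, Debug, PartialEq)]
enum Datum {
    Null,
    Bool(bool),
    Int(i64),
}

/// Each stack slot holds either a value or an error (a plain `String` here).
type EvalResult = Result<Datum, String>;

#[derive(Clone, Debug)]
enum Instr {
    Column(usize),
    Literal(EvalResult),
    /// Pops the condition. True falls through to the `then` branch,
    /// false/null jumps to `else_target`, and an error is pushed back
    /// and control jumps to `end_target`, skipping both branches.
    SkipIfNotTrue { else_target: usize, end_target: usize },
    Jump(usize),
}

fn eval(program: &[Instr], row: &[Datum]) -> EvalResult {
    let mut stack: Vec<EvalResult> = Vec::new();
    let mut pc = 0;
    while pc < program.len() {
        match &program[pc] {
            Instr::Column(i) => stack.push(Ok(row[*i].clone())),
            Instr::Literal(v) => stack.push(v.clone()),
            Instr::SkipIfNotTrue { else_target, end_target } => {
                match stack.pop().expect("condition on stack") {
                    Ok(Datum::Bool(true)) => {}
                    Ok(_) => {
                        pc = *else_target;
                        continue;
                    }
                    Err(e) => {
                        stack.push(Err(e));
                        pc = *end_target;
                        continue;
                    }
                }
            }
            Instr::Jump(target) => {
                pc = *target;
                continue;
            }
        }
        pc += 1;
    }
    stack.pop().expect("result on stack")
}

/// Hand-assembled program for `IF <cond> THEN 10 ELSE 20`.
fn if_program(cond: Instr) -> Vec<Instr> {
    vec![
        cond,                                                   // 0: condition
        Instr::SkipIfNotTrue { else_target: 4, end_target: 5 }, // 1
        Instr::Literal(Ok(Datum::Int(10))),                     // 2: then
        Instr::Jump(5),                                         // 3: skip else
        Instr::Literal(Ok(Datum::Int(20))),                     // 4: else
    ]
}
```

In this sketch a null condition takes the else branch, while an error in the condition bypasses both branches and becomes the result of the whole expression.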

Test plan

  • Unit tests for Column, Literal, If (true/false/null/error branches) pass
  • cargo check -p mz-expr -p mz-transform -p mz-compute builds cleanly
  • Integration with MFP evaluation (follow-up)
  • Performance benchmarks (follow-up)

🤖 Generated with Claude Code

antiguru force-pushed the claude/heuristic-pare branch 2 times, most recently from 735b806 to afd4bb6 on February 9, 2026 00:47
antiguru and others added 5 commits February 9, 2026 19:55
Introduce `CompiledMirScalarExpr`, a flat instruction sequence compiled
from `MirScalarExpr` trees. This replaces recursive AST descent during
evaluation with a linear instruction stream using a value stack and
program counter, with the goal of reducing evaluation overhead.

Key changes:

1. Split `LazyUnaryFunc::eval` into `eval` + `eval_input`, where
   `eval_input` takes a pre-evaluated `Result<Datum, EvalError>` instead
   of requiring expression evaluation. Updated all 27 direct
   `LazyUnaryFunc` implementations.

2. Add `embedded_exprs()` / `embedded_exprs_mut()` methods to
   `LazyUnaryFunc` for the 9 compound-cast functions that contain
   embedded `MirScalarExpr` sub-expressions.

3. Extract `VariadicFunc::eval_eager()` from `VariadicFunc::eval()` for
   calling eager variadic functions with pre-evaluated datum slices.

4. Implement `CompiledMirScalarExpr` in `static_eval.rs` with:
   - Flat instruction sequence with label-based compilation
   - Inline evaluation for unary (via `eval_input`) and binary
     (via static column references) functions
   - Sub-program compilation for variadic operands and compound-cast
     inner expressions (MapListElements, MapArrayElements, MapRecord,
     MapListToJsonb, MapArrayToJsonb, ParseAndCast)
   - Correct If/else handling with two-offset SkipIfNotTrue for proper
     error propagation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
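
The `eval` / `eval_input` split in item 1 can be pictured with the following minimal sketch. It assumes a heavily simplified trait: `LazyUnary`, `Negate`, `Expr`, and `Datum` are hypothetical names for this example, and the real `LazyUnaryFunc` signatures (lifetimes, temporary storage, `EvalError`) are not reproduced here.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Datum {
    Null,
    Int(i64),
}

type EvalResult = Result<Datum, String>;

/// Tiny expression tree, standing in for `MirScalarExpr`.
enum Expr {
    Column(usize),
    CallUnary(Box<dyn LazyUnary>, Box<Expr>),
}

impl Expr {
    fn eval(&self, row: &[Datum]) -> EvalResult {
        match self {
            Expr::Column(i) => Ok(row[*i].clone()),
            Expr::CallUnary(func, input) => func.eval(row, input),
        }
    }
}

/// Hypothetical, simplified shape of the split trait.
trait LazyUnary {
    /// Applies the function to an operand that was already evaluated.
    /// This is the entry point a stack machine would call after popping.
    fn eval_input(&self, input: EvalResult) -> EvalResult;

    /// Tree-walking entry point: evaluate the operand, then delegate.
    fn eval(&self, row: &[Datum], input: &Expr) -> EvalResult {
        self.eval_input(input.eval(row))
    }
}

struct Negate;

impl LazyUnary for Negate {
    fn eval_input(&self, input: EvalResult) -> EvalResult {
        // `?` propagates an operand error unchanged.
        match input? {
            Datum::Int(v) => v
                .checked_neg()
                .map(Datum::Int)
                .ok_or_else(|| "integer out of range".to_string()),
            Datum::Null => Ok(Datum::Null),
        }
    }
}
```

With this shape, the recursive evaluator and a stack machine share one implementation of the function body; only the way the operand is obtained differs.
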
Replace the CallVariadic instruction (which compiled each operand into
a sub-program) with 7 dedicated inline instructions: AndStep, OrStep,
SkipIfNotNull, GreatestStep, LeastStep, RaiseIfNullError, and
CallEagerVariadic. This eliminates sub-program indirection for all
variadic functions and enables short-circuit branching for And, Or,
Coalesce, Greatest, Least, and ErrorIfNull.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
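
A minimal sketch of how an inline `AndStep` with short-circuit branching might look. Everything here is an assumption made for illustration: the accumulator starts as a `True` literal, `AndStep` folds one operand into it and jumps to `end` once the result is false, and the merge rule (false wins over errors, errors win over null) is this example's choice rather than a statement about the PR's code. `compile_and` and the step counter exist only to make the short circuit observable.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Datum {
    Null,
    True,
    False,
}

type EvalResult = Result<Datum, String>;

enum Instr {
    Column(usize),
    Literal(EvalResult),
    /// Pops one operand, folds it into the accumulator below it, and
    /// jumps to `end` as soon as the accumulator is false.
    AndStep { end: usize },
}

/// Assumed merge rule: false dominates, then errors, then null.
fn and_merge(acc: EvalResult, next: EvalResult) -> EvalResult {
    match (acc, next) {
        (Ok(Datum::False), _) | (_, Ok(Datum::False)) => Ok(Datum::False),
        (Err(e), _) | (_, Err(e)) => Err(e),
        (Ok(Datum::Null), _) | (_, Ok(Datum::Null)) => Ok(Datum::Null),
        _ => Ok(Datum::True),
    }
}

/// Emits `True, (Column(c), AndStep)*` for a conjunction of columns.
fn compile_and(columns: &[usize]) -> Vec<Instr> {
    let end = 1 + 2 * columns.len();
    let mut program = vec![Instr::Literal(Ok(Datum::True))];
    for &c in columns {
        program.push(Instr::Column(c));
        program.push(Instr::AndStep { end });
    }
    program
}

/// Returns the result and the number of instructions executed.
fn eval(program: &[Instr], row: &[Datum]) -> (EvalResult, usize) {
    let mut stack: Vec<EvalResult> = Vec::new();
    let (mut pc, mut steps) = (0, 0);
    while pc < program.len() {
        steps += 1;
        match &program[pc] {
            Instr::Column(i) => stack.push(Ok(row[*i].clone())),
            Instr::Literal(v) => stack.push(v.clone()),
            Instr::AndStep { end } => {
                let next = stack.pop().expect("operand on stack");
                let acc = stack.last_mut().expect("accumulator on stack");
                *acc = and_merge(acc.clone(), next);
                if *acc == Ok(Datum::False) {
                    pc = *end;
                    continue;
                }
            }
        }
        pc += 1;
    }
    (stack.pop().expect("result on stack"), steps)
}
```

For a three-operand conjunction, a false first operand ends evaluation after three instructions in this sketch, whereas an all-true row runs all seven.
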
Pre-compute max_stack_depth during compilation and expose
eval_with_stack so callers can amortize the stack allocation across
rows. Also optimize accumulator patterns (And/Or/Greatest/Least) to
use in-place mutation instead of pop+push, remove unused return_ty
from MapRecord, and add benchmarks comparing tree vs compiled eval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
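
To make the stack-amortization idea concrete, here is a small sketch under simplifying assumptions: straight-line programs only (no jumps), `i64` values instead of datums, and a hypothetical `Compiled` struct. The names `max_stack_depth` and `eval_with_stack` are borrowed from the commit message, but the signatures below are guesses for illustration.

```rust
enum Instr {
    Column(usize),
    Literal(i64),
    Add,
}

struct Compiled {
    program: Vec<Instr>,
    /// Largest number of live stack slots, computed once at compile time.
    max_stack_depth: usize,
}

impl Compiled {
    /// Assumes a well-formed program: every `Add` finds two operands.
    fn new(program: Vec<Instr>) -> Self {
        let (mut depth, mut max_stack_depth) = (0usize, 0usize);
        for instr in &program {
            match instr {
                Instr::Column(_) | Instr::Literal(_) => depth += 1,
                // Pops two operands and pushes one result.
                Instr::Add => depth -= 1,
            }
            max_stack_depth = max_stack_depth.max(depth);
        }
        Compiled { program, max_stack_depth }
    }

    /// Evaluates one row using a caller-owned stack, so the allocation
    /// can be reused across rows.
    fn eval_with_stack(&self, row: &[i64], stack: &mut Vec<i64>) -> i64 {
        stack.clear();
        stack.reserve(self.max_stack_depth);
        for instr in &self.program {
            match instr {
                Instr::Column(i) => stack.push(row[*i]),
                Instr::Literal(v) => stack.push(*v),
                Instr::Add => {
                    // Mutate the left operand in place instead of pop + push.
                    let rhs = stack.pop().expect("rhs on stack");
                    *stack.last_mut().expect("lhs on stack") += rhs;
                }
            }
        }
        stack.pop().expect("result on stack")
    }
}
```

A caller would create one `Vec` up front and pass it to `eval_with_stack` for every row; after the first call the `reserve` is effectively a no-op because the capacity is already sufficient.
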
Mirror the existing LazyUnaryFunc::eval_input pattern for binary
functions. Add eval_input to LazyBinaryFunc trait (taking pre-evaluated
Result<Datum, EvalError> inputs), update the EagerBinaryFunc blanket
impl, and add a hand-written BinaryFunc::eval_input dispatch method.

This lets the stack machine in static_eval call binary functions
directly with pre-evaluated datums instead of going through fake
Column(0)/Column(1) expression references and a temporary datum slice.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
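
The difference between the two calling conventions can be sketched as follows. This is an illustrative reconstruction, not the PR's code: `AddInt`, `apply_via_fake_columns`, and the simplified `Expr` / `Datum` types are hypothetical, and the real `LazyBinaryFunc` signatures are not reproduced.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Datum {
    Null,
    Int(i64),
}

type EvalResult = Result<Datum, String>;

enum Expr {
    Column(usize),
}

impl Expr {
    fn eval(&self, row: &[Datum]) -> EvalResult {
        match self {
            Expr::Column(i) => Ok(row[*i].clone()),
        }
    }
}

struct AddInt;

impl AddInt {
    /// Tree-style entry point: evaluates both operand expressions first.
    fn eval(&self, row: &[Datum], a: &Expr, b: &Expr) -> EvalResult {
        self.eval_input(a.eval(row), b.eval(row))
    }

    /// Entry point for operands that were already evaluated.
    fn eval_input(&self, a: EvalResult, b: EvalResult) -> EvalResult {
        match (a?, b?) {
            (Datum::Int(x), Datum::Int(y)) => x
                .checked_add(y)
                .map(Datum::Int)
                .ok_or_else(|| "integer out of range".to_string()),
            _ => Ok(Datum::Null),
        }
    }
}

/// The workaround: copy the popped datums into a temporary slice and
/// point two fake column references at it.
fn apply_via_fake_columns(f: &AddInt, a: Datum, b: Datum) -> EvalResult {
    let tmp = [a, b];
    f.eval(&tmp, &Expr::Column(0), &Expr::Column(1))
}
```

Both paths compute the same result for plain values; calling `eval_input` directly skips the temporary slice and the two column lookups.
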
Add a benchmark for a 1024-deep left-leaning add chain
((((1 + 1) + 1) + 1) ... + 1) to measure compiled vs tree eval
on deeply nested expressions where recursion overhead dominates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
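
The shape of the benchmarked expression can be reproduced with a small sketch. This is not the benchmark itself (no timing harness is included); `add_chain`, `eval_tree`, `compile`, and `eval_flat` are hypothetical helpers over plain `i64` values, meant only to show why the tree evaluator recurses once per nesting level while the flat program runs in a single loop.

```rust
enum Expr {
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

enum Instr {
    Literal(i64),
    Add,
}

/// Builds ((((1 + 1) + 1) + 1) ... + 1) with `depth` additions.
fn add_chain(depth: usize) -> Expr {
    let mut expr = Expr::Literal(1);
    for _ in 0..depth {
        expr = Expr::Add(Box::new(expr), Box::new(Expr::Literal(1)));
    }
    expr
}

/// Recursive evaluation: call depth grows with the nesting depth.
fn eval_tree(expr: &Expr) -> i64 {
    match expr {
        Expr::Literal(v) => *v,
        Expr::Add(a, b) => eval_tree(a) + eval_tree(b),
    }
}

/// One-time post-order flattening into a linear instruction stream.
fn compile(expr: &Expr, out: &mut Vec<Instr>) {
    match expr {
        Expr::Literal(v) => out.push(Instr::Literal(*v)),
        Expr::Add(a, b) => {
            compile(a, out);
            compile(b, out);
            out.push(Instr::Add);
        }
    }
}

/// Loop-based evaluation: no recursion, regardless of nesting depth.
fn eval_flat(program: &[Instr]) -> i64 {
    let mut stack: Vec<i64> = Vec::new();
    for instr in program {
        match instr {
            Instr::Literal(v) => stack.push(*v),
            Instr::Add => {
                let rhs = stack.pop().expect("rhs on stack");
                *stack.last_mut().expect("lhs on stack") += rhs;
            }
        }
    }
    stack.pop().expect("result on stack")
}
```

For a left-leaning chain the flat program never needs more than two stack slots, since each `Add` consumes the running sum and the literal that was just pushed.
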
antiguru force-pushed the claude/heuristic-pare branch from afd4bb6 to d2fe34e on February 10, 2026 00:56