@kosiew kosiew commented Dec 28, 2025

Which issue does this PR close?

Rationale for this change

When defining function signatures in DataFusion, parameter types (via TypeSignature) and parameter names (via Signature::with_parameter_names) are currently specified in separate places. This split makes signatures harder to read, increases boilerplate, and risks names/types drifting out of sync when signatures evolve (for example, when adding optional parameters).

This PR introduces an ergonomic constructor that lets call sites define each variant’s parameter names and types together, producing the appropriate TypeSignature and inferred parameter_names in one place.

What changes are included in this PR?

  • Added a small helper enum ParameterKind to allow the new builder API to accept either:

    • concrete DataType values (for TypeSignature::Exact), or
    • Coercion rules (for TypeSignature::Coercible).
  • Added Signature::from_parameter_variants:

    • accepts multiple parameter variants (supporting optional/trailing parameters naturally),

    • infers parameter_names from the longest variant,

    • builds the appropriate TypeSignature automatically:

      • TypeSignature::Nullary via an empty parameter list (vec![]),
      • TypeSignature::Exact via DataType parameters,
      • TypeSignature::Coercible via Coercion parameters,
      • TypeSignature::OneOf when multiple variants are provided,
    • validates that DataType and Coercion are not mixed within a single variant.

  • For other signature kinds (e.g. TypeSignature::Variadic, Uniform, Numeric, String, Comparable, Any, ArraySignature, UserDefined), existing dedicated constructors such as Signature::variadic, Signature::uniform, Signature::numeric, etc. should continue to be used.

  • Refactored the Unicode substr scalar UDF to use Signature::from_parameter_variants instead of manually constructing TypeSignature::OneOf(...) plus .with_parameter_names(...).
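
The variant-to-`TypeSignature` mapping listed above can be modelled in a few lines. This is a minimal sketch under stated assumptions, not the PR's implementation: `DataType`, `Coercion`, `ParameterKind` and `TypeSignature` are simplified stand-ins for the DataFusion types of the same names, and the free functions `variant_signature` and `type_signature` are hypothetical helpers that cover only type-signature construction (no parameter names, no volatility).

```rust
// Simplified stand-ins for the DataFusion types; the real definitions are richer.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Int64,
}

// Hypothetical placeholder: the real Coercion carries type classes, not a label.
#[derive(Debug, Clone, PartialEq)]
pub struct Coercion(pub &'static str);

#[derive(Debug, Clone, PartialEq)]
pub enum ParameterKind {
    DataType(DataType),
    Coercion(Coercion),
}

#[derive(Debug, Clone, PartialEq)]
pub enum TypeSignature {
    Nullary,
    Exact(Vec<DataType>),
    Coercible(Vec<Coercion>),
    OneOf(Vec<TypeSignature>),
}

/// Build the signature for one variant (hypothetical helper).
/// An empty variant is Nullary; a DataType/Coercion mix is rejected.
fn variant_signature(params: &[(&str, ParameterKind)]) -> Result<TypeSignature, String> {
    if params.is_empty() {
        return Ok(TypeSignature::Nullary);
    }
    let mut exact = Vec::new();
    let mut coercible = Vec::new();
    for (_, kind) in params {
        match kind {
            ParameterKind::DataType(dt) => exact.push(dt.clone()),
            ParameterKind::Coercion(c) => coercible.push(c.clone()),
        }
    }
    match (exact.is_empty(), coercible.is_empty()) {
        (false, true) => Ok(TypeSignature::Exact(exact)),
        (true, false) => Ok(TypeSignature::Coercible(coercible)),
        _ => Err("cannot mix DataType and Coercion in one variant".to_string()),
    }
}

/// Combine all variants (hypothetical helper): a single variant is returned
/// as-is, several variants are wrapped in OneOf, and empty input is an error.
pub fn type_signature(
    variants: &[Vec<(&str, ParameterKind)>],
) -> Result<TypeSignature, String> {
    if variants.is_empty() {
        return Err("at least one parameter variant is required".to_string());
    }
    let mut sigs = variants
        .iter()
        .map(|v| variant_signature(v))
        .collect::<Result<Vec<_>, _>>()?;
    if sigs.len() == 1 {
        Ok(sigs.remove(0))
    } else {
        Ok(TypeSignature::OneOf(sigs))
    }
}
```

In this model, passing `&[vec![]]` yields `TypeSignature::Nullary`, a single all-`DataType` variant yields `Exact`, and two or more variants yield `OneOf` with one entry per variant.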

Here are other examples of possible refactoring:

Example 1: Pad Functions with Optional Fill Character

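// Assumed setup, reused by the examples below: `string` and `int64` are Coercion values.
use datafusion_common::types::{logical_int64, logical_string};

let string = Coercion::new_exact(TypeSignatureClass::Native(logical_string()));
let int64 = Coercion::new_exact(TypeSignatureClass::Native(logical_int64()));

// lpad(str, length) OR lpad(str, length, fill)
let sig = Signature::from_parameter_variants(
    &[
        vec![("str", string.clone()), ("length", int64.clone())],
        vec![("str", string.clone()), ("length", int64.clone()), ("fill", string.clone())],
    ],
    Volatility::Immutable
)?;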

// rpad works the same way
let sig = Signature::from_parameter_variants(
    &[
        vec![("str", string.clone()), ("length", int64.clone())],
        vec![("str", string.clone()), ("length", int64.clone()), ("fill", string.clone())],
    ],
    Volatility::Immutable
)?;

Example 2: Round with Optional Precision

use datafusion_common::types::{logical_float64, NativeType};

let float64 = Coercion::new_exact(TypeSignatureClass::Native(logical_float64()));

// round(value) OR round(value, precision)
let sig = Signature::from_parameter_variants(
    &[
        vec![("value", float64.clone())],
        vec![("value", float64.clone()), ("precision", int64.clone())],
    ],
    Volatility::Immutable
)?;

Example 3: Date/Time Extraction with Different Precisions

use datafusion_common::types::{logical_timestamp, NativeType};

let timestamp = Coercion::new_exact(TypeSignatureClass::Native(logical_timestamp()));

// date_trunc(precision, timestamp); both parameters are Coercions, because
// DataType and Coercion cannot be mixed within a single variant
let sig = Signature::from_parameter_variants(
    &[vec![
        ("precision", string.clone()),
        ("timestamp", timestamp.clone()),
    ]],
    Volatility::Immutable
)?;

Example 4: Replace with Optional Occurrence Count

// replace(str, from, to) OR replace(str, from, to, n)
let sig = Signature::from_parameter_variants(
    &[
        vec![
            ("str", string.clone()),
            ("from", string.clone()),
            ("to", string.clone()),
        ],
        vec![
            ("str", string.clone()),
            ("from", string.clone()),
            ("to", string.clone()),
            ("n", int64.clone()),
        ],
    ],
    Volatility::Immutable
)?;

Example 5: Array Functions with Optional Parameters

// array_position(array, element) OR array_position(array, element, start_index)
let any_type = Coercion::new_any();

let sig = Signature::from_parameter_variants(
    &[
        vec![("array", any_type.clone()), ("element", any_type.clone())],
        vec![
            ("array", any_type.clone()),
            ("element", any_type.clone()),
            ("start_index", int64.clone()),
        ],
    ],
    Volatility::Immutable
)?;

Example 6: Split with Optional Delimiter and Limit

// split(str, delimiter) OR split(str, delimiter, limit)
let sig = Signature::from_parameter_variants(
    &[
        vec![("str", string.clone()), ("delimiter", string.clone())],
        vec![
            ("str", string.clone()),
            ("delimiter", string.clone()),
            ("limit", int64.clone()),
        ],
    ],
    Volatility::Immutable
)?;

Example 7: To Timestamp with Optional Format

// to_timestamp(str) OR to_timestamp(str, format)
let sig = Signature::from_parameter_variants(
    &[
        vec![("str", string.clone())],
        vec![("str", string.clone()), ("format", string.clone())],
    ],
    Volatility::Immutable
)?;

Example 8: Aggregate Functions with Filtering

// sum(expression) OR sum(expression FILTER (WHERE condition))
// Note: This shows the pattern - actual implementation may vary
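use datafusion_common::types::logical_boolean;

let numeric_any = Coercion::new_any();
// The filter is a Coercion too: DataType and Coercion cannot be mixed within a variant
let boolean = Coercion::new_exact(TypeSignatureClass::Native(logical_boolean()));

let sig = Signature::from_parameter_variants(
    &[
        vec![("expression", numeric_any.clone())],
        vec![("expression", numeric_any.clone()), ("filter", boolean.clone())],
    ],
    Volatility::Immutable
)?;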

Example 9: Regex Functions with Optional Flags

// regexp_replace(str, pattern, replacement) OR regexp_replace(str, pattern, replacement, flags)
let sig = Signature::from_parameter_variants(
    &[
        vec![
            ("str", string.clone()),
            ("pattern", string.clone()),
            ("replacement", string.clone()),
        ],
        vec![
            ("str", string.clone()),
            ("pattern", string.clone()),
            ("replacement", string.clone()),
            ("flags", string.clone()),
        ],
    ],
    Volatility::Immutable
)?;

Example 10: Window Functions with Optional Frame

use datafusion_common::types::{logical_int32, NativeType};

let int32 = Coercion::new_exact(TypeSignatureClass::Native(logical_int32()));

// lag(expression) OR lag(expression, offset) OR lag(expression, offset, default)
let any_type = Coercion::new_any();

let sig = Signature::from_parameter_variants(
    &[
        vec![("expression", any_type.clone())],
        vec![("expression", any_type.clone()), ("offset", int32.clone())],
        vec![
            ("expression", any_type.clone()),
            ("offset", int32.clone()),
            ("default", any_type.clone()),
        ],
    ],
    Volatility::Immutable
)?;

  • Added unit tests covering:

    • single-variant Exact signatures,
    • multiple variants producing OneOf,
    • inclusion of a Nullary variant,
    • coercion-based variants,
    • empty input error,
    • error on mixing DataType and Coercion within a variant,
    • volatility propagation.
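
Two of the behaviours listed above, the empty-input error and name inference from the longest variant, can be illustrated in isolation. The sketch below is an assumption-laden model rather than the code added in this PR: `infer_parameter_names` is a hypothetical free function, and the generic `T` stands in for whatever type description (`DataType` or `Coercion`) accompanies each name.

```rust
/// Hypothetical helper: return the parameter names of the longest variant.
/// Empty input is an error, mirroring the "empty input error" test case.
pub fn infer_parameter_names<T>(
    variants: &[Vec<(&str, T)>],
) -> Result<Vec<String>, String> {
    let longest = variants
        .iter()
        .max_by_key(|v| v.len())
        .ok_or_else(|| "at least one parameter variant is required".to_string())?;
    // Keep only the names; the type half of each pair is ignored here.
    Ok(longest.iter().map(|(name, _)| name.to_string()).collect())
}
```

For the two substr variants `(str, start_pos)` and `(str, start_pos, length)`, this model returns the three names of the longer variant. A Nullary variant (`vec![]`) contributes no names, so it never wins over a non-empty variant.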

Are these changes tested?

Yes.

  • Added new unit tests in datafusion/expr-common/src/signature.rs for Signature::from_parameter_variants.
  • Updated substr to use the new API and covered behavior via the signature-building tests.

Are there any user-facing changes?

No end-user behavior changes are intended.

  • This is an internal API ergonomics improvement for defining function signatures.
  • No query semantics or function results should change.

If there are any breaking changes to public APIs, please add the api change label.

LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.

Add ParameterKind plus with_parameter/with_parameters builders
on Signature to pair parameter names with types or coercions,
reusing arity validation. Document the migration path for
ergonomic signature construction.

Expand signature tests to include new builder success cases
and address mismatched counts, duplicate names, and
variadic failures. Refactor substr function signature setup
to utilize the new parameter builder, reducing duplication
in specifying coercions and names.

Eliminate duplicate iterations over the parameters array by
consolidating the extraction logic in substr.rs.
@github-actions github-actions bot added the logical-expr (Logical plan and expressions) and functions (Changes to functions implementation) labels Dec 28, 2025
@kosiew kosiew marked this pull request as ready for review December 29, 2025 12:45

@Jefffrey Jefffrey left a comment


I don't think we can take this approach; it doesn't seem very ergonomic to essentially require the signature to be specified twice, once via the original way and then again via the parameters 🤔

@kosiew kosiew marked this pull request as draft December 30, 2025 09:12
…parameter handling and type signature building
…ameter variants

- Implemented `from_parameter_variants` method for the `Signature` struct to allow the creation of function signatures that accept multiple parameter configurations.
- Added internal methods for extracting parameter names, building type signatures for variants, and consolidating multiple type signatures.
- Enhanced documentation with examples on how to use the new method effectively and constraints on parameter name inference and variant requirements.

kosiew commented Dec 31, 2025

hi @Jefffrey

> doesn't seem very ergonomic to essentially require the signature to be specified twice, once via the original way and then again via the parameters

How about:

// signature for
// substr(str, start_pos) OR substr(str, start_pos, length)
Self {
    signature: Signature::from_parameter_variants(
        &[
            vec![
                ("str", string.clone()),
                ("start_pos", int64.clone()),
            ],
            vec![
                ("str", string.clone()),
                ("start_pos", int64.clone()),
                ("length", int64.clone()),
            ],
        ],
        Volatility::Immutable,
    )


@kosiew kosiew marked this pull request as ready for review December 31, 2025 06:39
@kosiew kosiew requested a review from Jefffrey December 31, 2025 06:39
Development

Successfully merging this pull request may close these issues.

More ergnomic way to specify (named) paramters in signature

2 participants