Skip to content

Add support for DISTINCT with sort order in array_agg function#2808

Open
fulghum wants to merge 2 commits into
mainfrom
fulghum/2334
Open

Add support for DISTINCT with sort order in array_agg function#2808
fulghum wants to merge 2 commits into
mainfrom
fulghum/2334

Conversation

@fulghum
Copy link
Copy Markdown
Contributor

@fulghum fulghum commented Jun 4, 2026

Fixes: #2334

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Main PR
covering_index_scan_postgres 2475.29/s 2477.59/s 0.0%
groupby_scan_postgres 170.44/s 172.91/s +1.4%
index_join_postgres 873.38/s 869.31/s -0.5%
index_join_scan_postgres 1093.24/s 1101.02/s +0.7%
index_scan_postgres 31.67/s 31.74/s +0.2%
oltp_delete_insert_postgres 595.80/s 627.24/s +5.2%
oltp_insert 446.38/s 460.90/s +3.2%
oltp_point_select 3845.68/s 3840.93/s -0.2%
oltp_read_only 3902.03/s 3965.98/s +1.6%
oltp_read_write 2363.09/s ${\color{red}1071.89/s}$ ${\color{red}-54.7\%}$
oltp_update_index 310.26/s ${\color{red}166.88/s}$ ${\color{red}-46.3\%}$
oltp_update_non_index 506.22/s ${\color{lightgreen}597.81/s}$ ${\color{lightgreen}+18.0\%}$
oltp_write_only 1695.80/s ${\color{red}1411.90/s}$ ${\color{red}-16.8\%}$
select_random_points 2419.96/s 2461.59/s +1.7%
select_random_ranges 1443.15/s 1444.51/s 0.0%
table_scan_postgres 29.74/s 30.12/s +1.2%
types_delete_insert_postgres 566.36/s 548.09/s -3.3%
types_table_scan_postgres 13.23/s 13.48/s +1.8%

@itoqa
Copy link
Copy Markdown

itoqa Bot commented Jun 4, 2026

Ito Test Report ❌

13 test cases ran. 1 failed, 1 additional finding, 11 passed.

Across 13 array_agg-focused tests, 11 passed and 2 failed, indicating that most DISTINCT/ORDER BY behavior (including ALL semantics, per-group dedup isolation, high-cardinality runs from 20k to 200k values, and row-order invariance) is functioning correctly in real execution paths. The two confirmed production defects are a high-severity PR-introduced bug in server/expression/array_agg.go where DISTINCT deduplication uses fmt.Sprintf("%v") string keys and can incorrectly merge semantically distinct typed values, and a medium-severity bug where array_agg(... ) OVER () is parsed but window metadata is dropped so the query executes as a non-window aggregate instead of returning an explicit unsupported-window error.

❌ Failed (1)
Category Summary Screenshot
Distinct ⚠️ Real bug confirmed: DISTINCT deduplication in array_agg uses string rendering as identity key, so semantically different typed values with identical %v output can be dropped. DISTINCT-1
⚠️ Distinct typed values collapse during array_agg deduplication
  • What failed: The implementation deduplicates on a string key instead of value identity, so different typed values that render the same string are treated as duplicates and one can be silently removed; expected behavior is to keep all semantically distinct values.
  • Impact: DISTINCT aggregation can return silently incorrect results, which can corrupt query correctness for analytics and application logic that depend on complete distinct sets.
  • Steps to reproduce:
    1. Create a fixture producing semantically distinct values that stringify the same.
    2. Run SELECT array_agg(DISTINCT crafted_value ORDER BY crafted_value) on that fixture.
    3. Compare returned count against logically distinct source values.
  • Stub / mock context: Real SCRAM sign-in was bypassed so the local SQL server could run fixture queries, but this failure comes from production DISTINCT keying logic in server/expression/array_agg.go rather than a mocked query path.
  • Code analysis: I inspected the production aggregation implementation and found DISTINCT uses a map[string]struct{} keyed by fmt.Sprintf("%v", evalRow[0]), which is collision-prone across typed values. I also verified the PR wiring enables this DISTINCT path for array_agg, so the defect is on a real production code path rather than test harness behavior.
  • Why this is likely a bug: Using formatted string output as DISTINCT identity is not type-safe, so it can collapse different values into one result, violating SQL DISTINCT semantics.

Relevant code:

server/expression/array_agg.go (lines 158-161)

type arrayAggBuffer struct {
	elements []sql.Row
	seen     map[string]struct{}
	a        *ArrayAgg
}

server/expression/array_agg.go (lines 202-208)

if a.a.Distinct {
		key := fmt.Sprintf("%v", evalRow[0])
		if _, exists := a.seen[key]; exists {
			return nil
		}
		a.seen[key] = struct{}{}
	}

server/ast/func_expr.go (lines 116-119)

return &vitess.OrderedInjectedExpr{
			InjectedExpr: vitess.InjectedExpr{
				Expression:         &pgexprs.ArrayAgg{Distinct: distinct},
				SelectExprChildren: exprs,
✅ Passed (11)
Category Summary Screenshot
Distinct Query returned sorted unique array {bar,baz,foo} from unsorted duplicated input rows. DISTINCT-2
Distinct Duplicate foo entries were deduplicated; result contained foo and bar once each. DISTINCT-3
Distinct Grouped result matched expectation: A -> {foo}, B -> {bar,foo}, showing no cross-group dedup leakage. DISTINCT-4
Distinct With 20,000 near-unique values, repeated DISTINCT aggregations completed successfully and returned complete counts without instability. DISTINCT-5
Group 200000-row DISTINCT aggregation completed with full count and responsive server. GROUP-1
Group Grouped DISTINCT ordering returned expected per-category arrays. GROUP-2
Group Repeated grouped runs stayed stable with no cross-group state leakage. GROUP-3
Group ASC/DESC input-order variants produced identical DISTINCT ordered signatures. GROUP-4
Parser DISTINCT + ORDER BY parsed and returned sorted unique output {bar,baz,foo}. PARSER-1
Parser ALL-qualified array_agg with ORDER BY executed and preserved duplicates ({bar,foo,foo}). PARSER-3
Parser DISTINCT-only, ORDER BY-only, and DISTINCT+ORDER BY matrix matched expected semantics. PARSER-4
ℹ️ Additional Findings (1)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Parser 🟠 array_agg(DISTINCT ... ORDER BY ...) OVER () is accepted but executes as a plain aggregate instead of surfacing unsupported window behavior. PARSER-2
🟠 Window clause is dropped for array_agg
  • What failed: The query succeeds and returns a non-window aggregate result instead of failing with an explicit unsupported window-function error.
  • Impact: Windowed array_agg queries can silently return incorrect semantics, which can mislead analytical query consumers. Users receive successful but logically wrong behavior instead of a clear unsupported-path error.
  • Steps to reproduce:
    1. Connect to localhost:5432 and create a simple test_items table with text values.
    2. Run SELECT array_agg(DISTINCT name ORDER BY name) OVER () FROM test_items.
    3. Observe that the query succeeds and returns a plain aggregate output.
    4. Compare this behavior against the expected explicit unsupported window-function error.
  • Stub / mock context: No stubs, mocks, or bypasses were applied for this test in the recorded run.
  • Code analysis: In AST lowering, node.WindowDef is parsed, but the array_agg branch builds OrderedInjectedExpr without binding Over. Runtime ArrayAgg explicitly marks window mode unsupported via panics, confirming the unsupported path exists but is bypassed.
  • Why this is likely a bug: Production code parses window metadata and has an explicit unsupported window path, but array_agg lowering drops the window binding and incorrectly executes the query as a regular aggregate.

Relevant code:

server/ast/func_expr.go (lines 64-67)

windowDef, err := nodeWindowDef(ctx, node.WindowDef)
if err != nil {
    return nil, err
}

server/ast/func_expr.go (lines 116-123)

return &vitess.OrderedInjectedExpr{
    InjectedExpr: vitess.InjectedExpr{
        Expression:         &pgexprs.ArrayAgg{Distinct: distinct},
        SelectExprChildren: exprs,
        Auth:               vitess.AuthInformation{},
    },
    OrderBy: orderBy,
}, nil

server/expression/array_agg.go (lines 133-141)

func (a *ArrayAgg) NewWindowFunction(ctx *sql.Context) (sql.WindowFunction, error) {
    panic("window functions not yet supported for array_agg")
}

func (a *ArrayAgg) WithWindow(ctx *sql.Context, window *sql.WindowDefinition) sql.WindowAdaptableExpression {
    panic("window functions not yet supported for array_agg")
}

Commit: 868cb62

View Full Run


Tell us how we did: Give Ito Feedback

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jun 4, 2026

Main PR
Total 42090 42090
Successful 18203 18204
Failures 23887 23886
Partial Successes1 5305 5305
Main PR
Successful 43.2478% 43.2502%
Failures 56.7522% 56.7498%

${\color{lightgreen}Progressions (1)}$

subselect

QUERY: select count(*) from tenk1 t
where (exists(select 1 from tenk1 k where k.unique1 = t.unique2) or ten < 0);

Footnotes

  1. These are tests that we're marking as Successful, however they do not match the expected output in some way. This is due to small differences, such as different wording on the error messages, or the column names being incorrect while the data itself is correct.

@fulghum fulghum requested a review from Hydrocharged June 4, 2026 21:34
Copy link
Copy Markdown
Collaborator

@Hydrocharged Hydrocharged left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread server/expression/array_agg.go Outdated
@itoqa
Copy link
Copy Markdown

itoqa Bot commented Jun 5, 2026

Ito Diff Report ❌

Tested: 868cb62e0493bd
14 test cases ran this commit: 10 passed ✅, 1 fixed 🔧, 3 failed ❌.
↪️ Carried forward from prior run (not retested this commit): 6 passing.

Across 14 total tests, 11 passed and 3 failed, confirming that array_agg(DISTINCT ...) is generally correct and deterministic for composite and grouped deduplication, ORDER BY sorting, cross-run stability, empty-input NULL behavior, large-volume workloads, legacy compatibility, and explicit fail-fast handling when IoOutput key generation errors are intentionally triggered. The key defects were a high-severity runtime failure on NULL inputs (IoOutput nil/type handling), a medium-severity correctness issue where semantically equal numerics like 1.0 and 1.00 are not deduplicated because keys are based on textual output, and a medium-severity parser/lowering bug where windowed array_agg OVER(...) executes instead of returning the expected unsupported-window error.

❌ Failures (3)
Origin Category Severity Summary Screenshot
🆕 New Distinct ⚠️ High array_agg(DISTINCT v) over (NULL),(NULL),('x') errors with IoOutput nil handling instead of returning one NULL and one non-NULL value. DISTINCT-7
🆕 New Distinct 🟠 Medium DISTINCT dedup keys use string output, so semantically equal numerics like 1.0 and 1.00 are retained as separate values. DISTINCT-8
🔻 Still broken (verified) Parser 🟠 Medium array_agg window usage is accepted and executed instead of returning an unsupported window error. PARSER-2
⚠️ NULL values trigger DISTINCT key generation error
  • What failed: The query raises an IoOutput type error for NULL instead of deduplicating to a valid array containing one NULL and 'x'.
  • Impact: DISTINCT aggregations over nullable columns can fail at runtime, breaking a core query path. Users cannot complete affected analytics or reporting queries without rewriting data or SQL.
  • Steps to reproduce:
    1. Create a table containing values NULL, NULL, and 'x'.
    2. Run SELECT array_agg(DISTINCT v) FROM that table.
    3. Observe that the query returns an IoOutput type error instead of a deduplicated array.
  • Stub / mock context: Authentication and startup checks were temporarily bypassed so the local server could run, then DISTINCT SQL assertions were executed directly against the engine. No query-result stubs or route-level response mocks were used for this test's aggregate outcomes.
  • Code analysis: I inspected DISTINCT key generation in server/expression/array_agg.go and type output conversion in server/types/type.go. The DISTINCT branch always calls IoOutput for keying and returns immediately on error, and IoOutput rejects non-string output (including nil-path outcomes), matching the observed runtime failure.
  • Why this is likely a bug: The production DISTINCT path has no NULL-safe key strategy and explicitly propagates IoOutput errors, directly explaining the failure for valid nullable input.

Relevant code:

server/expression/array_agg.go (lines 201-211)

if a.a.Distinct {
    exprType := a.a.selectExprs[0].Type(ctx).(*types.DoltgresType)
    key, err := exprType.IoOutput(ctx, evalRow[0])
    if err != nil {
        return err
    }
    if _, exists := a.seen[key]; exists {
        return nil
    }
    a.seen[key] = struct{}{}
}

server/types/type.go (lines 599-617)

func (t *DoltgresType) IoOutput(ctx *sql.Context, val any) (string, error) {
    var o any
    var err error
    if t.ModInFunc != 0 || t.IsArrayType() || t.IsCompositeType() {
        send := globalFunctionRegistry.GetFunction(ctx, t.OutputFunc)
        resolvedTypes := send.ResolvedTypes()
        resolvedTypes[0] = t
        o, err = send.WithResolvedTypes(resolvedTypes).(QuickFunction).CallVariadic(ctx, val)
    } else {
        o, err = globalFunctionRegistry.GetFunction(ctx, t.OutputFunc).CallVariadic(ctx, val)
    }
    var ok bool
    os, ok, err := sql.Unwrap[string](ctx, o)
    if !ok {
        return "", errors.Errorf("unexpected type for io output, expected string, got %T", val)
    }
}
🟠 Numeric semantic duplicates survive DISTINCT deduplication
  • What failed: DISTINCT membership is keyed by textual output rather than numeric semantic equality, so equal numeric values with different formatting are treated as different keys.
  • Impact: Numeric DISTINCT results can contain duplicates that should be collapsed, reducing correctness of query outputs. Users may make wrong decisions based on inflated distinct-value sets.
  • Steps to reproduce:
    1. Insert numeric values including semantically equal textual forms such as 1.0 and 1.00.
    2. Run SELECT array_agg(DISTINCT v) FROM the numeric fixture table.
    3. Observe that semantically equal numbers are retained as separate array members.
  • Stub / mock context: Authentication and startup checks were temporarily bypassed so the local server could run, then DISTINCT SQL assertions were executed directly against the engine. No query-result stubs or route-level response mocks were used for this test's aggregate outcomes.
  • Code analysis: I inspected the DISTINCT map key path in server/expression/array_agg.go and numeric output formatting in server/functions/numeric.go. The key is a string from IoOutput, and numeric_out returns formatted decimal text (res.Text('f')), which can preserve formatting differences (for example 1.0 vs 1.00) even when values are semantically equal.
  • Why this is likely a bug: DISTINCT correctness should follow SQL value equality, but production code currently deduplicates by output text and therefore can keep semantically duplicate numeric values.

Relevant code:

server/expression/array_agg.go (lines 201-210)

if a.a.Distinct {
    exprType := a.a.selectExprs[0].Type(ctx).(*types.DoltgresType)
    key, err := exprType.IoOutput(ctx, evalRow[0])
    if err != nil {
        return err
    }
    if _, exists := a.seen[key]; exists {
        return nil
    }
    a.seen[key] = struct{}{}
}

server/functions/numeric.go (lines 65-74)

Callable: func(ctx *sql.Context, t [2]*pgtypes.DoltgresType, val any) (any, error) {
    typ := t[0]
    dec := val.(*apd.Decimal)
    tm := typ.GetAttTypMod()
    res, err := pgtypes.GetNumericValueWithTypmod(dec, tm)
    if err != nil {
        return nil, err
    }
    return res.Text('f'), nil
},
🟠 Windowed array_agg silently executes
  • What failed: The query executes as a normal aggregate and returns {alpha,beta}; expected behavior is an explicit unsupported window-function error for array_agg.
  • Impact: Users get incorrect feature signaling and query semantics for windowed array_agg, which can hide unsupported behavior behind successful results. This can mislead application logic that expects a clear unsupported error.
  • Steps to reproduce:
    1. Create a simple test_items table with duplicate text values.
    2. Run SELECT array_agg(DISTINCT name ORDER BY name) OVER () FROM test_items;.
    3. Observe that the query returns a normal aggregate result instead of an unsupported window-function error.
  • Stub / mock context: Authentication was bypassed for this run by disabling SCRAM checks and hardcoding local postgres/password credentials, and required roles were force-restored so the local server could start for query verification.
  • Code analysis: I reviewed parser/lowering and aggregate execution paths. nodeFuncExpr computes windowDef but the array_agg branch returns OrderedInjectedExpr without attaching Over, so window metadata is dropped before execution. The aggregate implementation explicitly panics for window adaptation, confirming window mode is intentionally unsupported but never reached for this path.
  • Why this is likely a bug: Production code computes window metadata but drops it for array_agg, bypassing the explicit unsupported-window path and producing incorrect successful execution.

Relevant code:

server/ast/func_expr.go (lines 64-67)

windowDef, err := nodeWindowDef(ctx, node.WindowDef)
if err != nil {
  return nil, err
}

server/ast/func_expr.go (lines 107-123)

case "array_agg":
  var orderBy vitess.OrderBy
  if len(node.OrderBy) > 0 {
    orderBy, err = nodeOrderBy(ctx, node.OrderBy)
    if err != nil {
      return nil, err
    }
  }

  return &vitess.OrderedInjectedExpr{
    InjectedExpr: vitess.InjectedExpr{
      Expression:         &pgexprs.ArrayAgg{Distinct: distinct},
      SelectExprChildren: exprs,
      Auth:               vitess.AuthInformation{},
    },
    OrderBy: orderBy,
  }, nil

server/expression/array_agg.go (lines 132-140)

// NewWindowFunction implements sql.WindowAdaptableExpression
func (a *ArrayAgg) NewWindowFunction(ctx *sql.Context) (sql.WindowFunction, error) {
  panic("window functions not yet supported for array_agg")
}

// WithWindow implements sql.WindowAdaptableExpression
func (a *ArrayAgg) WithWindow(ctx *sql.Context, window *sql.WindowDefinition) sql.WindowAdaptableExpression {
  panic("window functions not yet supported for array_agg")
}
✅ Bugs Fixed (1)
Category Summary Screenshot
Distinct Composite DISTINCT aggregation retained semantically distinct rows without collapse. DISTINCT-1
✅ Verified Passing (10)
Category Summary Screenshot
Distinct DISTINCT + ORDER BY returned a deduplicated ascending array {bar,baz,foo}. DISTINCT-2
Distinct Grouped DISTINCT aggregation kept deduplication state isolated per category. DISTINCT-4
Distinct DISTINCT aggregation correctly failed fast with an explicit IoOutput error on the error-triggering fixture. DISTINCT-6
Distinct Representative legacy scalar and composite DISTINCT workloads matched expected outputs without silent regression. DISTINCT-9
Group Grouped DISTINCT arrays stayed deduplicated and sorted per category without cross-group leakage. GROUP-2
Group Grouped DISTINCT arrays stayed isolated per category across repeated runs. GROUP-3
Group Equivalent logical inputs with different row arrival orders produced the same DISTINCT ordered arrays across repeated runs. GROUP-4
Order Composite point_t DISTINCT aggregation returned the same sorted unique array in repeated executions. ORDER-1
Order Empty ordered DISTINCT aggregate returned SQL NULL as expected. ORDER-2
Order Large composite DISTINCT ORDER BY workload stayed deterministic and deduplicated (280 unique composites). ORDER-3
↪️ Inherited from Prior Run (6)

Tests that passed in the prior run. c3 judged them unaffected by the diff and did not retest.

Category Summary Screenshot
Distinct Duplicate foo entries were deduplicated; result contained foo and bar once each. N/A
Distinct With 20,000 near-unique values, repeated DISTINCT aggregations completed successfully and returned complete counts without instability. N/A
Group 200000-row DISTINCT aggregation completed with full count and responsive server. N/A
Parser DISTINCT + ORDER BY parsed and returned sorted unique output {bar,baz,foo}. N/A
Parser ALL-qualified array_agg with ORDER BY executed and preserved duplicates ({bar,foo,foo}). N/A
Parser DISTINCT-only, ORDER BY-only, and DISTINCT+ORDER BY matrix matched expected semantics. N/A

View Full Run


Tell us how we did: Give Ito Feedback

@Hydrocharged
Copy link
Copy Markdown
Collaborator

Those looks like fairly major performance regressions, but your changes don't seem like they would have that much impact.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Parser error: ORDER BY inside array_agg aggregate function

2 participants