Add support for DISTINCT with sort order in array_agg function#2808
Add support for DISTINCT with sort order in array_agg function#2808fulghum wants to merge 2 commits into
DISTINCT with sort order in array_agg function#2808Conversation
|
Ito Test Report ❌13 test cases ran. 1 failed, 1 additional finding, 11 passed. Across 13 array_agg-focused tests, 11 passed and 2 failed, indicating that most DISTINCT/ORDER BY behavior (including ALL semantics, per-group dedup isolation, high-cardinality runs from 20k to 200k values, and row-order invariance) is functioning correctly in real execution paths. The two confirmed production defects are a high-severity PR-introduced bug in server/expression/array_agg.go where DISTINCT deduplication uses fmt.Sprintf("%v") string keys and can incorrectly merge semantically distinct typed values, and a medium-severity bug where array_agg(... ) OVER () is parsed but window metadata is dropped so the query executes as a non-window aggregate instead of returning an explicit unsupported-window error. ❌ Failed (1)
|
| Category | Summary | Screenshot |
|---|---|---|
| Parser | 🟠 array_agg(DISTINCT ... ORDER BY ...) OVER () is accepted but executes as a plain aggregate instead of surfacing unsupported window behavior. |
![]() |
🟠 Window clause is dropped for array_agg
- What failed: The query succeeds and returns a non-window aggregate result instead of failing with an explicit unsupported window-function error.
- Impact: Windowed
array_aggqueries can silently return incorrect semantics, which can mislead analytical query consumers. Users receive successful but logically wrong behavior instead of a clear unsupported-path error. - Steps to reproduce:
- Connect to localhost:5432 and create a simple test_items table with text values.
- Run SELECT array_agg(DISTINCT name ORDER BY name) OVER () FROM test_items.
- Observe that the query succeeds and returns a plain aggregate output.
- Compare this behavior against the expected explicit unsupported window-function error.
- Stub / mock context: No stubs, mocks, or bypasses were applied for this test in the recorded run.
- Code analysis: In AST lowering,
node.WindowDefis parsed, but thearray_aggbranch buildsOrderedInjectedExprwithout bindingOver. RuntimeArrayAggexplicitly marks window mode unsupported via panics, confirming the unsupported path exists but is bypassed. - Why this is likely a bug: Production code parses window metadata and has an explicit unsupported window path, but
array_agglowering drops the window binding and incorrectly executes the query as a regular aggregate.
Relevant code:
server/ast/func_expr.go (lines 64-67)
windowDef, err := nodeWindowDef(ctx, node.WindowDef)
if err != nil {
return nil, err
}server/ast/func_expr.go (lines 116-123)
return &vitess.OrderedInjectedExpr{
InjectedExpr: vitess.InjectedExpr{
Expression: &pgexprs.ArrayAgg{Distinct: distinct},
SelectExprChildren: exprs,
Auth: vitess.AuthInformation{},
},
OrderBy: orderBy,
}, nilserver/expression/array_agg.go (lines 133-141)
func (a *ArrayAgg) NewWindowFunction(ctx *sql.Context) (sql.WindowFunction, error) {
panic("window functions not yet supported for array_agg")
}
func (a *ArrayAgg) WithWindow(ctx *sql.Context, window *sql.WindowDefinition) sql.WindowAdaptableExpression {
panic("window functions not yet supported for array_agg")
}Commit: 868cb62
Tell us how we did: Give Ito Feedback
|
Ito Diff Report ❌Tested: Across 14 total tests, 11 passed and 3 failed, confirming that array_agg(DISTINCT ...) is generally correct and deterministic for composite and grouped deduplication, ORDER BY sorting, cross-run stability, empty-input NULL behavior, large-volume workloads, legacy compatibility, and explicit fail-fast handling when IoOutput key generation errors are intentionally triggered. The key defects were a high-severity runtime failure on NULL inputs (IoOutput nil/type handling), a medium-severity correctness issue where semantically equal numerics like 1.0 and 1.00 are not deduplicated because keys are based on textual output, and a medium-severity parser/lowering bug where windowed array_agg OVER(...) executes instead of returning the expected unsupported-window error. ❌ Failures (3)
|
| Category | Summary | Screenshot |
|---|---|---|
| Distinct | Composite DISTINCT aggregation retained semantically distinct rows without collapse. | ![]() |
✅ Verified Passing (10)
↪️ Inherited from Prior Run (6)
Tests that passed in the prior run. c3 judged them unaffected by the diff and did not retest.
| Category | Summary | Screenshot |
|---|---|---|
| Distinct | Duplicate foo entries were deduplicated; result contained foo and bar once each. | N/A |
| Distinct | With 20,000 near-unique values, repeated DISTINCT aggregations completed successfully and returned complete counts without instability. | N/A |
| Group | 200000-row DISTINCT aggregation completed with full count and responsive server. | N/A |
| Parser | DISTINCT + ORDER BY parsed and returned sorted unique output {bar,baz,foo}. |
N/A |
| Parser | ALL-qualified array_agg with ORDER BY executed and preserved duplicates ({bar,foo,foo}). |
N/A |
| Parser | DISTINCT-only, ORDER BY-only, and DISTINCT+ORDER BY matrix matched expected semantics. | N/A |
Tell us how we did: Give Ito Feedback
|
Those looks like fairly major performance regressions, but your changes don't seem like they would have that much impact. |



























Fixes: #2334