Skip to content

fix: Avoid overlay panic on valid Unicode input, Postgres compatibility#22046

Open
neilconway wants to merge 1 commit intoapache:mainfrom
neilconway:neilc/fix-overlay-unicode
Open

fix: Avoid overlay panic on valid Unicode input, Postgres compatibility#22046
neilconway wants to merge 1 commit intoapache:mainfrom
neilconway:neilc/fix-overlay-unicode

Conversation

@neilconway
Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Rationale for this change

overlay mixed up character-based and byte-based indexing, which resulted in either panicking or returning incorrect results on multi-byte Unicode input. While looking at this, I also fixed a few places where overlay's behavior was inconsistent with Postgres and the SQL spec, refactored the code, and added a benchmark.

What changes are included in this PR?

  • Fix panic or wrong results on multibyte Unicode input
  • Align edge-case behavior with Postgres: start past end appends replacement after the full input, start <= 0 errors
  • Refactored the code to reduce redundancy and clarify variable names
  • Add SLT tests for Unicode input and behavior changes for Postgres compatibility
  • Add benchmark, in advance of subsequent performance optimization work

Are these changes tested?

Yes, and new tests added.

Are there any user-facing changes?

Yes. The behavior of overlay has changed for multibyte Unicode inputs, as well as the edge cases enumerated above.

@github-actions github-actions Bot added sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

overlay behavior inconsistent with Postgres / SQL spec overlay panics on valid utf-8 input

1 participant