Add TextDecoder support for x-user-defined encoding (fixes #6039) #6040
Merged
danlapid merged 2 commits into cloudflare:main on Feb 14, 2026
Implements the x-user-defined decoder per WHATWG Encoding Standard.
- Map bytes 0x00–0x7F to identical ASCII code points
- Map bytes 0x80–0xFF to Unicode PUA U+F780–U+F7FF
- Add dedicated XUserDefinedDecoder with ASCII fast path (no ICU)
- Register "x-user-defined" label
- Wire through TextDecoder constructor, getImpl(), and decodePtr()
- Add unit tests for decoding, streaming, and fatal mode
Fixes cloudflare#6039
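The byte mapping listed above can be sketched in a few lines of JavaScript. This is an illustrative sketch only, not the C++ code from this PR; the helper names `xUserDefinedCodePoint` and `decodeXUserDefined` are hypothetical.

```javascript
// Hypothetical helper, not the workerd implementation: maps one byte to
// the code point the x-user-defined mapping assigns to it.
function xUserDefinedCodePoint(byte) {
  // 0x00–0x7F stay as ASCII; 0x80–0xFF land in U+F780–U+F7FF.
  return byte < 0x80 ? byte : 0xf700 + byte;
}

// Decode a whole buffer. Every byte maps to exactly one BMP code point,
// so no state has to be carried between streamed chunks.
function decodeXUserDefined(bytes) {
  let out = '';
  for (const byte of bytes) {
    out += String.fromCharCode(xUserDefinedCodePoint(byte));
  }
  return out;
}

const whole = decodeXUserDefined(new Uint8Array([0x41, 0x80, 0xff]));
const chunked =
  decodeXUserDefined(new Uint8Array([0x41, 0x80])) +
  decodeXUserDefined(new Uint8Array([0xff]));

console.log(whole === chunked); // true
console.log(whole.codePointAt(1).toString(16)); // f780
```

Because every byte value has a mapping, this sketch has no error path, and splitting the input at any byte boundary gives the same result as decoding it in one call.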
All contributors have signed the CLA ✍️ ✅
Contributor
Author
I have read the CLA Document and I hereby sign the CLA
Collaborator
Member
Linter and some tests seem to be failing. Can you look into it?
anonrig
reviewed
Feb 7, 2026
Collaborator
@JosephDoUrden ... to run linting, if you have
Collaborator
I think only the lint issues are at issue. The tests appear to have been a CI glitch. @JosephDoUrden ... the "run internal build" one is one we'll have to run ourselves, just FYI. Thank you for the contribution!
…flare#6039) Replace manual byte loop with simdutf::validate_ascii() when detecting high bytes in XUserDefinedDecoder::decode. Fix JSG_REQUIRE line break in TextDecoder::constructor to satisfy clang-format.
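The two-path structure this commit describes (check for high bytes first, then remap only when needed) might look roughly like the following. This is a JavaScript sketch under stated assumptions: `isAllAscii` is a hypothetical stand-in for the `simdutf::validate_ascii()` check, and `decodeWithFastPath` is an illustrative name, not a function from the PR.

```javascript
// Hypothetical stand-in for the simdutf::validate_ascii() check:
// true when no byte has its high bit set.
function isAllAscii(bytes) {
  return bytes.every((b) => b < 0x80);
}

function decodeWithFastPath(bytes) {
  if (isAllAscii(bytes)) {
    // Fast path: ASCII bytes are already their own code points.
    // (Spreading is fine for small inputs; large buffers should be chunked.)
    return String.fromCharCode(...bytes);
  }
  // Slow path: remap high bytes into U+F780–U+F7FF one at a time.
  let out = '';
  for (const b of bytes) {
    out += String.fromCharCode(b < 0x80 ? b : 0xf700 + b);
  }
  return out;
}

console.log(decodeWithFastPath(new Uint8Array([0x68, 0x69]))); // hi
console.log(decodeWithFastPath(new Uint8Array([0x68, 0x80])).length); // 2
```

Both paths produce one UTF-16 code unit per input byte, so the output length always equals the input length.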
anonrig
approved these changes
Feb 8, 2026
Collaborator
Thanks @JosephDoUrden!
Summary
Adds support for the x-user-defined encoding to TextDecoder, as required by the WHATWG Encoding Standard and requested in #6039.
Behavior
- Bytes 0x00–0x7F map to the identical ASCII code points.
- Bytes 0x80–0xFF map to the Unicode Private Use Area range U+F780–U+F7FF (0xF700 + byte).
This gives a simple, reversible single-byte mapping useful for legacy binary-over-string use cases (e.g. when you need an isomorphic byte↔code point mapping; latin1 is not suitable because it is mapped to windows-1252 and is not isomorphic).
Implementation
- XUserDefinedDecoder in encoding.h/encoding.c++, with an ASCII-only fast path and a slow path for bytes ≥ 0x80.
- "x-user-defined" is registered in the encoding label table and handled in the TextDecoder constructor (no ICU).
- x-user-defined in allTheDecoders, plus dedicated tests in encoding-test.js for decoding, streaming, and fatal mode.
Tests
- api/tests/encoding-test.js: xUserDefinedDecode, xUserDefinedFatal, and x-user-defined in allTheDecoders.
Fixes #6039
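To illustrate the reversibility claim in the Behavior section, here is a minimal sketch of recovering the original bytes from an x-user-defined string. The helper name `bytesFromXUserDefined` is hypothetical, and the input string is written out by hand rather than produced by a real TextDecoder, so the snippet does not depend on any particular runtime supporting the label.

```javascript
// Hypothetical helper: recover the original bytes from a string that was
// decoded as x-user-defined. Masking with 0xFF undoes both branches of the
// mapping (ASCII unchanged, 0xF700 + byte for 0x80–0xFF).
function bytesFromXUserDefined(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// What the x-user-defined mapping yields for the bytes 0x41 0x80 0xFF.
const decoded = '\u0041\uf780\uf7ff';
const recovered = bytesFromXUserDefined(decoded);
console.log(Array.from(recovered)); // [ 65, 128, 255 ]

// windows-1252 maps byte 0x80 to U+20AC, so the same mask gives 0xAC, not 0x80.
const fromWindows1252 = '\u20ac';
console.log(fromWindows1252.charCodeAt(0) & 0xff); // 172
```

The last two lines show why a windows-1252-backed latin1 label does not work for this use case: the byte value can no longer be read back from the low 8 bits of the code point.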