Add TextDecoder support for x-user-defined encoding (fixes #6039) #6040

Merged
danlapid merged 2 commits into cloudflare:main from JosephDoUrden:fix/textdecoder-x-user-defined-encoding on Feb 14, 2026
Conversation

@JosephDoUrden
Contributor

Summary

Adds support for the x-user-defined encoding to TextDecoder, as required by the WHATWG Encoding Standard and requested in #6039.

Behavior

  • 0x00–0x7F: Decoded as the same code point (ASCII identity).
  • 0x80–0xFF: Decoded to Unicode Private Use Area U+F780–U+F7FF (i.e. 0xF700 + byte).

This gives a simple, reversible single-byte mapping for legacy binary-over-string use cases that need an isomorphic byte↔code point mapping. latin1 is not suitable here: the Encoding Standard treats it as a label for windows-1252, which is not isomorphic.
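
The mapping described above can be sketched in a few lines of C++. This is only an illustration of the byte-to-code-point rule and its inverse, not the code from this PR; the function names `decodeXUserDefinedByte` and `encodeXUserDefinedCodePoint` are hypothetical.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical helper (not from the PR): decode one x-user-defined byte.
// 0x00-0x7F map to the same code point; 0x80-0xFF map to 0xF700 + byte,
// which lands in the Private Use Area range U+F780-U+F7FF.
constexpr char32_t decodeXUserDefinedByte(std::uint8_t byte) {
  return byte < 0x80 ? static_cast<char32_t>(byte)
                     : static_cast<char32_t>(0xF700u + byte);
}

// Hypothetical inverse: recover the original byte from a code point, or
// return std::nullopt when the code point is outside the mapping's range.
inline std::optional<std::uint8_t> encodeXUserDefinedCodePoint(char32_t cp) {
  if (cp < 0x80) {
    return static_cast<std::uint8_t>(cp);
  }
  if (cp >= 0xF780 && cp <= 0xF7FF) {
    return static_cast<std::uint8_t>(cp - 0xF700);
  }
  return std::nullopt;
}
```

Because every one of the 256 byte values maps to a distinct code point, decoding and then applying the inverse returns the original byte, which is the reversibility the description relies on.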

Implementation

  • New XUserDefinedDecoder in encoding.h / encoding.c++, with an ASCII-only fast path and a slow path for bytes ≥ 0x80.
  • Label "x-user-defined" is registered in the encoding label table and handled in the TextDecoder constructor (no ICU).
  • Tests: x-user-defined in allTheDecoders, plus dedicated tests in encoding-test.js for decoding, streaming, and fatal mode.
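
As a rough sketch of the fast-path/slow-path split mentioned above, assuming UTF-16 output: the names `isAllAscii` and `decodeXUserDefined` are hypothetical, and the real `XUserDefinedDecoder` in encoding.c++ may be structured differently. The ASCII check here uses `std::all_of` as a stand-in so the example stays self-contained.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a vectorized ASCII check: true when no byte
// in the input has its high bit set.
inline bool isAllAscii(const std::uint8_t* data, std::size_t size) {
  return std::all_of(data, data + size,
                     [](std::uint8_t b) { return b < 0x80; });
}

// Hypothetical decoder sketch: x-user-defined bytes to UTF-16 code units.
inline std::u16string decodeXUserDefined(const std::uint8_t* data,
                                         std::size_t size) {
  // Fast path: pure ASCII input widens byte-for-byte with no remapping.
  if (isAllAscii(data, size)) {
    return std::u16string(data, data + size);
  }
  // Slow path: at least one byte is >= 0x80, so remap those bytes into
  // the Private Use Area (0xF700 + byte) and copy ASCII bytes unchanged.
  std::u16string out;
  out.reserve(size);
  for (std::size_t i = 0; i < size; ++i) {
    const std::uint8_t b = data[i];
    out.push_back(b < 0x80 ? static_cast<char16_t>(b)
                           : static_cast<char16_t>(0xF700 + b));
  }
  return out;
}
```

Every output code point fits in a single UTF-16 code unit, so the output length always equals the input length and the buffer can be reserved up front.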

Tests

  • api/tests/encoding-test.js: xUserDefinedDecode, xUserDefinedFatal, and x-user-defined in allTheDecoders.
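
The JavaScript tests themselves are not reproduced in this thread. As a hedged illustration of the properties that streaming and fatal-mode tests for this encoding would plausibly exercise, the C++ sketch below (with hypothetical names `decodeChunk` and `decodeInTwoChunks`) shows that each byte decodes independently, so splitting the input at any point gives the same result as decoding it in one call. Since every byte value has a defined mapping, there is also no invalid input for fatal mode to reject.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-chunk decoder (not from the PR). It keeps no state
// between calls because every byte maps to exactly one code unit.
inline std::u16string decodeChunk(const std::vector<std::uint8_t>& chunk) {
  std::u16string out;
  out.reserve(chunk.size());
  for (std::uint8_t b : chunk) {
    out.push_back(static_cast<char16_t>(b < 0x80 ? b : 0xF700 + b));
  }
  return out;
}

// Decode a buffer as two separately decoded chunks split at `split`.
// Precondition: split <= bytes.size().
inline std::u16string decodeInTwoChunks(const std::vector<std::uint8_t>& bytes,
                                        std::size_t split) {
  const auto mid = bytes.begin() + static_cast<std::ptrdiff_t>(split);
  const std::vector<std::uint8_t> head(bytes.begin(), mid);
  const std::vector<std::uint8_t> tail(mid, bytes.end());
  return decodeChunk(head) + decodeChunk(tail);
}
```

Contrast this with multi-byte encodings such as UTF-8, where a chunk boundary can fall inside a sequence and the decoder must carry partial state across calls.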

Fixes #6039

Implements the x-user-defined decoder per WHATWG Encoding Standard.

- Map bytes 0x00–0x7F to identical ASCII code points
- Map bytes 0x80–0xFF to Unicode PUA U+F780–U+F7FF
- Add dedicated XUserDefinedDecoder with ASCII fast path (no ICU)
- Register "x-user-defined" label
- Wire through TextDecoder constructor, getImpl(), and decodePtr()
- Add unit tests for decoding, streaming, and fatal mode

Fixes cloudflare#6039
@JosephDoUrden JosephDoUrden requested review from a team as code owners February 7, 2026 10:45
@github-actions

github-actions bot commented Feb 7, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@JosephDoUrden
Contributor Author

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Feb 7, 2026
@danlapid
Collaborator

danlapid commented Feb 7, 2026

Thanks for your contribution!
@jasnell @anonrig appreciate your review please

@anonrig
Member

anonrig commented Feb 7, 2026

Linter and some tests seem to be failing. Can you look into it?

@jasnell
Collaborator

jasnell commented Feb 7, 2026

@JosephDoUrden ... to run linting: if you have `just` installed, you can run the linter with a simple `just f` command; otherwise you can use `python3 tools/cross/format.py` (which is what `just f` does).

@jasnell
Collaborator

jasnell commented Feb 7, 2026

@anonrig:

Linter and some tests seem to be failing. Can you look into it?

I think only the lint failures are real. The test failures appear to have been a CI glitch.

@JosephDoUrden ... the "run internal build" check is one we'll have to run ourselves, just FYI. Thank you for the contribution!

…flare#6039)

Replace manual byte loop with simdutf::validate_ascii() when detecting
high bytes in XUserDefinedDecoder::decode. Fix JSG_REQUIRE line break
in TextDecoder::constructor to satisfy clang-format.
@JosephDoUrden
Contributor Author

Formatting has been checked with `just f` (clang-format, Prettier, ruff, buildifier). All checks passed.
@jasnell @anonrig

@JosephDoUrden JosephDoUrden requested a review from anonrig February 8, 2026 09:22
@danlapid danlapid merged commit 67f5d25 into cloudflare:main Feb 14, 2026
29 of 31 checks passed
@danlapid
Collaborator

danlapid commented Feb 14, 2026

Thanks @JosephDoUrden !

@JosephDoUrden JosephDoUrden deleted the fix/textdecoder-x-user-defined-encoding branch February 15, 2026 20:34

Development

Successfully merging this pull request may close these issues.

TextDecoder is missing x-user-defined encoding

4 participants