fix: Validate the presence of null unicode in output #3164

gregfurman merged 3 commits into hatchet-dev:main
Conversation
```python
if "\\u0000" in serialized_output:
    raise IllegalTaskOutputError(dedent(f"""
        Task outputs cannot contain the unicode null character \\u0000
```
it might be easier to leave the check as it was before (using the serialized_output) instead of moving it up, because the way this is implemented we need to have the serialization code in multiple places now. WDYT?
The issue here is that this dump_json serialises the unicode character in the output string, making it difficult to tell whether this is an illegal character ("\u0000" == "\x00") or a correctly escaped string ("\\u0000").
Like, the following cases are false positives that are incorrectly rejected by the string-based check, despite the unicode char not being present:

```python
import json

assert "\\u0000" in json.dumps({"foo\\u0000": "bar"})
assert "\\u0000" in json.dumps({"foo\\\\u0000": "bar"})
```

TBH all of this makes me think that we shouldn't even be handling this validation on the SDK/client side, instead just allowing the server to reject invalid requests...
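To make the ambiguity concrete, here's a standard-library illustration (not from the PR itself) of why the substring check can't distinguish a real null byte from escaped text:

```python
import json

# A real U+0000 null byte is escaped by json.dumps as the six
# characters \u0000 in the serialized text, so the substring
# check fires correctly here...
real = json.dumps({"foo": "bar\x00"})
assert "\\u0000" in real

# ...but the literal six-character text \u0000 round-trips with a
# doubled backslash, and the substring check still fires even
# though no null byte exists anywhere in the payload.
escaped = json.dumps({"foo": "bar\\u0000"})
assert "\\u0000" in escaped
assert "\x00" not in json.loads(escaped)["foo"]
```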
ah yeah, makes sense 🤔
I think that doing this only server-side makes sense if we can raise this same sort of error on the SDK, but it might be complicated because of how we're sending payloads to the server, which I don't think will allow us to raise an error from the task. There's also an argument that raising the error from the task doesn't really help at all anyways, so if you subscribe to that theory, I'm open to removing this check on the SDK!
The error response from the server describes this. Unsure how clear it'll be to a user though 👇

```
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Invalid request: encoded jsonb contains invalid null character \u0000 in field `payload`"
	debug_error_string = "UNKNOWN:Error received from peer {grpc_status:3, grpc_message:"Invalid request: encoded jsonb contains invalid null character \\u0000 in field `payload`"}"
>
```
yeah, I think the issue here is that if you e.g. had OTel set up or were using something like Sentry, you'd see this error in your worker logs, but I don't think (need to verify) that it'd be tied to a task, which might make it hard to debug. I'm sure we can work around this somehow though
Ah sorry yeah I think that's what I was suggesting might not work - you mean catching the error from the server on the worker, right? We can do that, but I think the issue is in that case, the error won't be tied to the task that caused it without a bigger refactor (assuming I'm not missing anything here, happy to be wrong on this one!)
Makes sense. Sounds like exclusively handling validations server-side is out-of-scope for this PR. Also, we should probably be additionally validating the input payload for this \u0000 value, unless we're OK with a gRPC error being returned.
Otherwise, are you happy with the current approach? Or do you think we need some e2e testing to be safe?
yep! happy with it, but I don't love `contains_null_unicode_character` as-is, since I think it'll be annoying to maintain in the future given that it differs from the TypeAdapter logic we have elsewhere heh
So should we try and include this null unicode check as a pydantic validation step?
If the type adapter can do that, that'd be awesome. If not, there might be no free lunch here, and we either need to have it as-is or do it post-serialization on the bytes/string directly, which would be unfortunate as you mentioned before.
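For reference, a minimal pre-serialization check along these lines could look like the sketch below. This is a hypothetical reconstruction, not the SDK's actual `contains_null_unicode_character`, and it only walks plain containers (the real util may also need to handle pydantic models):

```python
from typing import Any


def contains_null_unicode_character(value: Any) -> bool:
    """Recursively check a not-yet-serialized value for U+0000.

    Hypothetical sketch of a pre-serialization check; the real
    util in the SDK may differ.
    """
    if isinstance(value, str):
        # Checks for the actual null character, not the escaped text \u0000.
        return "\x00" in value
    if isinstance(value, dict):
        return any(
            contains_null_unicode_character(k) or contains_null_unicode_character(v)
            for k, v in value.items()
        )
    if isinstance(value, (list, tuple, set)):
        return any(contains_null_unicode_character(item) for item in value)
    return False


assert contains_null_unicode_character({"foo": "bar\x00"}) is True
assert contains_null_unicode_character({"foo\\u0000": "bar"}) is False  # escaped text, no null byte
assert contains_null_unicode_character(3.14) is False
```

Because it runs before serialization, this avoids the `\u0000`-vs-`\\u0000` ambiguity entirely, at the cost of duplicating traversal logic that the TypeAdapter already has.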
```python
assert contains_null_unicode_character(3.14) is False
assert contains_null_unicode_character(True) is False
```

```python
from pydantic import BaseModel
```

I think we should pull these imports up to the top level
mrkaye97 left a comment
qq: did you try creating a task that returns something with the null unicode sequence and allowing it to get through the SDK (can just comment out the check) to make sure it errors correctly instead of being rejected by the db?
Yeah I did. The server returns a gRPC error that is raised. Otherwise, without either the SDK or Golang check, Postgres rejects the payload with its own error. Should I add an e2e test case for this?
@mrkaye97 I've reworked the approach to instead use pydantic's WrapSerializer to automatically validate upon serialization.

Tbh this is not the cleanest approach, and it'd probably be better to use a …

Lmk if you're happy with this, else I'm open to suggestions for how to close that original issue 😅
Hmm... what happens if the …?

I also have to wonder: is it possible to use regex or some such with the original solution? I feel like there has to be a more straightforward way to do this.... 😅
I think something like this on the serialized output would work:

```python
import re

# Matches \u0000 in JSON that represents an actual null byte, NOT the escaped text
# \\u0000. In JSON, a real null byte is serialized as \u0000, while the literal text
# \u0000 is serialized as \\u0000. We match \u0000 preceded by an even number of
# backslashes (including zero); an odd count means the backslash before 'u' is itself
# escaped, making it literal text rather than a null escape.
_REAL_NULL_ESCAPE_RE = re.compile(r"(?:^|[^\\])(?:\\\\)*\\u0000")


def _json_contains_null_byte(serialized: str) -> bool:
    """Check if a JSON string contains an escaped null byte (\\u0000) that represents
    an actual null character, as opposed to the literal text "\\u0000".

    JSON serializers escape actual null bytes as \\u0000, but they escape the literal
    text "\\u0000" as \\\\u0000. The naive substring check `"\\u0000" in s` matches
    both, causing false positives.
    """
    return bool(_REAL_NULL_ESCAPE_RE.search(serialized))
```
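To sanity-check the even-backslash reasoning, here's a short self-contained snippet (re-declaring the regex so it runs on its own):

```python
import json
import re

_REAL_NULL_ESCAPE_RE = re.compile(r"(?:^|[^\\])(?:\\\\)*\\u0000")


def _json_contains_null_byte(serialized: str) -> bool:
    # True only when \u0000 is preceded by an even number of backslashes,
    # i.e. when it encodes an actual null byte.
    return bool(_REAL_NULL_ESCAPE_RE.search(serialized))


# A real null byte serializes to \u0000 and is flagged.
assert _json_contains_null_byte(json.dumps({"k": "a\x00b"})) is True
# The literal text \u0000 serializes to \\u0000 and is not flagged.
assert _json_contains_null_byte(json.dumps({"k": "a\\u0000b"})) is False
# A literal backslash followed by a real null byte is still flagged.
assert _json_contains_null_byte(json.dumps({"k": "a\\" + "\x00"})) is True
```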
Nice! That is definitely cleaner.
Description

This PR addresses the Hatchet server and Python SDK incorrectly interpreting escaped unicode as invalid.

For example, validations would interpret `"\\u0000"` (which is actually the characters `\, u, 0, 0, 0, 0`) as the unicode null character `\u0000`.

Fixes #3153
What's Changed

- `ValidateJSONB` now accounts for properly escaped unicode (i.e. `"\\u0000"`).
- `contains_null_unicode_character` util.