perf: decode text-part transfer encoding only once #90

Merged: kurok merged 1 commit into master from perf/decode-text-once on Jun 12, 2026.

kurok (contributor) commented on Jun 12, 2026

What

text/plain and text/html parts were transfer-decoded twice:

  1. get_body_raw() → bytes for attachments[].content
  2. get_body() → string for text_plain/text_html, which re-runs the same base64/quoted-printable decode before applying the charset.

This decodes the transfer encoding once (get_body_raw()) and reuses those bytes for both the attachment content and the text bodies. Only the charset step is then applied, via a new decode_charset() helper: a faithful copy of mailparse's own internal get_body_as_string, using the same charset crate (already a transitive dependency). The output is byte-identical to before.
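The decode-once data flow can be illustrated with a minimal, std-only Rust sketch, assuming a base64-encoded text part. It is not the PR's code and uses neither mailparse nor the charset crate: decode_base64, decode_charset, TextPart and parse_text_part here are hypothetical stand-ins, and the charset step only distinguishes ISO-8859-1/US-ASCII labels from lossy UTF-8.

```rust
// Hypothetical, std-only sketch of "decode the transfer encoding once".
// Not the PR's code: it uses neither mailparse nor the charset crate.

/// Minimal standard-alphabet base64 decoder (hypothetical stand-in).
/// Skips whitespace (MIME bodies are line-wrapped), stops at '=' padding,
/// and returns None on any other non-alphabet byte.
fn decode_base64(input: &str) -> Option<Vec<u8>> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a') as u32 + 26),
            b'0'..=b'9' => Some((c - b'0') as u32 + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::with_capacity(input.len() * 3 / 4);
    let mut acc: u32 = 0; // pending bits, right-aligned
    let mut bits: u32 = 0; // number of pending bits
    for &c in input.as_bytes() {
        if c.is_ascii_whitespace() {
            continue;
        }
        if c == b'=' {
            break; // padding: no more data
        }
        acc = (acc << 6) | val(c)?;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8); // emit the top 8 pending bits
            acc &= (1u32 << bits) - 1; // keep only the leftover bits
        }
    }
    Some(out)
}

/// Hypothetical charset step: ISO-8859-1 / US-ASCII labels map each byte to
/// the same code point; every other label (UTF-8 included) is decoded as
/// lossy UTF-8. The PR's real decode_charset delegates to the charset crate.
fn decode_charset(bytes: &[u8], charset: &str) -> String {
    match charset.to_ascii_lowercase().as_str() {
        "iso-8859-1" | "latin1" | "us-ascii" => bytes.iter().map(|&b| b as char).collect(),
        _ => String::from_utf8_lossy(bytes).into_owned(),
    }
}

/// What a parser keeps for a text part: raw bytes (attachments[].content)
/// and a decoded string (text_plain / text_html).
struct TextPart {
    content: Vec<u8>,
    text: String,
}

/// Transfer-decode ONCE, then derive the string from the same bytes with
/// only the charset step (no second base64 pass).
fn parse_text_part(encoded_body: &str, charset: &str) -> Option<TextPart> {
    let raw = decode_base64(encoded_body)?; // the single transfer decode
    let text = decode_charset(&raw, charset); // charset step only
    Some(TextPart { content: raw, text })
}

fn main() {
    // "héllo" in UTF-8, base64-encoded and wrapped across two lines.
    let part = parse_text_part("aMOp\nbGxv", "utf-8").expect("valid base64");
    println!("{} bytes, text = {}", part.content.len(), part.text);
}
```

The point of the design is that the raw bytes and the string share one transfer decode; the string is derived from the bytes rather than by re-parsing the encoded body.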

Benchmark

Targeted input: a ~2 MB base64-encoded text/html body (the path this change touches), 200 iterations, same input for both builds:

| build   | min      | median   | mean     |
|---------|----------|----------|----------|
| master  | 8.308 ms | 8.841 ms | 8.965 ms |
| this PR | 4.390 ms | 4.706 ms | 4.706 ms |

~1.9× faster on base64/quoted-printable-encoded text bodies (measured on the base64 input above). The existing tests/benchmark (large_message.eml) is unchanged within noise: that corpus is dominated by base64 attachments, which carry a name parameter and were only ever decoded once.
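The ~1.9× figure can be checked directly against the table; each statistic gives a master/PR ratio close to it:

$$
\frac{8.308}{4.390} \approx 1.89 \ (\text{min}), \qquad
\frac{8.841}{4.706} \approx 1.88 \ (\text{median}), \qquad
\frac{8.965}{4.706} \approx 1.91 \ (\text{mean})
$$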

Risk

  • Behavior-preserving: decode_charset replicates mailparse's charset logic exactly (same crate, same code path). For 7bit/8bit text it reduces to the prior get_as_string(raw); for base64/QP it reduces to get_decoded_as_string() minus the redundant transfer decode.
  • Transfer-decode errors still surface (now propagated by the ? operator on the single get_body_raw() call), preserving the existing ParseError contract.
  • All 91 correctness tests and the RFC corpus pass; cargo clippy --release is clean.
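The two Risk points (7bit/8bit reduces to a charset-only step; a transfer-decode error can only arise at the single decode) can be illustrated with a hypothetical, std-only Rust sketch. ParseError, TransferEncoding, decode_qp, transfer_decode, decode_charset_latin1 and text_body are invented for illustration; the strict quoted-printable decoder is a stand-in, and mailparse's real quoted-printable handling may be more lenient.

```rust
// Hypothetical sketch: one transfer decode, errors propagated with `?`.
#[derive(Debug, PartialEq)]
enum ParseError {
    Transfer(String),
}

enum TransferEncoding {
    SevenBit,
    EightBit,
    QuotedPrintable,
}

/// Value of one hex digit (0-9, a-f, A-F), or None.
fn hex_val(c: u8) -> Option<u8> {
    (c as char).to_digit(16).map(|d| d as u8)
}

/// Strict quoted-printable decoder (hypothetical stand-in): "=XX" becomes
/// byte 0xXX, "=\r\n" / "=\n" soft line breaks are dropped, and any other
/// '=' sequence is a transfer-decode error.
fn decode_qp(input: &[u8]) -> Result<Vec<u8>, ParseError> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] != b'=' {
            out.push(input[i]);
            i += 1;
            continue;
        }
        if input[i + 1..].starts_with(b"\r\n") {
            i += 3; // soft line break (CRLF)
            continue;
        }
        if input[i + 1..].starts_with(b"\n") {
            i += 2; // soft line break (bare LF)
            continue;
        }
        let hi = input.get(i + 1).copied().and_then(hex_val);
        let lo = input.get(i + 2).copied().and_then(hex_val);
        match (hi, lo) {
            (Some(h), Some(l)) => {
                out.push(h << 4 | l);
                i += 3;
            }
            _ => return Err(ParseError::Transfer(format!("invalid escape at byte {}", i))),
        }
    }
    Ok(out)
}

fn transfer_decode(body: &[u8], enc: TransferEncoding) -> Result<Vec<u8>, ParseError> {
    match enc {
        // 7bit/8bit: nothing to undo, the bytes already are the payload,
        // so only the charset step remains afterwards.
        TransferEncoding::SevenBit | TransferEncoding::EightBit => Ok(body.to_vec()),
        TransferEncoding::QuotedPrintable => decode_qp(body),
    }
}

/// Simplified ISO-8859-1 charset step (hypothetical stand-in).
fn decode_charset_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Raw bytes plus text from ONE transfer decode; an error can only come
/// from that single call and is propagated unchanged by `?`.
fn text_body(body: &[u8], enc: TransferEncoding) -> Result<(Vec<u8>, String), ParseError> {
    let raw = transfer_decode(body, enc)?;
    let text = decode_charset_latin1(&raw);
    Ok((raw, text))
}

fn main() {
    println!("{:?}", text_body(b"caf=E9", TransferEncoding::QuotedPrintable));
    println!("{:?}", text_body(b"caf=G9", TransferEncoding::QuotedPrintable));
}
```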

text/plain and text/html parts were transfer-decoded twice: once via
get_body_raw() for attachments.content, then again via get_body() for
text_plain/text_html, which re-runs the identical base64/quoted-printable
decode before applying the charset.

Decode the transfer encoding once with get_body_raw() and reuse those bytes
for both the attachment content and the text bodies, applying only the charset
step via decode_charset() (a faithful copy of mailparse's internal
get_body_as_string, using the same charset crate). Output is byte-identical.

~1.9x faster on base64/quoted-printable-encoded text bodies (8.84ms -> 4.71ms
median on a ~2MB base64 text/html part); no measurable change on bodies that
are not transfer-encoded. All 91 correctness tests pass.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>
kurok merged commit b591fc2 into master on Jun 12, 2026
7 checks passed
kurok deleted the perf/decode-text-once branch on June 12, 2026 at 22:50