perf: decode text-part transfer encoding only once by kurok · Pull Request #90 · namecheap/fast_mail_parser

kurok · 2026-06-12T22:46:46Z

What

text/plain and text/html parts were transfer-decoded twice:

get_body_raw() → bytes for attachments[].content
get_body() → string for text_plain/text_html, which re-runs the same base64/quoted-printable decode before applying the charset.

This decodes the transfer encoding once (get_body_raw()), reuses those bytes for both the attachment content and the text bodies, and applies only the charset step via a new decode_charset() helper — a faithful copy of mailparse's own internal get_body_as_string using the same charset crate (already a transitive dep). Output is byte-identical.
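The before/after shape can be sketched in plain Rust without mailparse. Everything here is hypothetical. `TextPart` stands in for a parsed MIME part. `body_raw()` plays the role of `get_body_raw()`, handles base64 only, and counts how often it runs. This `decode_charset()` is a simplified stand-in that only knows UTF-8 and ISO-8859-1, whereas the PR's helper delegates to the `charset` crate and supports far more encodings.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a MIME text part (not mailparse's ParsedMail).
struct TextPart {
    raw: Vec<u8>,        // body bytes exactly as they appear in the message
    charset: String,     // Content-Type charset parameter
    decodes: Cell<u32>,  // counts transfer decodes, for illustration only
}

/// Minimal base64 decoder: standard alphabet, stops at '=', skips whitespace.
fn base64_decode(input: &[u8]) -> Result<Vec<u8>, String> {
    fn val(c: u8) -> Option<u32> {
        match c {
            b'A'..=b'Z' => Some((c - b'A') as u32),
            b'a'..=b'z' => Some((c - b'a') as u32 + 26),
            b'0'..=b'9' => Some((c - b'0') as u32 + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }
    let mut out = Vec::new();
    let (mut acc, mut bits) = (0u32, 0u32);
    for &c in input {
        if c.is_ascii_whitespace() {
            continue;
        }
        if c == b'=' {
            break;
        }
        let v = val(c).ok_or_else(|| format!("invalid base64 byte {c:#04x}"))?;
        acc = (acc << 6) | v;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((acc >> bits) as u8); // emit the top 8 accumulated bits
            acc &= (1 << bits) - 1;        // keep only the leftover low bits
        }
    }
    Ok(out)
}

/// Hypothetical, simplified charset step (the real helper uses the charset crate).
fn decode_charset(bytes: &[u8], charset: &str) -> String {
    match charset.to_ascii_lowercase().as_str() {
        "iso-8859-1" | "latin1" => bytes.iter().map(|&b| b as char).collect(),
        _ => String::from_utf8_lossy(bytes).into_owned(),
    }
}

impl TextPart {
    /// Transfer-decode the body and record that a decode happened.
    fn body_raw(&self) -> Result<Vec<u8>, String> {
        self.decodes.set(self.decodes.get() + 1);
        base64_decode(&self.raw)
    }
}

/// Old shape: bytes for the attachment, then a second full decode for the text.
fn before(part: &TextPart) -> Result<(Vec<u8>, String), String> {
    let content = part.body_raw()?;
    let text = decode_charset(&part.body_raw()?, &part.charset); // decodes again
    Ok((content, text))
}

/// New shape: decode once, reuse the bytes, apply only the charset step.
fn after(part: &TextPart) -> Result<(Vec<u8>, String), String> {
    let content = part.body_raw()?;
    let text = decode_charset(&content, &part.charset);
    Ok((content, text))
}

fn main() {
    let make = || TextPart {
        raw: b"aGVsbG8gd29ybGQ=".to_vec(),
        charset: "utf-8".into(),
        decodes: Cell::new(0),
    };
    let (p1, p2) = (make(), make());
    let old = before(&p1).unwrap();
    let new = after(&p2).unwrap();
    assert_eq!(old, new); // same output either way
    println!("{:?} decodes: before={} after={}", new.1, p1.decodes.get(), p2.decodes.get());
}
```

Both paths return the same bytes and string; only the number of transfer decodes per text part drops from two to one.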

Benchmark

Targeted input — a ~2 MB base64-encoded text/html body (the path this change touches), 200 iterations, same input both builds:

build	min	median	mean
master	8.308 ms	8.841 ms	8.965 ms
this PR	4.390 ms	4.706 ms	4.706 ms

~1.9× faster on base64/quoted-printable-encoded text bodies. The existing tests/benchmark (large_message.eml) is unchanged within noise — that corpus is dominated by base64 attachments, which carry a name param and were only ever decoded once.
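The table reduces per-iteration wall-clock times to min/median/mean, and the headline speedup is the ratio of the medians. The PR does not show its benchmark script, so the harness below is a hypothetical sketch: `bench`, `summarize`, and the placeholder workload are assumptions, not the code that produced these numbers.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Reduce per-iteration timings (milliseconds) to (min, median, mean).
fn summarize(samples_ms: &mut [f64]) -> (f64, f64, f64) {
    assert!(!samples_ms.is_empty(), "need at least one sample");
    samples_ms.sort_by(|a, b| a.total_cmp(b));
    let n = samples_ms.len();
    let median = if n % 2 == 1 {
        samples_ms[n / 2]
    } else {
        (samples_ms[n / 2 - 1] + samples_ms[n / 2]) / 2.0
    };
    let mean = samples_ms.iter().sum::<f64>() / n as f64;
    (samples_ms[0], median, mean)
}

/// Hypothetical harness: time `f` for `iters` iterations on the same input.
fn bench<F: FnMut()>(iters: usize, mut f: F) -> (f64, f64, f64) {
    let mut samples = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        f();
        samples.push(start.elapsed().as_secs_f64() * 1e3);
    }
    summarize(&mut samples)
}

fn main() {
    // Ratio of the reported medians: 8.841 / 4.706 is about 1.88, i.e. "~1.9x".
    println!("median speedup: {:.2}x", 8.841_f64 / 4.706);

    // Placeholder workload over a ~2 MB buffer; substitute the real parse call.
    let input = vec![b'A'; 2 * 1024 * 1024];
    let (min, median, mean) = bench(200, || {
        black_box(input.iter().map(|&b| u64::from(b)).sum::<u64>());
    });
    println!("min {min:.3} ms  median {median:.3} ms  mean {mean:.3} ms");
}
```

Reporting the median alongside the mean keeps a few slow outlier iterations from dominating the comparison.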

Risk

Behavior-preserving: decode_charset replicates mailparse's charset logic exactly (same crate, same code path). For 7bit/8bit text it reduces to the prior get_as_string(raw); for base64/QP it reduces to get_decoded_as_string() minus the redundant transfer decode.
Transfer-decode errors still surface (now from the single get_body_raw()?), preserving the existing ParseError contract.
All 91 correctness tests + RFC corpus pass. cargo clippy --release clean.
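The two Risk claims (7bit/8bit bodies pass through unchanged, and transfer-decode errors are raised once from the single decode) can be illustrated with a small sketch. `ParseError`, `TransferEncoding`, `qp_decode`, and `extract_text` are hypothetical names. The quoted-printable decoder is deliberately minimal and strict, and the charset step is reduced to lossy UTF-8.

```rust
/// Hypothetical stand-in for the crate's ParseError.
#[derive(Debug, PartialEq)]
struct ParseError(String);

/// Hypothetical subset of Content-Transfer-Encoding values.
enum TransferEncoding {
    SevenBit, // also stands in for 8bit: no transfer decoding needed
    QuotedPrintable,
}

fn hex_val(c: u8) -> Option<u8> {
    (c as char).to_digit(16).map(|d| d as u8)
}

/// Minimal, strict quoted-printable decoder: `=HH` escapes and `=` soft line
/// breaks. Malformed escapes become errors (a lenient decoder might keep them).
fn qp_decode(input: &[u8]) -> Result<Vec<u8>, ParseError> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        if input[i] != b'=' {
            out.push(input[i]);
            i += 1;
            continue;
        }
        let rest = &input[i + 1..];
        if rest.starts_with(b"\r\n") {
            i += 3; // soft line break (CRLF)
        } else if rest.starts_with(b"\n") {
            i += 2; // soft line break (bare LF)
        } else {
            let byte = match rest {
                [h, l, ..] => hex_val(*h).zip(hex_val(*l)).map(|(h, l)| (h << 4) | l),
                _ => None,
            }
            .ok_or_else(|| ParseError(format!("bad quoted-printable escape at byte {i}")))?;
            out.push(byte);
            i += 3;
        }
    }
    Ok(out)
}

/// One fallible transfer decode whose bytes feed both the attachment content
/// and the text body, so a decode error surfaces exactly once via `?`.
fn extract_text(raw: &[u8], cte: TransferEncoding) -> Result<(Vec<u8>, String), ParseError> {
    let content = match cte {
        TransferEncoding::SevenBit => raw.to_vec(), // identity: nothing to decode
        TransferEncoding::QuotedPrintable => qp_decode(raw)?,
    };
    // Charset step only (lossy UTF-8 in this sketch).
    let text = String::from_utf8_lossy(&content).into_owned();
    Ok((content, text))
}

fn main() {
    let (_, text) = extract_text(b"caf=C3=A9 =\r\nbar", TransferEncoding::QuotedPrintable).unwrap();
    println!("{text}"); // café bar
    // Err(ParseError("bad quoted-printable escape at byte 4"))
    println!("{:?}", extract_text(b"bad =ZZ", TransferEncoding::QuotedPrintable));
}
```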

text/plain and text/html parts were transfer-decoded twice: once via get_body_raw() for attachments.content, then again via get_body() for text_plain/text_html, which re-runs the identical base64/quoted-printable decode before applying the charset.

Decode the transfer encoding once with get_body_raw() and reuse those bytes for both the attachment content and the text bodies, applying only the charset step via decode_charset() (a faithful copy of mailparse's internal get_body_as_string, using the same charset crate).

Output is byte-identical. ~1.9x faster on base64/quoted-printable-encoded text bodies (8.84ms -> 4.71ms median on a ~2MB base64 text/html part); no measurable change on bodies that are not transfer-encoded. All 91 correctness tests pass.

Signed-off-by: yuriyryabikov <22548029+kurok@users.noreply.github.com>

kurok merged commit b591fc2 into master Jun 12, 2026
7 checks passed

kurok deleted the perf/decode-text-once branch June 12, 2026 22:50

kurok merged 1 commit into master from perf/decode-text-once

kurok commented Jun 12, 2026

1 participant