Fix qmd reader doubling backslashes in attribute values (issue #161, bd-tpjg) #164

Merged: cscheid merged 2 commits into main from bugfix/issue-161 on May 7, 2026

@cscheid cscheid commented May 7, 2026

Fixes #161.

Summary

The qmd writer emits Pandoc-style escapes (\ → \\, " → \") inside "..." attribute values, but the qmd reader (extract_quoted_text in crates/pampa/src/pandoc/treesitter_utils/text_helpers.rs) only un-escaped \" / \' and left every other \X verbatim. As a result, each qmd → JSON → qmd cycle doubled the backslashes (\[ → \\[ → \\\\[ → ...), and any document with tbl-colwidths="\[N,N\]" accumulated backslashes on every round trip.

The asymmetry was on the reader side. The writer already matched Pandoc.
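The doubling can be reproduced in isolation with a minimal sketch. The function names below (write_attr_value, old_read_attr_value, backslashes_after) are hypothetical stand-ins, not the actual pampa code; they model only the behavior described in the summary, and the old reader is approximated with a character scan rather than the original regex.

```rust
/// Writer-side escaping as described in the summary: `\` becomes `\\` and
/// `"` becomes `\"`. Hypothetical stand-in for the real qmd writer.
fn write_attr_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if c == '\\' || c == '"' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Old reader behavior as described in the summary: only `\"` and `\'` are
/// un-escaped; every other `\X` is left verbatim. This is an approximation,
/// not the original implementation.
fn old_read_attr_value(quoted_body: &str) -> String {
    let mut out = String::with_capacity(quoted_body.len());
    let mut chars = quoted_body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next == '"' || next == '\'' {
                    // Drop the backslash and keep the quote.
                    out.push(next);
                    chars.next();
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}

/// Count the backslashes left in the text after `cycles` read -> write
/// round trips through the old reader.
fn backslashes_after(source_body: &str, cycles: usize) -> usize {
    let mut text = source_body.to_string();
    for _ in 0..cycles {
        text = write_attr_value(&old_read_attr_value(&text));
    }
    text.matches('\\').count()
}
```

Under these assumptions, an attribute body of \[1,2\] starts with two backslashes and doubles on every cycle, because the reader hands the backslashes to the writer unchanged and the writer escapes each one again.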

Change

extract_quoted_text now applies Pandoc's CommonMark rule: \X collapses to X when X is ASCII punctuation, otherwise the backslash is preserved literally. A trailing \ is also preserved. The same helper is used for CommonMark link titles, which follow the same escape convention, so the change is consistent for both call sites.
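A minimal sketch of the scan described above, for illustration only. The name unescape_quoted is hypothetical, and the sketch assumes the surrounding quotes have already been stripped; the real extract_quoted_text may differ in signature and in how it handles the delimiters.

```rust
/// Collapse `\X` to `X` when `X` is ASCII punctuation; otherwise keep the
/// backslash literally. A trailing backslash is also kept.
/// Hypothetical helper, assuming `body` is the text between the quotes.
fn unescape_quoted(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.peek() {
                // Escaped punctuation (including `\\`, `\"`, `\'`): drop the
                // backslash and emit the punctuation character.
                Some(&next) if next.is_ascii_punctuation() => {
                    out.push(next);
                    chars.next();
                }
                // Non-punctuation or end of input: the backslash is literal.
                _ => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Because the backslash is itself ASCII punctuation, \\ collapses to a single literal backslash under this rule, which is what makes the reader the inverse of the writer's \ → \\ escaping.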

This changes the AST for inputs that contained backslashes inside attribute values — any consumer that relied on backslashes being preserved verbatim will see them collapsed (matching what Pandoc has always produced). Detailed before/after analysis in claude-notes/issue-reports/161/triage.md.

End-to-end verification

$ printf -- '::: {data-foo="\\[1,2\\]"}\nhello\n:::\n' | cargo run --quiet --bin pampa
[ Div ( "" , [] , [("data-foo", "[1,2]")] ) [Para [Str "hello"]] ]

$ printf -- '::: {data-foo="\\[1,2\\]"}\nhello\n:::\n' | cargo run --quiet --bin pampa -- -t qmd
::: {data-foo="[1,2]"}

hello

:::

Round-trip is now stable (matches Pandoc's markdown -> markdown output for the same input).

The original tbl-colwidths="\[40,60\]" pattern from the linked quarto-web docs was also exercised end-to-end and now stabilizes to tbl-colwidths="[40,60]" as expected.
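The stability claim can be checked with a small model of one read → write cycle. This is a sketch under stated assumptions: write_attr_value, read_attr_value, and round_trip are hypothetical names that model the escape rules described in this PR, not the pampa functions themselves, and they operate on the text between the quotes.

```rust
/// Writer-side escaping as described in this PR: `\` -> `\\`, `"` -> `\"`.
/// Hypothetical stand-in for the real qmd writer.
fn write_attr_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if c == '\\' || c == '"' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Reader-side rule as described in this PR: `\X` collapses to `X` when `X`
/// is ASCII punctuation; otherwise the backslash is kept.
/// Hypothetical stand-in for extract_quoted_text.
fn read_attr_value(body: &str) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.peek() {
                Some(&next) if next.is_ascii_punctuation() => {
                    out.push(next);
                    chars.next();
                }
                _ => out.push('\\'),
            }
        } else {
            out.push(c);
        }
    }
    out
}

/// One qmd -> AST -> qmd cycle on the body of a quoted attribute value.
fn round_trip(source_body: &str) -> String {
    write_attr_value(&read_attr_value(source_body))
}
```

In this model the first cycle normalizes \[40,60\] to [40,60], and every later cycle returns its input unchanged, since reading undoes exactly the escapes the writer adds.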

Tests

  • Three new round-trip fixtures under crates/pampa/tests/roundtrip_tests/qmd-json-qmd/:
    • attr_value_backslash_escaped_punct.qmd — the reported case (\[1,2\])
    • attr_value_literal_backslash.qmd — literal backslash via \\
    • attr_value_escaped_quote.qmd — already-working \" case, locked in as a regression test
  • Unit tests in text_helpers.rs for extract_quoted_text covering quoted/unquoted, punctuation/non-punctuation, and trailing-backslash edge cases.

cargo xtask verify --skip-hub-build --skip-hub-tests --skip-trace-viewer-build --skip-trace-viewer-tests is clean (8417 Rust tests passing). The skipped JS steps concern fresh-worktree bootstrap and are unrelated to this change (no Rust→JS code path is touched here).

Test plan

  • Spot-check that the AST change doesn't break any downstream consumer that was working around the previous behavior (none expected, but worth a glance at quarto-web rendering).
  • If a reviewer wants stricter assurance, run cargo nextest run --workspace locally — already green here.

Commits

  • d31ae2dd — Triage record (claude-notes/issue-reports/161/).
  • 7b0f7e91 — Fix + tests.

🤖 Generated with Claude Code

…bute values (bd-tpjg)

Reproduces the round-trip backslash-doubling reported by @rundel. Pandoc
treats \\X as a generic backslash escape inside quoted attribute values;
pampa's reader (extract_quoted_text) only handles \" / \' and leaves \\
verbatim, so the writer's (correct) Pandoc-style escaping survives a
round trip and accumulates an extra backslash.

Includes repro fixtures and a triage note pinpointing the asymmetry in
crates/pampa/src/pandoc/treesitter_utils/text_helpers.rs (reader) vs.
crates/pampa/src/writers/qmd.rs (writer is correct).
@cscheid changed the title from "Triage: qmd reader doubles backslashes in attribute values (issue #161, bd-tpjg)" to "qmd reader doubles backslashes in attribute values (issue #161, bd-tpjg)" on May 7, 2026
…-tpjg)

extract_quoted_text only un-escaped \" and \' inside "..." / '...'
attribute values; it left every other \X verbatim. The qmd writer
already emits Pandoc-style escapes (\ -> \\, " -> \"), so on each
round trip a doubled backslash survived as two backslashes and grew
again on the next cycle. Reported with tbl-colwidths="\[N,N\]" in
quarto-web docs (issue #161).

Replace the regex-based partial un-escape with a Pandoc-style scan:
\X collapses to X when X is ASCII punctuation, otherwise the
backslash is preserved literally. A trailing \ is preserved. Same
helper is also used for CommonMark link titles, which use the same
escape convention, so the change is correct for both call sites.

Add unit tests for extract_quoted_text and round-trip fixtures
(attr_value_backslash_escaped_punct.qmd, attr_value_literal_backslash.qmd,
attr_value_escaped_quote.qmd) under tests/roundtrip_tests/qmd-json-qmd/.

End-to-end: ::: {data-foo="\[1,2\]"} now parses as data-foo="[1,2]"
(was "\[1,2\]") and round-trips stably; tbl-colwidths="\[40,60\]"
in a table caption likewise stabilizes to "[40,60]". This changes
the AST shape relative to the previous (buggy) reader: any consumer
that relied on backslashes being preserved verbatim will see them
collapsed.
@cscheid changed the title from "qmd reader doubles backslashes in attribute values (issue #161, bd-tpjg)" to "Fix qmd reader doubling backslashes in attribute values (issue #161, bd-tpjg)" on May 7, 2026
@cscheid cscheid merged commit e637a2f into main May 7, 2026
4 checks passed
@cscheid cscheid deleted the bugfix/issue-161 branch May 7, 2026 21:08
Linked issue: qmd writer doubles backslashes in attribute values