Fix qmd reader doubling backslashes in attribute values (issue #161, bd-tpjg) #164
Merged
…bute values (bd-tpjg)

Reproduces the round-trip backslash-doubling reported by @rundel. Pandoc treats \\X as a generic backslash escape inside quoted attribute values; pampa's reader (extract_quoted_text) only handles \" / \' and leaves \\ verbatim, so the writer's (correct) Pandoc-style escaping survives a round trip and accumulates an extra backslash.

Includes repro fixtures and a triage note pinpointing the asymmetry in crates/pampa/src/pandoc/treesitter_utils/text_helpers.rs (reader) vs. crates/pampa/src/writers/qmd.rs (writer is correct).
…-tpjg)

extract_quoted_text only un-escaped \" and \' inside "..." / '...' attribute values; it left every other \X verbatim. The qmd writer already emits Pandoc-style escapes (\ -> \\, " -> \"), so on each round trip a doubled backslash survived as two backslashes and grew again on the next cycle. Reported with tbl-colwidths="\[N,N\]" in quarto-web docs (issue #161).

Replace the regex-based partial un-escape with a Pandoc-style scan: \X collapses to X when X is ASCII punctuation, otherwise the backslash is preserved literally. A trailing \ is preserved. Same helper is also used for CommonMark link titles, which use the same escape convention, so the change is correct for both call sites.

Add unit tests for extract_quoted_text and round-trip fixtures (attr_value_backslash_escaped_punct.qmd, attr_value_literal_backslash.qmd, attr_value_escaped_quote.qmd) under tests/roundtrip_tests/qmd-json-qmd/.

End-to-end: ::: {data-foo="\[1,2\]"} now parses as data-foo="[1,2]" (was "\[1,2\]") and round-trips stably; tbl-colwidths="\[40,60\]" in a table caption likewise stabilizes to "[40,60]".

This changes the AST shape relative to the previous (buggy) reader: any consumer that relied on backslashes being preserved verbatim will see them collapsed.
Fixes #161.
Summary
The qmd writer emits Pandoc-style escapes (\ → \\, " → \") inside "..." attribute values, but the qmd reader (extract_quoted_text in crates/pampa/src/pandoc/treesitter_utils/text_helpers.rs) only un-escaped \" / \' and left every other \X verbatim. So each qmd → JSON → qmd cycle doubled the backslashes (\[ → \\[ → \\\\[ → ...), and any document with tbl-colwidths="\[N,N\]" accumulated backslashes on every round trip.

The asymmetry was on the reader side. The writer already matched Pandoc.
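To make the doubling concrete, here is a minimal Rust sketch of the asymmetry. It is not the pampa code: `write_attr_value`, `read_attr_value_old`, and `round_trip_old` are hypothetical names, and the old reader is approximated by a simple scan that un-escapes only `\"` and `\'`, as the summary describes.

```rust
/// Writer side, as described in this PR: escape `\` as `\\` and `"` as `\"`.
/// (Hypothetical stand-in for the real qmd writer.)
fn write_attr_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if c == '\\' || c == '"' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Approximation of the old reader: only `\"` and `\'` are un-escaped,
/// every other backslash sequence is left verbatim.
fn read_attr_value_old(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut chars = source.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(next) = chars.peek().copied() {
                if next == '"' || next == '\'' {
                    out.push(next);
                    chars.next(); // consume the quote we just emitted
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}

/// Run `cycles` read -> write round trips and collect the text after each one.
fn round_trip_old(source: &str, cycles: usize) -> Vec<String> {
    let mut current = source.to_string();
    let mut seen = Vec::new();
    for _ in 0..cycles {
        let parsed = read_attr_value_old(&current);
        current = write_attr_value(&parsed);
        seen.push(current.clone());
    }
    seen
}
```

Under these assumptions, starting from `\[1,2\]`, the first cycle yields `\\[1,2\\]` and the second `\\\\[1,2\\\\]`: the reader never consumes the backslashes the writer adds, so the count doubles on every pass.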
Change
extract_quoted_text now applies Pandoc's CommonMark rule: \X collapses to X when X is ASCII punctuation, otherwise the backslash is preserved literally. A trailing \ is also preserved. The same helper is used for CommonMark link titles, which follow the same escape convention, so the change is consistent for both call sites.

This changes the AST for inputs that contained backslashes inside attribute values: any consumer that relied on backslashes being preserved verbatim will see them collapsed (matching what Pandoc has always produced). Detailed before/after analysis in claude-notes/issue-reports/161/triage.md.

End-to-end verification
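As a rough illustration of the rule described under Change, and of why it makes the round trip stable, the following Rust sketch pairs a Pandoc-style un-escape scan with the writer's escaping. The function names (`unescape_quoted`, `write_attr_value`, `round_trip`) are hypothetical and the real extract_quoted_text may differ in detail, for example in how it strips the surrounding quotes.

```rust
/// Pandoc-style scan: `\X` collapses to `X` when `X` is ASCII punctuation;
/// otherwise the backslash is kept literally. A trailing `\` is kept too.
/// (Hypothetical sketch, operating on the text between the quotes.)
fn unescape_quoted(source: &str) -> String {
    let mut out = String::with_capacity(source.len());
    let mut chars = source.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.peek().copied() {
            // Escaped punctuation (including `\\` and `\"`): drop the backslash.
            Some(next) if next.is_ascii_punctuation() => {
                out.push(next);
                chars.next();
            }
            // Non-punctuation follower or end of input: keep the backslash.
            _ => out.push('\\'),
        }
    }
    out
}

/// Writer side, as described in this PR: escape `\` as `\\` and `"` as `\"`.
fn write_attr_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        if c == '\\' || c == '"' {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Run `cycles` read -> write round trips and collect the text after each one.
fn round_trip(source: &str, cycles: usize) -> Vec<String> {
    let mut current = source.to_string();
    let mut seen = Vec::new();
    for _ in 0..cycles {
        current = write_attr_value(&unescape_quoted(&current));
        seen.push(current.clone());
    }
    seen
}
```

In this sketch `\[40,60\]` reads as `[40,60]`, which the writer emits unchanged, so every later cycle returns the same text. A literal backslash written as `\\` reads as a single `\` and is written back as `\\`, which is also a fixed point.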
Round-trip is now stable (matches Pandoc's markdown -> markdown output for the same input).

The original tbl-colwidths="\[40,60\]" pattern from the linked quarto-web docs was also exercised end-to-end and now stabilizes to tbl-colwidths="[40,60]" as expected.

Tests
crates/pampa/tests/roundtrip_tests/qmd-json-qmd/:
- attr_value_backslash_escaped_punct.qmd: the reported case (\[1,2\])
- attr_value_literal_backslash.qmd: literal backslash via \\
- attr_value_escaped_quote.qmd: already-working \" case, locked in as a regression test

Unit tests in text_helpers.rs for extract_quoted_text covering quoted/unquoted, punctuation/non-punctuation, and trailing-backslash edge cases.

cargo xtask verify --skip-hub-build --skip-hub-tests --skip-trace-viewer-build --skip-trace-viewer-tests is clean (8417 Rust tests passing). The skipped JS steps are unrelated fresh-worktree bootstrap (no Rust→JS code path is touched here).

Test plan
quarto-web rendering).
cargo nextest run --workspace locally: already green here.

Commits
d31ae2dd: Triage record (claude-notes/issue-reports/161/).
7b0f7e91: Fix + tests.

🤖 Generated with Claude Code