Fix PHP highlighting for comment-like strings#315456
Conversation
|
@microsoft-github-policy-service agree |
There was a problem hiding this comment.
Pull request overview
This PR fixes a PHP syntax highlighting edge case in the bundled TextMate grammar where normal quoted strings starting with /* were incorrectly tokenized as slash-delimited regex strings, which could break subsequent highlighting (notably due to [ starting an unclosed “regex character class”).
Changes:
- Update the PHP grammar’s “regex string” begin patterns to exclude strings beginning with
/*via a negative lookahead after the initial/. - Mirror the same patch in the PHP grammar refresh script (
update-grammar.mjs) so the fix is preserved when syncing from upstream.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| extensions/php/syntaxes/php.tmLanguage.json | Adjusts regex-string begin rules to avoid treating /* ... */-like strings as regex strings. |
| extensions/php/build/update-grammar.mjs | Ensures the same rule adjustment is re-applied during upstream grammar updates. |
|
The use of (?!\*) is a solid fix for the PHP comment/regex ambiguity. However, the exact string match for begin in update-grammar.mjs makes the build script fragile; if the upstream grammar changes even slightly (e.g., adding a flag), the fail() function will trigger and break the build. It would be safer to use a regex-based replacement or a partial match to apply this patch more robustly. |
|
Thanks for the review. I addressed this in the latest push.
That should make the patch resilient to small upstream changes, such as added flags, while still avoiding a silent incorrect rewrite. |
Fixes #204065
Summary
PHP slash-delimited strings are highlighted as regex-like strings by the bundled PHP grammar. That works for common PCRE usages such as
preg_match("/foo/", $value), but it also causes normal strings that begin with a block-comment marker to be mis-tokenized.For example, this valid PHP string currently starts a
string.regexp.double-quoted.phpscope:Because the regex rule sees the string as
"/.../", the[in<![CDATA[can leave the tokenizer inside a regex character class and break highlighting for the rest of the call or following code.This change excludes strings beginning with
/*from the regex-string rule, so comment-like string contents are treated as normal quoted strings. Regular slash-delimited regex strings continue to be highlighted as regex strings.The same patch is also added to
extensions/php/build/update-grammar.mjsso it is preserved when the PHP grammar is refreshed from upstream.Testing
node --check extensions/php/build/update-grammar.mjsgit diff --checkvscode-textmate/vscode-oniguruma:str_replace("/* <![CDATA[ */", '', $jsCode)is tokenized asstring.quoted.double.php, notstring.regexp.double-quoted.phpstr_replace("/* ]]> */", '', $jsCode)is tokenized asstring.quoted.double.phppreg_match("/foo[bar]+/i", $jsCode)still receivesstring.regexp.double-quoted.php