Skip to content

Fix PHP highlighting for comment-like strings#315456

Open
Creeper0809 wants to merge 2 commits into
microsoft:mainfrom
Creeper0809:fix/php-string-comment-regex-204065
Open

Fix PHP highlighting for comment-like strings#315456
Creeper0809 wants to merge 2 commits into
microsoft:mainfrom
Creeper0809:fix/php-string-comment-regex-204065

Conversation

@Creeper0809
Copy link
Copy Markdown

Fixes #204065

Summary

PHP slash-delimited strings are highlighted as regex-like strings by the bundled PHP grammar. That works for common PCRE usages such as preg_match("/foo/", $value), but it also causes normal strings that begin with a block-comment marker to be mis-tokenized.

For example, this valid PHP string currently starts a string.regexp.double-quoted.php scope:

str_replace("/* <![CDATA[ */", '', $jsCode);

Because the regex rule sees the string as "/.../", the [ in <![CDATA[ can leave the tokenizer inside a regex character class and break highlighting for the rest of the call or following code.

This change excludes strings beginning with /* from the regex-string rule, so comment-like string contents are treated as normal quoted strings. Regular slash-delimited regex strings continue to be highlighted as regex strings.

The same patch is also added to extensions/php/build/update-grammar.mjs so it is preserved when the PHP grammar is refreshed from upstream.

Testing

  • Ran node --check extensions/php/build/update-grammar.mjs
  • Ran git diff --check
  • Verified tokenization with vscode-textmate / vscode-oniguruma:
    • str_replace("/* <![CDATA[ */", '', $jsCode) is tokenized as string.quoted.double.php, not string.regexp.double-quoted.php
    • str_replace("/* ]]> */", '', $jsCode) is tokenized as string.quoted.double.php
    • preg_match("/foo[bar]+/i", $jsCode) still receives string.regexp.double-quoted.php

Copilot AI review requested due to automatic review settings May 9, 2026 12:27
@Creeper0809
Copy link
Copy Markdown
Author

@microsoft-github-policy-service agree

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a PHP syntax highlighting edge case in the bundled TextMate grammar where normal quoted strings starting with /* were incorrectly tokenized as slash-delimited regex strings, which could break subsequent highlighting (notably due to [ starting an unclosed “regex character class”).

Changes:

  • Update the PHP grammar’s “regex string” begin patterns to exclude strings beginning with /* via a negative lookahead after the initial /.
  • Mirror the same patch in the PHP grammar refresh script (update-grammar.mjs) so the fix is preserved when syncing from upstream.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
extensions/php/syntaxes/php.tmLanguage.json Adjusts regex-string begin rules to avoid treating /* ... */-like strings as regex strings.
extensions/php/build/update-grammar.mjs Ensures the same rule adjustment is re-applied during upstream grammar updates.

@Vineshnayak
Copy link
Copy Markdown

The use of (?!\*) is a solid fix for the PHP comment/regex ambiguity. However, the exact string match for begin in update-grammar.mjs makes the build script fragile; if the upstream grammar changes even slightly (e.g., adding a flag), the fail() function will trigger and break the build. It would be safer to use a regex-based replacement or a partial match to apply this patch more robustly.

@Creeper0809
Copy link
Copy Markdown
Author

Thanks for the review. I addressed this in the latest push.

update-grammar.mjs no longer relies on an exact full-string match for the upstream begin pattern. I replaced that with a small helper that:

  • skips the patch if (?!\*) is already present,
  • applies the patch to the expected "/(?=...) / '/(?=...) regex-string begin shape by inserting (?!\*) after the delimiter,
  • still fails if the upstream pattern changes enough that applying the patch would no longer be safe.

That should make the patch resilient to small upstream changes, such as added flags, while still avoiding a silent incorrect rewrite.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Syntax highlighting error in PHP function parameter when string literal contains block comment

4 participants