Skip to content

Conversation

@Earlopain
Copy link
Collaborator

Closes #3837

While these lines are whitespace only from a runtime perspective, the line continuation is significant for AST consumers.

Sort of a followup to faab217

Closes ruby#3837

While these lines are whitespace only from a runtime perspective,
the line continuation is significant for AST consumers.

Sort of a followup to ruby@faab217
@kddnewton
Copy link
Collaborator

I'm assuming this has no impact on generated code right?

@Earlopain
Copy link
Collaborator Author

Yeah, from what I tested this gives the exact same instructions.

heredoc_dedent_discard_string_node(pm_parser_t *parser, pm_string_node_t *string_node) {
if (string_node->unescaped.length == 0) {
const uint8_t *cursor = parser->start + PM_LOCATION_START(&string_node->content_loc);
return pm_memchr(cursor, '\\', string_node->content_loc.length, parser->encoding_changed, parser->encoding) == NULL;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you've checked this thoroughly, but I'm just uncomfortable with this because it seems like it could include other kinds of strings, but by virtue of the dedent algorithm we're cool.

Just to be super clear, are we saying that we want to include the string node if it matches [\s\t]*\\\r?\n, but otherwise not?

If that's the case, I think I'd prefer that we explicitly check that pattern here, without relying on pm_memchr.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like it could include other kinds of strings, but by virtue of the dedent algorithm we're cool.

Hm, I wouldn't say that is the case, we only take this branch when the unescaped string is empty. That's only possible when the string indeed is dedenting whitespace only, anything else would result in it having some value.

Otherwise, your regexp is pretty much what I am checking for, with the assumption that just checking \ is good enough for the above reason. Otherwise, this is the implementation I came up with:

static inline bool
heredoc_dedent_discard_string_node(pm_parser_t *parser, pm_string_node_t *string_node) {
    if (string_node->unescaped.length == 0) {
        const uint8_t *start = parser->start + PM_LOCATION_START(&string_node->content_loc);
        const uint8_t *end = parser->start + PM_LOCATION_END(&string_node->content_loc);

        while (start < end && pm_char_is_inline_whitespace(*start)) start++;
        if (start < end && *start == '\\') {
            start++;
            if (start < end && *start == '\r') {
                start++;
            }
            if (start < end && *start == '\n' && start + 1 == end) {
                // The string is whitespace only and ends with a line continuation.
                return false;
            }
        }
        // The string may be empty, or all whitespace was removed via dedenting.
        return true;
    }
    // The string has content. It may be whitespace not removed via dedenting or any other characters.
    return false;
}

Do you prefer this still?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parse difference for heredoc with multiple consecutive line continuations

2 participants