Conversation
4ce0000 to
c90c9e2
Compare
9c93a00 to
d86a261
Compare
There was a problem hiding this comment.
Pull request overview
This PR expands the R language server’s LSP feature set (type hierarchy + semantic tokens) and improves responsiveness by adding caching and moving several hot-path operations to C.
Changes:
- Implement
textDocument/prepareTypeHierarchy,typeHierarchy/supertypes, andtypeHierarchy/subtypeswith supporting class/member extraction. - Add
textDocument/semanticTokens/fulland/rangeplus server capability wiring. - Improve performance via parse/diagnostics caching, task scheduling tweaks, and new C implementations for token scanning, UTF-16/code-point conversion, semantic token encoding, and matching.
Reviewed changes
Copilot reviewed 43 out of 45 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/testthat/test-type-hierarchy.R | Adds type hierarchy tests (R6). |
| tests/testthat/test-symbol.R | Adapts symbol range assertions via helper. |
| tests/testthat/test-symbol-blocks.R | Tests symbols in top-level blocks. |
| tests/testthat/test-signature.R | Adds activeParameter test cases. |
| tests/testthat/test-semantic-tokens.R | Adds semantic tokens tests (full/range). |
| tests/testthat/test-lintr.R | Stabilizes lintr tests/config handling. |
| tests/testthat/test-hover.R | Makes hover tests less brittle. |
| tests/testthat/test-highlight-blocks.R | Tests highlights in top-level blocks. |
| tests/testthat/test-definition-blocks.R | Tests definition in top-level blocks. |
| tests/testthat/helper-utils.R | Adds request helpers + tweaks test server startup. |
| src/token.h | Declares C token scanner. |
| src/token.c | Implements fast token scanning in C. |
| src/semantic.h | Declares semantic token encoder. |
| src/semantic.c | Implements semantic token delta encoding in C. |
| src/match.h | Declares C matching helpers. |
| src/match.c | Implements ASCII case-insensitive/fuzzy matching. |
| src/encoding.h | Declares UTF-16/code-point conversion helpers. |
| src/encoding.c | Implements UTF-16/code-point conversion in C. |
| src/languageserver.c | Registers new .Call entry points. |
| README.md | Marks new LSP features as implemented. |
| R/workspace.R | Adds parse/diagnostics caches + cache cleanup. |
| R/utils.R | Adds ASCII detection, C-backed match/fuzzy, C token scan wrapper, and C-backed UTF-16 conversions. |
| R/type_hierarchy.R | Implements type hierarchy logic and class member extraction. |
| R/task.R | Adjusts task scheduling/utilization for parse tasks. |
| R/symbol.R | Adds hierarchical DocumentSymbol support + class member children. |
| R/signature.R | Adds activeParameter computation and parameter range parsing. |
| R/settings.R | Introduces diagnostics cache TTL setting. |
| R/semantic.R | Adds semantic tokens extraction + encoding + handlers. |
| R/selection.R | Formatting/indentation adjustments only. |
| R/rename.R | Formatting/indentation adjustments only. |
| R/references.R | Refactors reference collection to avoid repeated c() growth. |
| R/languageserver.R | Rescales session pool sizes and adds pending reply queues for new requests. |
| R/interfaces.R | Adds document_symbol() constructor for hierarchical symbols. |
| R/hover.R | Updates XPath roots to include more node types. |
| R/handlers-textsync.R | Parses immediately on open (delay=0). |
| R/handlers-langfeatures.R | Adds handlers for type hierarchy + semantic tokens requests. |
| R/formatting.R | Avoids repeated list growth when building edits. |
| R/document.R | Fixes srcref propagation for block parsing + adds parse caching by content hash. |
| R/diagnostics.R | Adds diagnostics caching + explicit linters fallback logic. |
| R/definition.R | Updates XPath roots similarly to hover/signature. |
| R/completion.R | Avoids repeated list growth + updates XPath roots. |
| R/code_action.R | Avoids repeated list growth when collecting actions. |
| R/capabilities.R | Advertises type hierarchy + semantic tokens capabilities. |
| NAMESPACE | Imports digest::digest. |
| DESCRIPTION | Adds digest dependency + updates RoxygenNote. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| leading_space <- nchar(current_param) - nchar(sub("^\\\\s+", "", current_param)) | ||
| trailing_space <- nchar(current_param) - nchar(sub("\\\\s+$", "", current_param)) | ||
|
|
||
| # Calculate the label position as [start, end] in the full signature | ||
| # LSP uses 0-based positions | ||
| param_start <- base_offset + char_pos + leading_space | ||
| param_end <- base_offset + char_pos + nchar(current_param) - trailing_space |
There was a problem hiding this comment.
The leading/trailing whitespace trimming for parameter label offsets uses patterns like "^\\s+" / "\\s+$" without perl=TRUE, so they won't match actual whitespace in base R regex. This will compute incorrect ParameterInformation label ranges. Use [[:space:]]+ (POSIX) or enable perl=TRUE with "^\s+" / "\s+$".
| @@ -0,0 +1,236 @@ | |||
| #include <R.h> | |||
| #include <Rinternals.h> | |||
| #include <string.h> | |||
There was a problem hiding this comment.
This file uses malloc()/free()/calloc() but does not include <stdlib.h>. On many toolchains this is a compile error (implicit declaration) and can break package compilation. Add the proper standard header for these allocation functions.
| #include <string.h> | |
| #include <string.h> | |
| #include <stdlib.h> |
| #include <ctype.h> | ||
| #include <string.h> | ||
| #include <wctype.h> | ||
| #include <stdint.h> |
There was a problem hiding this comment.
WINT_MAX is used later in this file but isn't guaranteed to be defined by <wctype.h> alone (it's typically from <wchar.h>). To avoid build failures on some platforms/toolchains, include the header that defines WINT_MAX (e.g. <wchar.h> or <limits.h>).
| idx <- idx + 1L | ||
| tokens[[idx]] <- list( | ||
| line = as.integer(line1 - 1), # Convert to 0-based, ensure integer | ||
| col = as.integer(col1 - 1), # Convert to 0-based, ensure integer | ||
| length = as.integer(nchar(token_text)), # Ensure integer | ||
| tokenType = as.integer(token_type), # Ensure integer | ||
| tokenModifiers = as.integer(modifiers) # Ensure integer |
There was a problem hiding this comment.
Semantic token positions/lengths need to be in LSP 'character' units (UTF-16 code units). Here col and length are computed using parse-data columns (code points) and nchar() (code points), which will be incorrect for non-BMP characters. Consider converting (col1-1) and col2 to UTF-16 units via document$to_lsp_position()/code_point_to_unit and derive length from the converted start/end columns rather than nchar(token_text).
| idx <- idx + 1L | |
| tokens[[idx]] <- list( | |
| line = as.integer(line1 - 1), # Convert to 0-based, ensure integer | |
| col = as.integer(col1 - 1), # Convert to 0-based, ensure integer | |
| length = as.integer(nchar(token_text)), # Ensure integer | |
| tokenType = as.integer(token_type), # Ensure integer | |
| tokenModifiers = as.integer(modifiers) # Ensure integer | |
| # Convert token start/end from parse-data code points to LSP UTF-16 units | |
| start_pos_lsp <- document$to_lsp_position(list( | |
| line = as.integer(line1 - 1L), | |
| character = as.integer(col1 - 1L) | |
| )) | |
| end_pos_lsp <- document$to_lsp_position(list( | |
| line = as.integer(line2 - 1L), | |
| character = as.integer(col2) | |
| )) | |
| idx <- idx + 1L | |
| tokens[[idx]] <- list( | |
| line = as.integer(start_pos_lsp$line), # Already 0-based | |
| col = as.integer(start_pos_lsp$character), # UTF-16 code units | |
| length = as.integer(end_pos_lsp$character - start_pos_lsp$character), | |
| tokenType = as.integer(token_type), # Ensure integer | |
| tokenModifiers = as.integer(modifiers) # Ensure integer |
| diagnostics_pool_size <- max(floor(pool_size * 0.75), 2) | ||
| diagnostics_pool <- if (pool_size > 0) SessionPool$new(diagnostics_pool_size, "diagnostics") else NULL |
There was a problem hiding this comment.
diagnostics_pool_size can exceed the user-requested pool_size. For example, if R_LANGSVR_POOL_SIZE=1, this computes diagnostics_pool_size=2 and starts more sessions than requested. Consider capping diagnostics_pool_size to pool_size (and allowing 1) to respect the environment setting.
| diagnostics_pool_size <- max(floor(pool_size * 0.75), 2) | |
| diagnostics_pool <- if (pool_size > 0) SessionPool$new(diagnostics_pool_size, "diagnostics") else NULL | |
| # Ensure diagnostics_pool_size does not exceed the user-requested pool_size | |
| diagnostics_pool_size <- if (pool_size > 0) { | |
| min(max(floor(pool_size * 0.75), 1L), pool_size) | |
| } else { | |
| 0L | |
| } | |
| diagnostics_pool <- if (diagnostics_pool_size > 0) SessionPool$new(diagnostics_pool_size, "diagnostics") else NULL |
| "//SYMBOL_FUNCTION_CALL[text() = 'setMethod']/following-sibling::expr[STR_CONST[text() = '\"'{token_quote}'\"']]", | ||
| token_quote = token_text |
There was a problem hiding this comment.
The XPath built for the setMethod() pattern is not valid XPath because it tries to concatenate string literals and the class name without using concat()/contains(). This will likely cause xml_find_all() to error and can break prepareTypeHierarchy for tokens containing dots (e.g. S3 method names). Consider rewriting this XPath to use contains(text(), '""') (like other helpers in this file) or build a single quoted literal via xml_single_quote().
| "//SYMBOL_FUNCTION_CALL[text() = 'setMethod']/following-sibling::expr[STR_CONST[text() = '\"'{token_quote}'\"']]", | |
| token_quote = token_text | |
| "//SYMBOL_FUNCTION_CALL[text() = 'setMethod']/following-sibling::expr[STR_CONST[text() = {class_literal}]]", | |
| class_literal = xml_single_quote(glue('"{token_text}"')) |
|
|
||
| # Check if this is a named argument (pattern: name = ...) | ||
| # Match identifier followed by =, with optional whitespace | ||
| named_match <- regexec("^([a-zA-Z._][a-zA-Z0-9._]*)\\s*=\\s*", current_arg) |
There was a problem hiding this comment.
This named-argument regex uses "\s" but regexec() defaults to POSIX ERE (perl = FALSE), where \s is not whitespace. As a result, patterns like "z =" won't match and activeParameter will fall back to comma counting (failing the new named-arg tests). Use a POSIX-safe whitespace class (e.g. [[:space:]]*) or set perl = TRUE.
| named_match <- regexec("^([a-zA-Z._][a-zA-Z0-9._]*)\\s*=\\s*", current_arg) | |
| named_match <- regexec("^([a-zA-Z._][a-zA-Z0-9._]*)[[:space:]]*=[[:space:]]*", current_arg) |
This PR implements the following handlers:
and makes the following improvements:
activeParametersincompletionis supported.hierarchicalDocumentSymbolSupportis implemented so that class members are displayed in outline.{}with or withoutifwhileetc) have correct ranges.P.S: The code written in this PR is mostly done by @copilot.