feat: Allow identifiers to begin with _#7243
Conversation
SIP-04x relaxes the `ClarityName` rules so that identifiers may begin with `_` (and the bare `_` is a valid identifier). The relaxation is wire-level unconditional but the language-level rule is gated at `ClarityVersion::Clarity6` via a new AST pass. * `clarity-types/src/representations.rs` — add a fourth alternation arm to `CLARITY_NAME_REGEX_STRING` for `^_([a-zA-Z0-9]|[-_!?+<>=/*])*$`. Tests in `clarity-types/src/tests/representations.rs` and the matching proptests in `clarity/src/vm/tests/representations.rs` are extended for the new arm and the bare-`_` case. * `clarity/src/vm/ast/parser/v2/lexer/mod.rs` — accept `_` as an identifier first-character. * `clarity/src/vm/ast/underscore_checker.rs` (new) — `BuildASTPass` that walks `pre_expressions` and rejects atoms / trait references / field identifiers beginning with `_` when `ClarityVersion < Clarity6`. Wired into `inner_build_ast` between the depth checks and `ExpressionIdentifier`. * `clarity/src/vm/ast/errors.rs` — new `UnderscoreIdentifierNotAllowed(String)` variant with a precise diagnostic. * `stackslib/src/chainstate/tests/parse_tests.rs` — exhaustive `variant_coverage_report` arm for the new variant, marked `Ignored` because it is covered by the clarity-side unit tests rather than by a consensus snapshot.
Per SIP-04x: the bare identifier `_` is a discard pattern in `let` and `match` binding positions. The bound expression is still evaluated (preserving the short-circuit effects of `try!`/`unwrap!`) but the name is not placed in scope, and repeated `_` bindings in the same form do not raise `NameAlreadyUsed`. Both the runtime and the type checker treat `_` as discard only when the contract's `ClarityVersion >= Clarity6`. Memory and cost accounting are preserved: a discard binding charges the same as a non-discard one, avoiding any gas-side-channel asymmetry. The value drops naturally at scope exit because it is never inserted into `inner_context.variables`. * `clarity/src/vm/functions/mod.rs` — `special_let` gates the conflict checks and the scope insertion on `!is_discard`. * `clarity/src/vm/functions/options.rs` — `eval_with_new_binding` (used by `match` opt/resp arms) gates the same way. * `clarity/src/vm/analysis/type_checker/v2_1/natives/mod.rs` — `check_special_let` skips `check_name_used`/`lookup_variable_type` and the `add_variable_type` for discards while still type-checking the bound expression. * `clarity/src/vm/analysis/type_checker/v2_1/natives/options.rs` — analyzer `eval_with_new_binding` gates similarly; new `use crate::vm::ClarityVersion`.
* `clarity/src/vm/tests/simple_apply_eval.rs` — five end-to-end tests:
- `test_let_discard_bare_underscore`: multiple `_` in one `let` form
coexist without `NameAlreadyUsed`.
- `test_let_discard_underscore_not_referenceable`: a `_` binding does
not place the name in scope; the body referring to `_` fails as
unbound.
- `test_let_underscore_prefix_is_regular_binding`: `_admin` (the
leading-`_` convention from the SIP example) binds normally and is
readable in the body — the leading `_` is convention only.
- `test_match_opt_discard_bare_underscore` / `_resp_…`: `_` discard
semantics extend to `match` on `(some …)` and `(ok …)/(err …)`.
* `changelog.d/underscore-identifiers.added` — release note.
The dedicated `^_(...)$` alternation arm added in bca48790fe duplicated the body character class from the letter arm. Since the body class `[-_!?+<>=/*]` already accepts `_`, the cleaner factoring is to widen the leading character class from `[a-zA-Z]` to `[a-zA-Z_]` and drop the second arm entirely. The bare-`_` case still works because the body quantifier is `*` (zero or more). Old: ^[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^_([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$ New: ^[a-zA-Z_]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$ The matched language is identical — purely a regex simplification. Tests in `clarity-types/src/tests/representations.rs` are unchanged; the proptest strategies in `clarity/src/vm/tests/representations.rs` collapse their two identifier branches into one and the `assert_regex_unchanged` literals are updated.
While auditing coverage on `underscore_checker.rs` I noticed two gaps:
1. The lexer dispatch for `<` only enters `read_trait_identifier` when
the next char `is_ascii_alphabetic()`. So `<_foo>` was being lexed as
three tokens (`Less`, the atom `_foo`, `Greater`) rather than as one
`Token::TraitIdent("_foo")`. The pre-Clarity-6 rejection test for
`<_foo>` was passing for the wrong reason — the underscore checker
was catching the `_foo` `Atom` branch, never the `TraitReference`
branch. Fix: also enter the trait-identifier reader when `<` is
followed by `_`, matching the relaxation done earlier in the main
identifier dispatch.
2. Missing coverage in `underscore_checker::tests`:
- The `Tuple` recursion arm of `check_one` was unexercised. New tests
`underscore_in_tuple_key_{rejected,accepted}` cover it.
- The `SugaredFieldIdentifier` and `FieldIdentifier` arms (the
`.contract.trait` and `'<addr>.contract.trait` desugarings) were
unexercised. New rejection tests cover both.
- Symmetric Clarity-6 acceptance tests for `_x` in let bindings,
`_addr` function arguments, and `<_foo>` trait references were
missing — added.
- No test verified the *specific* `ParseErrorKind` emitted. Added
`rejection_emits_underscore_identifier_not_allowed_kind`, which
uses `build_ast` (with `error_early=true`) to assert
`ParseErrorKind::UnderscoreIdentifierNotAllowed("_admin")`.
- Added `leading_operator_names_unaffected_pre_clarity6` as a
regression guard: the pass must leave `+`, `<=`, `*` etc. alone.
Test count goes from 10 to 19.
| // 3) `[<>]=?` — comparison operator name. | ||
| pub static ref CLARITY_NAME_REGEX_STRING: String = | ||
| "^[a-zA-Z]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$".into(); | ||
| "^[a-zA-Z_]([a-zA-Z0-9]|[-_!?+<>=/*])*$|^[-+=/*]$|^[<>]=?$".into(); |
There was a problem hiding this comment.
I'm concerned about changing this without a gate at this level. I think this could cause an accidental hard fork due to its use in wire-level validation. Like could a transaction with a leading _ somewhere get into a block that is accepted by updated nodes and rejected by unupdated nodes, before the epoch transition?
There was a problem hiding this comment.
You've got a point here, I can try to tighten this up and make better use the type system to "make invalid states unrepresentable"
There was a problem hiding this comment.
I'm planning to fix by leaving the current ClarityName alone and creating a new struct, ClarityNameV6, which allows the leading underscore. This should be the safest and most correct way to proceed, as we'll never be able to use a leading _ in the legacy code paths, but will also be a large diff
@brice-stacks Let me know if this sounds right to you
There was a problem hiding this comment.
Yeah I think that sounds good. Thanks.
Coverage Report for CI Build 26543228798Coverage decreased (-38.4%) to 47.577%Details
Uncovered Changes
Coverage Regressions88607 previously-covered lines in 344 files lost coverage.
Coverage Stats
💛 - Coveralls |
Description
Allow all Clarity identifiers to start with
_as of Clarity 6.Additionally, allow bare
_name inletandmatchbindings, which discards the result of the expression and doesn't create a binding that can be referenced laterThis will allow users to signal to the Clarinet linter that an identifier is intentionally unused
Applicable issues
_#7097Additional info (benefits, drawbacks, caveats)
Checklist
docs/property-testing.md)changelog.d/README.md)rpc/openapi.yamlfor RPC endpoints,event-dispatcher.mdfor new events)clarity-benchmarkingrepo