Add Mojo language support #502

Open
Tokarzewski wants to merge 1 commit into DeusData:main from Tokarzewski:feat/mojo-language-support

@Tokarzewski

What does this PR do?

Adds Mojo (Modular's Python-superset systems language) as the 159th language.

Vendors the lsh/tree-sitter-mojo grammar (forked from tree-sitter-python, MIT, ABI 15, C scanner — no libstdc++) and wires it through the standard language path:

  • CBM_LANG_MOJO enum (appended, no renumbering of persisted DBs)
  • grammar_mojo.c wrapper + vendored internal/cbm/vendored/grammars/mojo/
  • extraction spec in lang_specs.c
  • .mojo and .🔥 extensions + LANG_NAMES in language.c
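The wiring above can be sketched roughly as follows. This is a minimal illustration, not the code in language.c: only `CBM_LANG_MOJO` and the two extensions come from the PR, while every `demo_*` / `DEMO_*` name, the shortened enum, and the lookup function are hypothetical. The point it tries to show is why appending the enum member matters: existing members keep their numeric values, so IDs already stored in a database still mean the same language.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened enum. Only CBM_LANG_MOJO is named in the PR;
 * the DEMO_* members stand in for the existing languages. */
typedef enum {
    DEMO_LANG_PYTHON = 0,
    DEMO_LANG_CFSCRIPT = 1,
    /* ... existing members keep their values ... */
    CBM_LANG_MOJO = 2, /* appended last, so previously stored IDs stay valid */
    DEMO_LANG_COUNT
} demo_lang_t;

/* Hypothetical extension table entry. */
typedef struct {
    const char *ext;
    demo_lang_t lang;
} demo_ext_entry;

static const demo_ext_entry demo_ext_table[] = {
    {".py", DEMO_LANG_PYTHON},
    {".cfc", DEMO_LANG_CFSCRIPT},
    {".mojo", CBM_LANG_MOJO},
    {".\xF0\x9F\x94\xA5", CBM_LANG_MOJO}, /* ".🔥" spelled as UTF-8 bytes */
};

/* Returns the language ID for a path, or -1 if the extension is unknown. */
static int demo_lang_for_path(const char *path) {
    const char *dot = strrchr(path, '.');
    if (dot == NULL) {
        return -1;
    }
    for (size_t i = 0; i < sizeof demo_ext_table / sizeof demo_ext_table[0]; i++) {
        if (strcmp(dot, demo_ext_table[i].ext) == 0) {
            return (int)demo_ext_table[i].lang;
        }
    }
    return -1;
}
```

Under these assumptions, `demo_lang_for_path("src/ndarray.mojo")` and a path ending in `.🔥` both resolve to `CBM_LANG_MOJO`, and a file without an extension yields -1.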

Spec design

The grammar's node types mirror Python's, so the spec reuses the py_* arrays and overrides only the class types (same reuse pattern already used for CFScript→js_*). Mojo-specific divergences:

  • fn/def → function_definition
  • struct/class → class_definition; trait and __extension get their own nodes (trait_definition / extension_definition), so traits map to the Interface label
  • compile-time alias NAME = value has no dedicated grammar node — the upstream grammar recovers it as an assignment (name still captured)
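A rough sketch of the reuse pattern described above, under stated assumptions: the struct layout, the `demo_*` names, and the label lookup are hypothetical and do not reproduce lang_specs.c. Only the node type names, the `py_*` prefix, and the fn→Function / struct→Class / trait→Interface mapping come from the PR. The label for `extension_definition` is not stated there, so it is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical spec shape: NULL-terminated lists of grammar node types. */
typedef struct {
    const char *const *function_types;
    const char *const *class_types;
} demo_lang_spec;

/* Stand-ins for the shared Python arrays. */
static const char *const py_function_types[] = {"function_definition", NULL};
static const char *const py_class_types[] = {"class_definition", NULL};

/* The only Mojo-specific array: class types extended with the trait node. */
static const char *const mojo_class_types[] = {
    "class_definition", "trait_definition", NULL};

static const demo_lang_spec demo_python_spec = {py_function_types, py_class_types};
/* Mojo reuses the Python function list and overrides only the class list. */
static const demo_lang_spec demo_mojo_spec = {py_function_types, mojo_class_types};

static int demo_contains(const char *const *list, const char *node_type) {
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(list[i], node_type) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Returns the graph label for a node type, or NULL if the spec ignores it. */
static const char *demo_label_for(const demo_lang_spec *spec, const char *node_type) {
    if (demo_contains(spec->function_types, node_type)) {
        return "Function";
    }
    if (demo_contains(spec->class_types, node_type)) {
        /* Assumed rule: trait nodes surface as Interface, others as Class. */
        return strcmp(node_type, "trait_definition") == 0 ? "Interface" : "Class";
    }
    return NULL;
}
```

Because `fn` and `def` share one node type, as do `struct` and `class`, the Mojo spec in this sketch needs no extra function entry and only one extra class entry.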

Verification

Indexed real Mojo corpora end-to-end with the resulting binary:

  • NuMojo (pure Mojo, 135 files): functions, methods, structs, traits (Interface), decorators, calls, imports all extract; core NDArray struct correctly surfaces as the most-referenced type (in-degree 306).
  • EnergyPlusMojo + 7,354 machine-generated Mojo files: 166K+ nodes, no crashes — robust to malformed/partial generated Mojo.

Adds a test_grammar_regression.c case (fn→Function, struct→Class, trait→Interface) and the matching test_grammar_labels.c golden. Updates the grammar MANIFEST.md (recorded as community/unverified — not in nvim-treesitter/Helix registries), THIRD_PARTY.md, new-languages.json, and the README language count.

Checklist

  • Every commit is signed off (git commit -s)
  • Tests pass locally: grammar_label_goldens, grammar_code_extracts_defs, grammar_imports_extracted, and the 53-language CALLS-breadth check all pass for Mojo. (Unrelated to this PR: the test_incremental RSS-budget assertion trips under ASAN, and the vendored crystal/rescript/purescript scanners emit pre-existing UBSan null-arg warnings.)
  • Lint — clang-format clean on all hand-written files (lang_specs.c, language.c, grammar_mojo.c, tests)
  • New behavior covered by a test (regression case + label golden)

Heads-up (not part of this PR): on bleeding-edge GCC 16, src/cli/cli.c:1461 trips -Werror=discarded-qualifiers on a strstr const-discard, which blocks the build regardless of this change. Worth a one-line cast in a separate PR.
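For context, a minimal sketch of the kind of pattern that can trigger this warning and two ways to silence it. The actual code at src/cli/cli.c:1461 is not shown in this PR, so the helpers below are hypothetical. The sketch assumes the toolchain follows the C23 behavior in which `strstr` returns a const-qualified pointer when the haystack is const, so assigning the result to a plain `char *` discards the qualifier.

```c
#include <string.h>

/* Hypothetical helper, not the code in cli.c.
 * Option 1: keep the result const. No cast is needed, and the
 * pointer cannot be used to write through a const haystack. */
static const char *demo_find_marker(const char *haystack, const char *marker) {
    const char *hit = strstr(haystack, marker);
    return hit;
}

/* Option 2: the one-line cast mentioned above, for call sites that
 * really need a mutable pointer. The caller must own writable memory. */
static char *demo_find_marker_mut(const char *haystack, const char *marker) {
    return (char *)strstr(haystack, marker);
}
```

Where the result is only read, the const-preserving form is usually the safer choice; the cast merely restores the pre-C23 behavior.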

🤖 Generated with Claude Code

Mojo (Modular) is a Python-superset systems language. Vendor the
lsh/tree-sitter-mojo grammar (forked from tree-sitter-python, MIT, ABI 15,
C scanner) and wire it through the standard language path: enum, grammar
wrapper, extraction spec, and the .mojo / .🔥 extensions.

The grammar's node types mirror Python's, so the spec reuses the py_*
arrays and overrides only the class types — "struct"/"class" both parse as
class_definition, while "trait" and "__extension" get their own nodes
(trait_definition / extension_definition), mapping traits to Interface.
"fn"/"def" both parse as function_definition.

Verified end-to-end on real Mojo corpora (NuMojo, EnergyPlusMojo): functions,
methods, structs/classes, traits, decorators, calls, and imports all extract,
with a resolved call/usage graph. Adds a regression case and label golden;
updates the grammar manifest, third-party notices, and language count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Tokarzewski <bartlomiej.tokarzewski@gmail.com>