Add Mojo language support #502
Open
Tokarzewski wants to merge 1 commit into
Mojo (Modular) is a Python-superset systems language. Vendor the lsh/tree-sitter-mojo grammar (forked from tree-sitter-python, MIT, ABI 15, C scanner) and wire it through the standard language path: enum, grammar wrapper, extraction spec, and the .mojo / .🔥 extensions.

The grammar's node types mirror Python's, so the spec reuses the py_* arrays and overrides only the class types: "struct"/"class" both parse as class_definition, while "trait" and "__extension" get their own nodes (trait_definition / extension_definition), mapping traits to Interface. "fn"/"def" both parse as function_definition.

Verified end-to-end on real Mojo corpora (NuMojo, EnergyPlusMojo): functions, methods, structs/classes, traits, decorators, calls, and imports all extract, with a resolved call/usage graph.

Adds a regression case and label golden; updates the grammar manifest, third-party notices, and language count.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Tokarzewski <bartlomiej.tokarzewski@gmail.com>
What does this PR do?
Adds Mojo (Modular's Python-superset systems language) as the 159th language.
Vendors the lsh/tree-sitter-mojo grammar (forked from tree-sitter-python, MIT, ABI 15, C scanner — no libstdc++) and wires it through the standard language path:
- `CBM_LANG_MOJO` enum (appended, no renumbering of persisted DBs)
- `grammar_mojo.c` wrapper + vendored `internal/cbm/vendored/grammars/mojo/`
- Extraction spec in `lang_specs.c`
- `.mojo` and `.🔥` extensions + `LANG_NAMES` in `language.c`

Spec design
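As a rough illustration of the wiring listed under "What does this PR do?", the sketch below shows why appending the enum value matters and how both extensions can resolve to the same language. Every name here (`demo_lang_t`, `DEMO_EXT_TABLE`, `demo_lang_from_path`) is hypothetical and heavily shortened; it is not the project's actual `language.c` code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened language enum. New entries go last so that the
 * integer values already stored in persisted databases keep their meaning. */
typedef enum {
    DEMO_LANG_UNKNOWN = 0,
    DEMO_LANG_PYTHON,
    DEMO_LANG_MOJO, /* appended: earlier values are not renumbered */
    DEMO_LANG_COUNT
} demo_lang_t;

typedef struct {
    const char *ext;
    demo_lang_t lang;
} demo_ext_entry_t;

/* Hypothetical extension table. ".🔥" is written as UTF-8 bytes (U+1F525)
 * so the source file does not depend on the editor's encoding. */
static const demo_ext_entry_t DEMO_EXT_TABLE[] = {
    {".py", DEMO_LANG_PYTHON},
    {".mojo", DEMO_LANG_MOJO},
    {".\xF0\x9F\x94\xA5", DEMO_LANG_MOJO},
};

/* Return the language for a path based on its last '.'-suffix. */
static demo_lang_t demo_lang_from_path(const char *path) {
    const char *dot = strrchr(path, '.');
    if (dot == NULL) {
        return DEMO_LANG_UNKNOWN;
    }
    size_t n = sizeof(DEMO_EXT_TABLE) / sizeof(DEMO_EXT_TABLE[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(dot, DEMO_EXT_TABLE[i].ext) == 0) {
            return DEMO_EXT_TABLE[i].lang;
        }
    }
    return DEMO_LANG_UNKNOWN;
}
```

Under these assumptions, `demo_lang_from_path("kernel.mojo")` and a path ending in `.🔥` both yield `DEMO_LANG_MOJO`, while the value of `DEMO_LANG_PYTHON` is unchanged by the new entry.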
The grammar's node types mirror Python's, so the spec reuses the `py_*` arrays and overrides only the class types (same reuse pattern already used for CFScript→`js_*`). Mojo-specific divergences:

- `fn`/`def` → `function_definition`
- `struct`/`class` → `class_definition`; `trait` and `__extension` get their own nodes (`trait_definition`/`extension_definition`), so traits map to the Interface label
- `alias NAME = value` has no dedicated grammar node; the upstream grammar recovers it as an `assignment` (name still captured)

Verification
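For orientation, the node-type-to-label mapping described under Spec design, which the checks in this section exercise, can be sketched as a small lookup table. This is a minimal illustration only: `demo_label_rule_t`, `DEMO_MOJO_RULES`, and `demo_label_for_node` are hypothetical names, not the real `lang_specs.c` structures, and the label for `extension_definition` is left unmapped because the description does not state it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rule: one tree-sitter node type maps to one graph label. */
typedef struct {
    const char *node_type;
    const char *label;
} demo_label_rule_t;

/* Mapping as described in the PR text:
 *   fn / def        -> function_definition -> Function
 *   struct / class  -> class_definition    -> Class
 *   trait           -> trait_definition    -> Interface
 * extension_definition is intentionally absent (label not stated). */
static const demo_label_rule_t DEMO_MOJO_RULES[] = {
    {"function_definition", "Function"},
    {"class_definition", "Class"},
    {"trait_definition", "Interface"},
};

/* Return the label for a node type, or NULL when no rule applies. */
static const char *demo_label_for_node(const char *node_type) {
    size_t n = sizeof(DEMO_MOJO_RULES) / sizeof(DEMO_MOJO_RULES[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(node_type, DEMO_MOJO_RULES[i].node_type) == 0) {
            return DEMO_MOJO_RULES[i].label;
        }
    }
    return NULL;
}
```

Because `fn` and `def` share one node type, as do `struct` and `class`, a table keyed on node type cannot tell them apart; that matches the description, where only `trait` gets a distinct label.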
Indexed real Mojo corpora end-to-end with the resulting binary:
`NDArray` struct correctly surfaces as the most-referenced type (in-degree 306).

Adds a `test_grammar_regression.c` case (`fn`→Function, `struct`→Class, `trait`→Interface) and the matching `test_grammar_labels.c` golden. Updates the grammar `MANIFEST.md` (recorded as community/unverified, since it is not in the nvim-treesitter/Helix registries), `THIRD_PARTY.md`, `new-languages.json`, and the README language count.

Checklist
- Signed off (`git commit -s`)
- `grammar_label_goldens`, `grammar_code_extracts_defs`, `grammar_imports_extracted`, and the 53-language CALLS-breadth check all pass for Mojo. (Unrelated to this PR: the `test_incremental` RSS-budget assertion trips under ASAN, and there are pre-existing UBSan null-arg warnings in the vendored crystal/rescript/purescript scanners.)
- `clang-format` clean on all hand-written files (`lang_specs.c`, `language.c`, `grammar_mojo.c`, tests)

🤖 Generated with Claude Code