Refactor model logic to include all information in one folder #308
LennartPurucker
left a comment
Great first step, some comments:
- We still have several places we need to edit for each model and several files. Can we consolidate this even more? Maybe moving the generate function into the abstract model class would be best. Can we not move the model code into the same folder? Or, in some way or form, make it all one folder for all models.
- Do we have documentation for how to fill in the MethodMetadata for a new model submission? How much of it could be stored in the abstract model class in some way or form?
- nit: let us wait to merge iLTM before we go ahead with this.
- nit: We need to update the add-model skill later.
- Can you (with Claude) add a short explanation somewhere (maybe in the PR) and tests for how the discovery functions work and should be understood?
| @@ -24,37 +24,40 @@ def convert_numpy_dtypes(data: dict) -> dict: | |||
| def get_configs_generator_from_name(model_name: str): | |||
note: in future refactor steps, we might want to move to a function that returns the model info.
6e722c9 to d99ca6b
@LennartPurucker Finished the refactor, addressed comments, and updated PR description
deaab65 to dc47de4
LennartPurucker
left a comment
Very cool progress, added more comments/thoughts
| display_name="{ModelName}", | ||
| compute="gpu", # or "cpu" | ||
| date="YYYY-MM-DD", # date of the benchmarking run (or planning date if unbenchmarked) | ||
| ag_key="{ag_key_without_TA}", # e.g. "TABSTAR" (matches {ClassName}Model.ag_key without the TA- prefix) |
Why do we want to have it without TA prefix here? It seems a bit confusing now. Can we simplify this / rename the field in the metadata? Moreover, if possible, could we rely solely on ag_key or ag_name in the future?
Would prefer to save skill edits like this to post-refactor once we have a clearer view on what is still left to be refined.
Sure, just commented on this as the skill was added/changed in the PR already
Can we already think about a way/structure for how we want to deprecate models now? We could deprecate them only on the LB and keep them as is here, or add some flag for it.
Prefer to think about this post-merge
| ag_key="{ag_key_without_TA}", # e.g. "TABSTAR" (matches {ClassName}Model.ag_key without the TA- prefix) | ||
| model_key="{ag_key_without_TA}", | ||
| config_default="{ModelName}_c1_BAG_L1", | ||
| can_hpo=True, |
Maybe ask Claude to go over these again from examples in models/ as some of this is wrong, like the postfix is always _c1_BAG_L1 or is_bag and can_hpo is True
Would prefer to wait on this, we anyways may want to change MethodMetadata itself to not have certain keys or require certain information.
| from tabarena.models import RealMLPModel | ||
| from autogluon.tabular.models import LGBModel |
Could I do "from tabarena.models import LGBModel" and it would do the same as "from autogluon.tabular.models import LGBModel"?
Not currently, but we could support this if we want to.
| # FIXME: Implement `best` and `best-N` | ||
| class MethodMetadata: |
Ask Claude to document this class and check that it aligns with your understanding of the class. Also, feel free to refactor this class as needed, as this might be a good time now
I'd prefer as a separate PR. This is a massive refactor already and I'm mainly just having it move files, not alter them. Otherwise it will become impossible to know if a bug sneaks in.
We need these files for backward compatibility?
Or could we move reference models and portfolios also to the /models part somehow?
E.g. /models/reference or /models/portfolios
And things like TabPrep artifacts to models/experimental?
Prefer to think about this for a future PR.
Can we move the test code / function also to the /models? And here, just have one file calling all tests (or a subset)? I guess we are not testing most models in CI or so, but it would be good to have all code that is relevant in a bundle for contributors
100% but I prefer to do in a separate PR.
697cbc4 to ffca867
Apply the TabSTAR Stage 1 recipe to limix, mitra, orionmsp, sap_rpt_oss, tabdpt, and tabpfn_3: each gets `hpo.py` and `info.py`, with `generate.py` reduced to a thin shim so the legacy `models/utils.py:name_to_import_map` dispatch continues to work. The corresponding `<model>_metadata` entries in `_tabarena_method_metadata_*.py` are now sourced from each model's `info.py` (single source of truth, legacy names preserved as aliases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same TabSTAR recipe: `hpo.py` becomes the canonical home for each model's search-space generator (including the `generate_configs_*` / `generate_single_config_*` helpers that the previous `generate.py` defined inline); `info.py` carries the MethodMetadata + ModelInfo bundle. `generate.py` shims preserve the legacy `models/utils.py:name_to_import_map` dispatch. The `realmlp_gpu_metadata` and `xrfm_metadata` entries in `_tabarena_method_metadata_2025_09_03.py` are now sourced from each model's `info.py` (legacy names kept as aliases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the TabSTAR Stage 1 recipe to ebm, knn, lr, and perpetual_booster. Each gets `hpo.py` (search space + `generate_configs_*` helpers) and `info.py` (MethodMetadata + ModelInfo bundle); `generate.py` becomes a thin shim so `models/utils.py:name_to_import_map` keeps working. The `ebm_metadata`, `knn_metadata`, `lr_metadata`, and `perpetualbooster_metadata` entries in the dated metadata files are now sourced from each model's `info.py` (legacy names kept as aliases). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These folders host multiple AutoGluon model classes:
- tabicl/: TabICLModel + TabICLv2Model (only v2 has dedicated MethodMetadata)
- tabpfnv2_5/: RealTabPFNv25Model + TabPFNv26Model
`hpo.py` holds the shared search-space machinery and exports both `gen_*` objects. `info.py` declares one `ModelInfo` per model class with a dedicated MethodMetadata entry; auto-discovery picks them up by `ag_name`. The dated metadata files (_2026_02_16, _2025_11_12, _2026_03_18) now source `tabiclv2_metadata`, `realtabpfn25_metadata`, and `tabpfn26_metadata` from each model's `info.py`. Legacy aliases preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the TabSTAR Stage 1 recipe to the three GBDT models that previously relied on the factory loop in `_tabarena_method_metadata_2025_06_12.py`. Each gets `hpo.py` (search-space generator + `generate_configs_*` helper) and `info.py` (standalone MethodMetadata + ModelInfo bundle). The factory file now imports each model's `*_method_metadata` from its `info.py` and skips them in the loop, so there's a single source of truth per model and no duplicate registry entries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same recipe: random_forest, extra_trees, fastai, and nn_torch each get `hpo.py` + `info.py` with a standalone MethodMetadata. The factory loop in `_tabarena_method_metadata_2025_06_12.py` now imports each metadata from its `info.py` and skips the corresponding method in the loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These three models have a single `model_cls` shared between a CPU and a GPU MethodMetadata variant. Each `info.py` declares two `ModelInfo` instances (e.g. `tabm_info` + `tabm_gpu_info`) that share `model_cls` and `search_space` but differ in `method_metadata`. To support distinct keys for shared-model-class variants, the registry now keys on `method_metadata.method` (guaranteed unique) instead of `model_cls.ag_name`. No external consumers depend on the prior keying scheme. The factory in `_tabarena_method_metadata_2025_06_12.py` now sources all 12 migrated entries (3 GBDTs + 4 AG tabular + 5 multi-compute variants) from their respective `info.py` modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
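The re-keying described in this commit can be illustrated with a minimal sketch. The `MethodMetadata` and `ModelInfo` classes below are simplified, hypothetical stand-ins (not the real TabArena classes), and the method names `TabM` / `TabM_GPU` are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodMetadata:  # hypothetical stand-in, not the real TabArena class
    method: str
    compute: str


@dataclass(frozen=True)
class ModelInfo:  # hypothetical stand-in
    model_cls: type
    method_metadata: MethodMetadata


class TabMModel:  # placeholder wrapper class shared by both variants
    ag_name = "TabM"


# Two variants share one model class and differ only in their metadata.
tabm_info = ModelInfo(TabMModel, MethodMetadata(method="TabM", compute="cpu"))
tabm_gpu_info = ModelInfo(TabMModel, MethodMetadata(method="TabM_GPU", compute="gpu"))


def build_registry(infos):
    """Key each ModelInfo by its method name and refuse duplicate keys."""
    registry = {}
    for info in infos:
        key = info.method_metadata.method
        if key in registry:
            raise ValueError(f"duplicate method key: {key}")
        registry[key] = info
    return registry


registry = build_registry([tabm_info, tabm_gpu_info])

# Keying on the class name instead would collapse both variants into one entry.
by_ag_name = {info.model_cls.ag_name: info for info in (tabm_info, tabm_gpu_info)}
```

Keying on the method name keeps both variants addressable, whereas the `ag_name`-keyed dict silently keeps only the last one.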
The tabicl/ folder already hosted both `gen_tabicl` (TabICLModel) and `gen_tabiclv2` (TabICLv2Model) since chunk 4, but only TabICLv2 had a dedicated `MethodMetadata` entry — the older TabICL_GPU lived in the 2025_06_12 factory loop. This commit adds `tabicl_method_metadata` and `tabicl_info` so TabICL_GPU is a first-class registry entry alongside TabICLv2. `TabPFNv2_GPU` is left in the factory loop: there is no corresponding `tabarena/models/tabpfnv2/` wrapper (the `models/utils.py` dispatch entry points at a non-existent module), so it has no model_cls to attach to a ModelInfo. That dispatch entry was dead before this refactor and stays dead after; investigating it is out of scope here. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each migrated model previously kept a thin `generate.py` re-exporting `gen_<key>` from `hpo.py`, solely so `models/utils.py:name_to_import_map` and a handful of scripts could keep working. With the migration complete:
- `name_to_import_map` now imports `tabarena.models.<key>.hpo.gen_<key>` directly. The dead `TabPFNv2_GPU` entry (its target module never existed) is also removed.
- 5 external consumers (`tabflow/scripts/run_jobs_*`, `examples/...`, and `tst/benchmark/test_yaml_experiment_serialization.py`) updated from `<key>.generate` to `<key>.hpo`.
- 24 `generate.py` shim files deleted.
Verified: top-level import succeeds, registry still holds 29 entries, `get_configs_generator_from_name` dispatches correctly for all 26 friendly names, and the metadata aggregator is unchanged at 35 methods.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each dated `_tabarena_method_metadata_*.py` previously kept a `<key>_metadata = <key>_method_metadata` alias for back-compat. With Stage 2 complete, the canonical home is per-model `info.py`; the dated files become near-empty placeholders.
- 2 active consumers updated to import from `tabarena.models.<key>.info`:
  - `tabflow/scripts/run_evaluate_linear_model.py` (lr_metadata → lr_method_metadata)
  - `examples/!old/run_download_url_and_cache_to_s3_2025_09_03.py` (8 aliases)
  - `examples/!old/run_limix_upload.py` (3 aliases)
- The aggregator `_tabarena_method_metadata.py` now imports every migrated entry directly from its per-model `info.py` (`<key>_method_metadata as <key>_metadata` re-aliasing keeps internal references stable).
- Legacy aliases dropped from 6 dated files (_2025_09_03, _2025_10_20, _2025_11_12, _2026_02_16, _2026_03_18, _2026_05_13). Each now either carries only the unmigrated factory entries or is a placeholder comment.
Verified: aggregator size unchanged at 35 methods, zero duplicates, registry still holds 29 entries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`model_registry.py` no longer maintains a hand-curated list of TabArena custom model classes — it derives them from each `ModelInfo` in `tabarena/models/<key>/info.py`, deduplicated by `model_cls` and filtered to skip AG-builtins (whose `ag_key` is already in `ag_model_registry`). To avoid a circular import (this module is transitively required by `experiment_constructor` → `config_utils` → per-model `hpo.py`, which `get_model_registry()` triggers via `discover_models()`), the derivation runs lazily via PEP 562 `__getattr__`. `tabarena_model_registry` and `_models_to_add` are built on first access. Verified: 17 TabArena-custom classes auto-derived (matching the prior hand-curated count exactly), all registered with the expected ag_keys. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
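A rough sketch of the derivation step described in this commit (deduplicate by `model_cls`, skip classes whose `ag_key` is already known to AutoGluon). The helper name `derive_custom_model_classes`, the `SimpleNamespace` stand-ins, and the example keys are assumptions for illustration only:

```python
from types import SimpleNamespace


class TabMModel:  # placeholder for a TabArena-custom wrapper
    ag_key = "TA-TABM"  # illustrative key, assumed


class LGBModel:  # placeholder for an AutoGluon built-in
    ag_key = "GBM"


# Stand-ins for ModelInfo instances; two variants share TabMModel.
infos = [
    SimpleNamespace(model_cls=TabMModel, method="TabM"),
    SimpleNamespace(model_cls=TabMModel, method="TabM_GPU"),
    SimpleNamespace(model_cls=LGBModel, method="LightGBM"),
]


def derive_custom_model_classes(infos, ag_builtin_keys):
    """Return unique model classes, in first-seen order, that are not AG built-ins."""
    seen = set()
    classes = []
    for info in infos:
        cls = info.model_cls
        if cls in seen or cls.ag_key in ag_builtin_keys:
            continue
        seen.add(cls)
        classes.append(cls)
    return classes


custom_classes = derive_custom_model_classes(infos, ag_builtin_keys={"GBM"})
```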
`ag_130_metadata` (AutoGluon 1.3 baseline) and `portfolio_metadata` (Portfolio-N200-4h) aren't per-model wrappers — they're standalone MethodMetadata entries for a baseline and a portfolio result, with no `model_cls` / `search_space` to attach. They previously lived inline in the 2025_06_12 factory file; move them to a dedicated `tabarena/baselines/info.py` to clarify the separation between per-model contributions and non-model baselines/portfolios. The 5 historical config entries the factory still produces (ExplainableBM/KNeighbors/LinearModel/RealMLP_GPU/TabDPT_GPU at `artifact_name="tabarena-2025-06-12"`) intentionally stay — they represent the original-paper artifact snapshot, distinct from each model's newer artifact-name entry. The aggregator's existing `replaced_methods` filter drops them from the latest collection while the complete collection keeps them. Verified: aggregator size unchanged at 35; complete collection still 54; `AutoGluon_v130` and `Portfolio-N200-4h` continue to flow through the complete collection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each per-model `info.py` is now self-describing about its pip dependencies via the `pip_extra` tuple on `ModelInfo`. Backfilled for 13 models with non-empty extras: ebm, limix, modernnca, orionmsp, perpetual_booster, realmlp (both CPU + GPU), sap_rpt_oss, tabdpt, tabicl (both), tabm (both), tabpfn_3, tabpfnv2_5 (both), xrfm. `tabarena/tabarena/tools/sync_pyproject_extras.py` compares the aggregated `ModelInfo.pip_extra` against `pyproject.toml` `[project.optional-dependencies]` and reports drift. Run with `--check` to exit non-zero on mismatch (suitable for CI / precommit). Current report flags legitimate naming differences (e.g. `perpetual_booster` folder vs `perpetualboosting` extra, `sap_rpt_oss` vs `sap-rpt-oss`) — a follow-up can either align names or add synonym handling to the comparator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
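The drift report, and the naming mismatch it currently flags, can be sketched as a plain set comparison. This is not the code in `sync_pyproject_extras.py`; the function names, the `normalise` option, and the `ebm` extra name are assumptions, and the dash/underscore normalisation is only one possible form of the synonym handling mentioned as a follow-up:

```python
def _normalise(name):
    """Treat dashes and underscores as equivalent (assumed synonym rule)."""
    return name.lower().replace("-", "_")


def find_extras_drift(model_keys, pyproject_extras, normalise=False):
    """Return (keys missing from pyproject, pyproject extras with no model key)."""
    convert = _normalise if normalise else (lambda name: name)
    declared = {convert(key): key for key in model_keys}
    present = {convert(extra): extra for extra in pyproject_extras}
    missing = sorted(declared[k] for k in declared.keys() - present.keys())
    unknown = sorted(present[k] for k in present.keys() - declared.keys())
    return missing, unknown


model_keys = ["ebm", "perpetual_booster", "sap_rpt_oss"]
pyproject_extras = ["ebm", "perpetualboosting", "sap-rpt-oss"]

strict_report = find_extras_drift(model_keys, pyproject_extras)
relaxed_report = find_extras_drift(model_keys, pyproject_extras, normalise=True)
```

Under these assumptions, normalisation resolves `sap_rpt_oss` vs `sap-rpt-oss` but still reports `perpetual_booster` vs `perpetualboosting`, which would need an explicit alias or a rename.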
`betatabpfn`, `tabflex`, `TabPFNv2_GPU`, and the tabprep variants (`PrepLightGBM`, `PrepLinearModel`, `PrepTabM`, `PrepRealTabPFN-v2.5`) have benchmark-result MethodMetadata but no tabarena-side model wrapper class. `ModelInfo` requires a `model_cls` and `search_space`, so these entries can't be migrated to per-model `info.py` modules as-is. Stage D is therefore a documentation-only stage: add module docstrings to `_tabarena_method_metadata_2025_09_03.py` and `_tabarena_method_metadata_2026_01_23_tabprep.py` explaining why these entries stay there rather than moving to `tabarena/models/<key>/info.py`. If wrappers are ever added for these models (or if the tabprep entries are folded into the underlying model's `info.py` as additional `ModelInfo` instances), Stage D can revisit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Full Stage A (physically moving 15 wrapper directories from `benchmark/models/ag/<key>/` to `tabarena/models/<key>/` and updating ~69 import paths) is high blast-radius for a single commit. This commit takes the smaller intermediate step: each per-model folder gains a `model.py` that re-exports the wrapper class(es) from their canonical location. The per-model layout is now uniform on the import surface (`tabarena.models.<key>` has `hpo.py` + `info.py` + `model.py` + `__init__.py`); 69 legacy import paths keep working unchanged.
15 shims added:
- ebm, knn, limix, modernnca, orionmsp, perpetual_booster, realmlp, sap_rpt_oss, tabdpt, tabm, tabpfn_3, tabstar, xrfm
- tabicl (re-exports both TabICLModel + TabICLv2Model)
- tabpfnv2_5 (re-exports both RealTabPFNv25Model + TabPFNv26Model)
Follow-up to fully complete Stage A: physically relocate the wrapper files to `tabarena/models/<key>/`, flip the shim direction so the legacy path re-imports from the new location, then phase out the legacy benchmark/models/ag/ tree once all 69 consumers migrate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Exposes `tabarena.models.register_model_info(info: ModelInfo)` so external
packages (e.g. `tabarena_extensions`) can declare additional models
without needing `discover_models()` to walk their package tree.
Extensions sometimes redeclare a method already in the core registry
(e.g. a re-benchmarked LinearModel with a different `artifact_name`).
When that happens, the function keys the new entry as
`f"{method}@{artifact_name}"`, preserving the core entry under the bare
method name.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
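A minimal sketch of the disambiguation rule described in this commit. The dataclasses are hypothetical stand-ins, the artifact name `my-extension-rerun` is made up, and the return value and the error on a second collision are assumptions rather than documented behaviour of the real `register_model_info`:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MethodMetadata:  # hypothetical stand-in
    method: str
    artifact_name: str


@dataclass(frozen=True)
class ModelInfo:  # hypothetical stand-in
    method_metadata: MethodMetadata


MODEL_REGISTRY = {}


def register_model_info(info):
    """Register `info`; fall back to 'method@artifact_name' if the method is taken."""
    metadata = info.method_metadata
    key = metadata.method
    if key in MODEL_REGISTRY:
        # The core entry keeps the bare method name; the newcomer gets a suffix.
        key = f"{metadata.method}@{metadata.artifact_name}"
    if key in MODEL_REGISTRY:
        raise ValueError(f"{key!r} is already registered")  # assumed behaviour
    MODEL_REGISTRY[key] = info
    return key


core = ModelInfo(MethodMetadata("LinearModel", "tabarena-2025-06-12"))
extension = ModelInfo(MethodMetadata("LinearModel", "my-extension-rerun"))

core_key = register_model_info(core)
extension_key = register_model_info(extension)
```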
The previous lazy access pattern predefined `_tabarena_model_registry` and `_models_to_add` as module-level `None` globals. That short-circuits Python's `__getattr__` fallback (which only fires on missing attributes), so `from tabarena.benchmark.models.model_registry import _models_to_add` returned `None` instead of building the list. Move the lazy cache into a module-level `_lazy_state: dict` and remove the predefined globals. Now first access via `__getattr__` builds and caches both values; subsequent accesses read from the dict. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
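The pitfall fixed in this commit can be reproduced in isolation. The sketch below builds throwaway modules with `types.ModuleType` instead of touching `model_registry.py`; the module name, `_build`, and the returned list are placeholders:

```python
import types


def make_lazy_module(predefine_none):
    """Build a module whose `_models_to_add` is meant to be computed lazily."""
    mod = types.ModuleType("demo_registry")
    lazy_state = {}  # cache lives here, not in module-level globals
    build_calls = []

    def _build():
        build_calls.append(1)
        return ["ModelA", "ModelB"]

    def module_getattr(name):
        # PEP 562: only called when normal attribute lookup fails.
        if name == "_models_to_add":
            if name not in lazy_state:
                lazy_state[name] = _build()
            return lazy_state[name]
        raise AttributeError(f"module {mod.__name__!r} has no attribute {name!r}")

    mod.__getattr__ = module_getattr
    if predefine_none:
        # The old pattern: a predefined global hides the lazy hook entirely.
        mod._models_to_add = None
    return mod, build_calls


broken, _ = make_lazy_module(predefine_none=True)
fixed, fixed_calls = make_lazy_module(predefine_none=False)

broken_value = broken._models_to_add  # attribute exists, so the hook never runs
first = fixed._models_to_add          # hook runs and fills the cache
second = fixed._models_to_add         # served from the cache
```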
`ConfigSpace` is declared as an optional extra (`search_spaces`) — base installs don't include it. Pre-refactor, the per-model `generate.py` files that needed ConfigSpace were only loaded on demand via `name_to_import_map` lambdas, so the import never fired during a plain `import tabarena`. Stage 2's per-model migration moved the same code into `hpo.py`, which `info.py` imports eagerly so `discover_models()` can build the registry. That made the top-level `from ConfigSpace import ...` fire on every `import tabarena`, breaking CI (`ModuleNotFoundError: No module named 'ConfigSpace'`). Fix: in catboost, lightgbm, xgboost, extra_trees, random_forest, and xrfm hpo.py, move the `from ConfigSpace import ...` inside the `generate_configs_*` function body. Module import is now ConfigSpace-free; the actual search-space construction (which only fires when someone calls `gen_<key>.generate_all_bag_experiments(...)` or `generate_configs_<key>(...)`) still requires it as before. Verified: with ConfigSpace blocked at import, all 6 hpo.py modules load, `MODEL_REGISTRY` builds (29 entries), and `_models_to_add` resolves (17 TabArena-custom classes). With ConfigSpace available, the config generators still produce configs unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
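The shape of this fix, sketched with a hypothetical generator (`generate_configs_demo` and its single `learning_rate` hyperparameter are illustrative, not one of the six real `hpo.py` modules). Defining the function never touches `ConfigSpace`; only calling it does:

```python
def generate_configs_demo(num_random_configs=2, seed=0):
    """Hypothetical generator: the optional dependency is imported at call time."""
    # Deferred import: a base install without the `search_spaces` extra can
    # still import this module; the error only surfaces if configs are requested.
    from ConfigSpace import ConfigurationSpace

    space = ConfigurationSpace(seed=seed, space={"learning_rate": (1e-3, 1e-1)})
    samples = space.sample_configuration(num_random_configs)
    if not isinstance(samples, list):
        samples = [samples]  # a single sample may come back unwrapped
    return [dict(sample) for sample in samples]
```

The trade-off is that a missing extra is reported later (at generation time rather than import time), which is the behaviour the commit message describes as matching the pre-refactor lazy dispatch.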
Move the 6 S3/R2 transfer modules out of nips2025_utils/artifacts/ and into a new tabarena/models/_artifacts/ sub-package, co-located with the MethodMetadata they depend on. Drop the unused AbstractArtifactLoader / AbstractArtifactUploader bases. Module names drop the redundant `method_` prefix since the package scope already implies it. No shims: all real consumers (the lazy imports inside _method_metadata.py and method_artifact_manager.py) are updated in-place. Includes ruff --fix cleanups on the touched files.
Tests for tabarena/tabarena/models/<key>/ now live at tst/models/test_<key>.py, matching the existing flat layout that already houses test_lazy_imports.py, test_registry.py, and test_utils.py. tst/benchmark/models/ is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e0aff69 to a1d33b8
LennartPurucker
left a comment
Merge as you see fit, I think the general new workflow is great. I leave it up to you what to add here and what to add in another PR
Big one, let's goooo!
Refactor: per-model folder layout, auto-discovered registry, foundation/aggregation layering
Summary
This branch reorganises everything about how models are declared, discovered, and exposed in TabArena. Every model now lives in a single canonical folder; the registry is auto-built from those folders; the top-level `tabarena.models` namespace is a stable public surface; and a long-standing latent circular import has been removed at the root.
The change is mechanical at the file-system level but eliminates a class of hidden bugs and removes ~five hand-maintained lists that used to drift.
40 commits, ~220 files touched (net +1.9k LOC, of which a large fraction is tests + per-model `__init__`/`hpo`/`info` boilerplate that replaces an older diffuse structure).
Why
The pre-refactor layout had grown three problems that compounded over time:
- Each model was scattered across multiple locations. The AutoGluon wrapper class lived under `tabarena/benchmark/models/ag/<key>/`, the search-space generator under `tabarena/models/<key>/generate.py`, and the `MethodMetadata` artefact entry inside `tabarena/nips2025_utils/artifacts/`. Adding a model meant editing 7+ unrelated places and keeping several hand-coded import lists in sync.
- The registry was hand-maintained. Multiple parallel mappings (friendly-name → generator, AG-key → wrapper class, model → pip extras) each had to be updated for every new model. Drift between them silently dropped models from one surface while keeping them in another.
- A latent circular import was masking real bugs. Per-model `info.py` files imported `MethodMetadata` from a package whose `__init__.py` eagerly aggregated metadata back from every per-model `info.py`. The discovery walk swallowed the resulting `ImportError` silently and moved on. As of `main`, CatBoost was the alphabetically-first model in the walk and was therefore the model that hit the cycle, meaning the most-used GBDT had been silently missing from `MODEL_REGISTRY` for an extended period without anyone noticing.
What changed
A single canonical home per model
Every model now lives in exactly one folder at `tabarena/models/<key>/`, with a uniform contract:
`hpo.py` replaces the older `generate.py` convention. `model.py` is the canonical home of the wrapper class; the previous `tabarena/benchmark/models/ag/` location has been retired entirely, including its top-level namespace. Multi-file models follow a standardised layout where private helpers go in `_internal/` and copied-upstream code (currently only LimiX) goes in `_vendor/` next to its license. Both subfolders can coexist in the same model folder for models that mix hand-written wrappers around vendored libraries.
Registry auto-discovery as the single source of truth
`tabarena.models.discover_models()` walks every per-model folder, imports each `info.py`, and collects the `ModelInfo` instances declared there. Everything downstream now derives from this single source instead of maintaining parallel lists: `tabarena_model_registry` (AutoGluon registry), `get_configs_generator_from_name` (friendly-name lookup), and `pip_extra` aggregation for pyproject extras.
Adding a new model is now a 2-place edit (the `_LAZY_CLASSES` map in `tabarena/models/__init__.py` + the relevant `pyproject.toml` extra) plus the per-model folder; the rest is auto-derived. The `add-model` skill has been rewritten to match.
The friendly-name lookup (`get_configs_generator_from_name`) collapsed from a hand-coded 27-entry dict into a 6-line registry lookup keyed by `display_name` or `method`. A parametrised regression test locks identity against the old behaviour: every previously-supported friendly name still returns exactly the same `ConfigGenerator` object.
Top-level `tabarena.models` namespace
`tabarena.models` is now a stable public surface. Consumers can write:
Model wrapper classes and `MethodMetadata` are exposed via PEP 562 `__getattr__` so `import tabarena.models` stays cheap: heavy ML libraries are loaded only on first attribute access and cached thereafter. The previous `tabarena.benchmark.models.ag` top-level surface was removed (it was already a thin re-export shim by the end of the migration); deep imports through that namespace are also gone. `__all__` is derived from `_LAZY_CLASSES` plus an explicit eager-export tuple, so it can't drift from the actual surface.
Foundation/aggregation layering: the cycle is fixed at the root
`MethodMetadata` moved out of the aggregator's package and into a foundation-layer module alongside `ModelInfo`. The structural consequence:
- `info.py` files no longer transit a package whose `__init__.py` aggregates from them. The cycle is severed; the silent-skip in `discover_models()` is no longer load-bearing.
- […] `model.py` work without a `discover_models()` warm-up, a workaround that several prior tests had to invoke.
A back-compat shim at the legacy `MethodMetadata` path is kept so external imports remain stable.
Hardened discovery
`discover_models()` now logs a `WARNING` via stdlib `logging` when a per-model `info.py` fails to import, instead of silently dropping the model. The skip-and-continue behaviour is preserved (a broken model doesn't take down the rest of the registry), but the failure surfaces in logs so future regressions of the CatBoost-bug shape become immediately visible.
Extension registration surface
A `register_model_info()` API was added for extension packages whose model folders aren't reachable by `discover_models()`'s walk over `tabarena.models`. Extensions can now ship their own models that join the same registry, with automatic disambiguation when they re-declare a method name that already exists in the core registry.
Skill + documentation refresh
The `.claude/skills/add-model/` skill (used by Claude to add new models) was rewritten end-to-end:
- […] `benchmark/models/ag/` namespace
- […] `model.py`, `hpo.py`, `info.py`, `__init__.py`, and the test file
- […] `_internal/` vs `_vendor/` convention for multi-file models
Test coverage
A new `tst/models/` test area was added with three focused suites:
- `test_registry.py`: unit-tests the discovery walk (caching, duplicate detection, ignored-symbol filtering, extension registration, the new warning-on-failure behaviour) with a fully synthetic package fixture so the tests don't depend on the actual installed models.
- `test_lazy_imports.py`: locks in the PEP 562 lazy property of `tabarena.models`, verifies `MethodMetadata` works through the lazy surface, confirms `__all__` derivation is correct, and guards against eager-re-export regressions.
- `test_utils.py`: parametrises every previously-supported friendly model name and asserts the new registry-driven lookup returns the same `ConfigGenerator` object as the old hand-coded dict did.
Behaviours preserved
- […] `is`-identity checks in tests).
- `MethodMetadata` is still importable from its legacy `tabarena.nips2025_utils.artifacts.method_metadata` location for external callers, via a back-compat re-export shim.
- […] `get_configs_generator_from_name` still resolve to the same `ConfigGenerator` instance as before, including CPU/GPU variant ties.
- […] `tabarena_model_registry`) still surfaces the same model classes; it just now auto-derives them instead of needing manual list edits.
Behaviours intentionally removed
- […] `tabarena.benchmark.models.ag` namespace and every per-model shim under it. Code that imported from these paths needs to switch to `from tabarena.models import …` or `from tabarena.models.<key>.model import …`. All in-repo call sites have already been updated.
Follow-ups deferred to subsequent PRs
These were considered in scope but explicitly deferred so this PR could land as a focused refactor:
- […] `_method_metadata.py` (currently 962 lines mixing the foundation dataclass with S3/paper-runner/repository orchestration). After the split, `MethodMetadata` could be exposed eagerly at the top level without the cost concerns that motivated lazy loading.
- […] `nips2025_utils/artifacts/__init__.py` lazy. The cycle is structurally fixed, but the eager aggregation in that package's init still forces every consumer of downloaders/uploaders to load the full metadata aggregator on first touch.
- […] `MethodMetadata` pairs in `realmlp/info.py`, `modernnca/info.py`, `tabm/info.py` into a `cpu_gpu_pair(...)` helper.
- […] `pyproject.toml`: a couple of model extras (`sap-rpt-oss`, `perpetualboosting`) don't match their `ModelKey`, which trips up the pip-extras drift-detection.
- […] `tueplots`, `autorank`, `seaborn`, `matplotlib`, `plotly`) are currently in the base install and could move into an optional `paper` extra.
Test results
The full model-area test sweep (`tst/models/` + per-model tests + yaml-serialization tests) passes locally. Failures observed in the broader sweep are pre-existing and reproduce on `main` without any of these changes; none are caused by the refactor.
The two pre-existing failure modes (cold-leaf import of any per-model module without warm-up, and CatBoost silently missing from the registry) are now positively resolved by the layering fix.
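For reference, the skip-and-continue behaviour with a logged warning, which is what makes a CatBoost-style silent drop visible, could look roughly like the sketch below. It is not the actual `discover_models()` implementation: it takes an explicit list of module names instead of walking `tabarena.models`, and the logger name and message format are assumptions.

```python
import importlib
import logging

logger = logging.getLogger("discovery_sketch")  # assumed logger name


def discover(module_names):
    """Import each module; log a WARNING and continue when one fails to import."""
    found = {}
    for name in module_names:
        try:
            found[name] = importlib.import_module(name)
        except ImportError as exc:
            # Previously this failure would have been swallowed silently.
            logger.warning("Skipping %s: import failed (%s)", name, exc)
    return found


# `json` stands in for a healthy info.py; the second name cannot be imported.
discovered = discover(["json", "no_such_module_for_discovery_sketch"])
```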