Skip to content

feat: configurable overview script#58

Merged
AlexanderLanin merged 2 commits into
eclipse-score:mainfrom
etas-contrib:metrics
May 19, 2026
Merged

feat: configurable overview script#58
AlexanderLanin merged 2 commits into
eclipse-score:mainfrom
etas-contrib:metrics

Conversation

@AlexanderLanin
Copy link
Copy Markdown
Member

Introduce a configuration file for the overview script, allowing users to specify organization settings and tracked dependencies. Update the collection and rendering commands to utilize this configuration, enhancing flexibility and usability.

@AlexanderLanin AlexanderLanin requested a review from Copilot May 18, 2026 09:25
Copy link
Copy Markdown
Member Author

@AlexanderLanin AlexanderLanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-assisted review (Claude):

Critical: 0 | Important: 1 | Suggestions: 2

Good PR overall — the generalization from hardcoded docs-as-code / daily-workflow constants to config-driven deps and signals is clean, well-tested, and the TOML schema is straightforward. The JS sort verified generic (column-index based), so dynamic dep/signal columns sort correctly.

One correctness issue to fix before merge, two design observations worth discussing.

current_signal_labels = tuple(
s.label for s in config.workflow_signals
)
if current_signal_labels == existing_snapshot.workflow_signal_labels:
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-assisted review (Claude):

[Important] Cache invalidation compares labels only — reference-string changes are invisible

The comparison uses only the labels stored in the snapshot:

current_signal_labels = tuple(s.label for s in config.workflow_signals)
if current_signal_labels == existing_snapshot.workflow_signal_labels:

workflow_signal_labels in the snapshot contains only human-readable labels, not reference strings. If an operator changes a reference (e.g. pins daily.yml@v1daily.yml@v2) while keeping the same label, the cache is not invalidated, and every repo keeps its stale matched_workflow_signals from the previous run.

Fix: store the reference strings in the snapshot alongside the labels and include them in the comparison.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI fix-commit (Claude): ceb882eRepoSnapshot.workflow_signal_labels replaced by workflow_signals: tuple[WorkflowSignal, ...]; the snapshot now stores both label and reference, and cache invalidation compares the full tuple.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Replaces hard-coded organization assumptions (eclipse-score, docs-as-code, cicd-workflows, bazel_registry, reference_integration) with a TOML-driven org_config.toml. Adds OrgConfig, generic TrackedDep and WorkflowSignal models, refactors signal detection and renderers to be data-driven from the config, and bumps the snapshot schema to 17.

Changes:

  • New org_config.py module + org_config.toml defining org name, repo filters, tracked Bazel deps, workflow signals, registry/ref-int repos; collect CLI now requires --org-config.
  • Generalized docs_as_code_version/uses_cicd_daily_workflow into bazel_deps lookups and matched_workflow_signals; metrics/HTML renderers iterate over configured deps and signal labels.
  • Cache invalidation when workflow signal labels change; reference-integration org name and registry repo lookup made config-driven.

Reviewed changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
uv.lock Unrelated downgrades of pygments/pytest/urllib3
tests/test_repo_overview.py Updates tests for new models, adds cache-invalidation and multi-dep tests
tests/test_org_config.py New test module covering OrgConfig loading/validation
tests/test_generator.py Threads org_name through render_readme calls
tests/test_cli_render.py Adds tracked-dep detail-page rendering test
src/generate_repo_overview/org_config.py New OrgConfig dataclass + TOML loader/validator
src/generate_repo_overview/models.py Adds TrackedDep, WorkflowSignal, schema v17, removes legacy fields
src/generate_repo_overview/metrics_report.py Renders Versions/Automation tables from config-driven deps/signals
src/generate_repo_overview/metrics_html.py Passes tracked-dep latest versions through to HTML pages
src/generate_repo_overview/_html_index.py Index page rebuilt around tracked deps / signal labels
src/generate_repo_overview/_html_detail.py Detail page renders per-dep version badges and signal rows
src/generate_repo_overview/_html_common.py version_badge arg renamed latest_daclatest_dep_version
src/generate_repo_overview/collector/init.py Threads OrgConfig through collection; conditional registry/ref-int fetches; cache invalidation on signal label change
src/generate_repo_overview/collector/signal_detection.py Replaces daily-workflow detection with multi-signal matcher
src/generate_repo_overview/collector/reference_integration.py Takes org_name parameter instead of hard-coded value
src/generate_repo_overview/collector/repo_entry.py Threads workflow_signals into deep inspection
src/generate_repo_overview/collector/traceability.py Generalizes docs-as-code repo detection to tracked deps; drops pr-484 fallback
src/generate_repo_overview/cli.py Adds --org-config arg, loads OrgConfig
src/generate_repo_overview/constants.py Removes DEFAULT_ORG
src/generate_repo_overview/profile_readme.py Makes org_name a required kw-only arg
README.md / AGENTS.md / CLAUDE.md / docs/repo-overview-tool-design.md / src/generate_repo_overview/README.md Doc updates for --org-config
.github/workflows/refresh-profile.yml Passes --org-config org_config.toml
org_config.toml New default org configuration file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +87 to +103
def _parse_tracked_deps(value: object) -> tuple[TrackedDep, ...]:
if not isinstance(value, list):
return ()
result: list[TrackedDep] = []
for item in cast("list[Any]", value):
if not isinstance(item, dict):
continue
repo = item.get("repo")
module_name = item.get("module_name")
if (
isinstance(repo, str)
and repo.strip()
and isinstance(module_name, str)
and module_name.strip()
):
result.append(TrackedDep(repo=repo.strip(), module_name=module_name.strip()))
return tuple(result)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI fix-commit (Claude): ceb882e_parse_tracked_deps in org_config.py now rejects entries where repo doesn't contain a /, consistent with the validation for reference_integration_repo and registry_repo.

Comment thread src/generate_repo_overview/models.py Outdated
Comment on lines +294 to +301
tracked_deps=tuple(
dep
for item in (data.get("tracked_deps") or ())
if isinstance(item, dict)
and (dep := TrackedDep.from_dict(cast("Mapping[str, Any]", item)))
and dep.repo
and dep.module_name
),
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI fix-commit (Claude): ceb882e — replaced the walrus-as-filter pattern in RepoSnapshot.from_dict with explicit isinstance checks; the walrus was always truthy because frozen dataclass instances are never falsy.

Comment on lines +344 to +355
cached_by_name: dict[str, RepoEntry] = {}
if existing_snapshot is not None:
current_signal_labels = tuple(
s.label for s in config.workflow_signals
)
if current_signal_labels == existing_snapshot.workflow_signal_labels:
cached_by_name = {repo.name: repo for repo in existing_snapshot.repos}
else:
print_status(
"Workflow signal definitions changed — ignoring content cache",
prefix=status_prefix,
)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI fix-commit (Claude): ceb882e — same fix: snapshot now stores full WorkflowSignal objects; changing a reference string correctly invalidates the content cache.


def is_docs_as_code_repo(entry: RepoEntry) -> bool:
return bool(entry.content.docs_as_code_version) or entry.name == _DAC_REPO_NAME
def is_tracked_dep_repo(
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-assisted review (Claude):

[Suggestion] _is_tracked_dep_repo is duplicated — same logic in two files

collector/traceability.py:is_tracked_dep_repo and _html_index.py:_is_tracked_dep_repo contain identical logic. If the definition of "tracked dep repo" ever changes (e.g. matching on org prefix too), it has to be updated in both places and tests only cover one of them.

Move the function to models.py (or metrics_report.py) and import it in both call sites.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI fix-commit (Claude): ceb882eis_tracked_dep_repo moved into models.py alongside the data it operates on; both _html_index.py and collector/traceability.py now import from there.

) -> dict[str, tuple[TraceabilityTypeMetrics, ...]]:
dac_repos = [r for r in repos if is_docs_as_code_repo(r)]
if not dac_repos:
eligible_repos = [r for r in repos if is_tracked_dep_repo(r, tracked_deps)]
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-assisted review (Claude):

[Suggestion] Traceability fetching is still docs-as-code-specific, but now runs for all tracked deps

The URL being probed — https://{org_name}.github.io/{repo_name}/main/metrics.json — only exists for repos that publish docs-as-code traceability data. With the current config (one tracked dep = docs-as-code) this works fine.

But as soon as a second tracked dep is added (e.g. a Bazel toolchain), repos that use that dep will appear in the Traceability section with no data, shown as "not reporting." The section title and summary stats will be confusing to readers who do not know which dep triggers it.

Consider either keeping the traceability filter explicitly tied to docs-as-code (separate from tracked deps), or adding a config flag like traceability_dep to org_config.toml so only the intended dep drives this section.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mtrics.json is provided by docs-as-code tooling.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI fix-commit (Claude): No code change. The user confirmed that metrics.json is published by all repos running docs-as-code tooling, so the filter on tracked-dep repos is correct as-is.

- Store full WorkflowSignal (label + reference) in RepoSnapshot instead
  of labels-only; cache invalidation now compares both fields so changing
  a reference string correctly busts the content cache
- Consolidate is_tracked_dep_repo into models.py; remove the duplicate
  copy from _html_index.py and the one from collector/traceability.py
- Validate that tracked_deps[].repo is in org/repo format in org_config,
  matching the existing validation for reference_integration_repo
- Fix always-truthy walrus expression in RepoSnapshot.from_dict; filter
  tracked_deps with explicit isinstance checks instead
- Bump SNAPSHOT_SCHEMA_VERSION 17 → 18
Copy link
Copy Markdown
Member Author

@AlexanderLanin AlexanderLanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-assisted review (Claude):

All findings from this review resolved in ceb882e. Summary of what changed:

Cache invalidation (my comment + external): RepoSnapshot.workflow_signal_labels: tuple[str, ...] replaced by workflow_signals: tuple[WorkflowSignal, ...]. The snapshot now stores both label and reference for each signal. The cache comparison uses the full tuple, so changing a reference string (e.g. pinning daily.yml@v1daily.yml@v2) correctly busts the content cache. Schema version bumped 17 → 18.

Walrus-as-filter (external): RepoSnapshot.from_dict was filtering tracked_deps with (dep := TrackedDep.from_dict(...)) and dep.repo and dep.module_name. Frozen dataclass instances are always truthy, so the walrus clause did nothing. Replaced with explicit isinstance guards on the raw dict fields before constructing the object.

tracked_deps[].repo validation (external): _parse_tracked_deps in org_config.py now rejects entries where repo doesn't contain a /, matching the existing validation applied to reference_integration_repo and registry_repo.

is_tracked_dep_repo duplication (my suggestion): Identical function existed in both collector/traceability.py and _html_index.py. Moved to models.py next to the data it operates on; both callers import from there.

Traceability scope (my suggestion, no fix): Closed as non-issue — metrics.json is published by all repos running docs-as-code tooling, so filtering the traceability section by tracked-dep repos is correct.

@AlexanderLanin AlexanderLanin merged commit 905a18d into eclipse-score:main May 19, 2026
1 check passed
@AlexanderLanin AlexanderLanin deleted the metrics branch May 19, 2026 06:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants