Create a KPI dashboard for quality numbers#448

Draft
ahmed0mousa wants to merge 9 commits into
eclipse-score:mainfrom
ahmed0mousa:ahmo_add_nightly_quality_kpi

@ahmed0mousa ahmed0mousa commented May 18, 2026

Add a nightly CI pipeline that runs three quality jobs in parallel (coverage, CodeQL, and clang-tidy) and publishes all results to GitHub Pages under latest/quality/. A Jinja2-based dashboard aggregates the findings into a single page with KPI trend tracking across runs. The Sphinx documentation is extended with a dedicated quality reports page and a version switcher navbar, and on every push to main the docs automatically pull the latest nightly KPI numbers so they stay current without waiting for another nightly run.

Issue: SWP-262453

@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch from a6a2efd to 7380bb8 Compare May 18, 2026 14:20

@castler castler left a comment

Not yet fully through

env:
  ANDROID_HOME: ""
  ANDROID_SDK_ROOT: ""
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
Contributor

The release workflow should not differ from the other workflows. So we either add this everywhere or nowhere.

Can you please also state in the commit message why this change is needed?

Contributor Author

Node 20 will be deprecated on GitHub Actions runners next month. I can add this to the commit message.
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

os.system(
    f"{code_ql_path} database analyze -j=0 {database_location} --format=sarifv2.1.0 --output={output_base}/codeql.sarif")
os.system(
    f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv")
Contributor

We should keep the CSV output for direct human readability.

Contributor Author

reverted

Comment thread quality/static_analysis/codeql_lint.py Outdated
f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv")

# Analyze: run MISRA/AUTOSAR queries and produce SARIF.
# --ram: cap at 5 GB to prevent swap thrashing on GitHub runners (7 GB total)
Contributor

We should not make this the default. Instead, we should add an extra option for GitHub runners that changes these parameters, and enable it only in CI.

Contributor Author

applied
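
One way to keep the resource limits out of the default path is an opt-in switch. The sketch below is a hypothetical helper, not the code from this PR: the name `build_analyze_command` and the `limit_resources` parameter are assumptions made for illustration. The limit values are the ones named in the commit message (`--ram 5000 --timeout 20 -j 2`), and the default keeps the `-j=0` from the original call.

```python
from typing import List


def build_analyze_command(
    codeql_path: str,
    database_location: str,
    output_base: str,
    limit_resources: bool = False,
) -> List[str]:
    """Build the argument list for `codeql database analyze`.

    `limit_resources` is a hypothetical opt-in switch for constrained CI runners.
    """
    command = [codeql_path, "database", "analyze"]
    if limit_resources:
        # Values copied from the commit message in this PR; applied only on request.
        command += ["--ram=5000", "--timeout=20", "-j=2"]
    else:
        # Same as the original call: -j=0 lets CodeQL use one thread per core.
        command += ["-j=0"]
    command += [
        database_location,
        "--format=sarifv2.1.0",
        f"--output={output_base}/codeql.sarif",
    ]
    return command
```

With this shape, a local run keeps the defaults, and only the CI workflow would pass the switch (for example through a command-line flag of the lint script).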

Comment on lines +98 to +106
- name: Set conclusion
  id: set-conclusion
  run: |
    if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then
      echo "conclusion=success" >> $GITHUB_OUTPUT
    else
      echo "conclusion=failure" >> $GITHUB_OUTPUT
    fi

Contributor

Why is this needed? Can we try to remove it again, please?

Contributor Author

The coverage step has continue-on-error: true, so if bazel coverage fails, the job does not stop; it keeps running. Without the "Set conclusion" step, the caller nightly_quality.yml has no way to know whether coverage actually passed or failed: it only sees the job as successful, because continue-on-error suppresses the failure. In short, continue-on-error hides the failure from GitHub's job status, and "Set conclusion" un-hides it for the dashboard. For a nightly quality pipeline, partial data is actually useful: if coverage fails at night, you want something to look at the next morning rather than an empty artifact.
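
For illustration, the mapping done by the "Set conclusion" step can be written in Python. This is a sketch only and not part of the PR: `write_conclusion` is a hypothetical helper that takes the step outcome as a string. It relies on the GitHub Actions mechanism the shell version also uses, where a step sets an output by appending a `name=value` line to the file named by the `GITHUB_OUTPUT` environment variable.

```python
def write_conclusion(outcome: str, output_path: str) -> str:
    """Map a step outcome to success/failure and append it as a step output.

    `output_path` is expected to be the file named by $GITHUB_OUTPUT.
    """
    # Anything other than "success" (failure, cancelled, skipped) counts as failure.
    conclusion = "success" if outcome == "success" else "failure"
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(f"conclusion={conclusion}\n")
    return conclusion
```

In a workflow this would be called with the value of `steps.run-coverage.outcome` and `os.environ["GITHUB_OUTPUT"]`. The job itself still finishes as successful, and the caller reads the `conclusion` output instead of the job status.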

@@ -0,0 +1,179 @@
# *******************************************************************************
Contributor

Can we first take care of only the code coverage, please, to reduce the scope of the PR?

Contributor Author

During the discussion we always talked about three jobs, not only coverage, and I have already implemented all three.
If you insist, I can add a commit on top that removes the other two, and then revert that commit in a separate PR.

Comment thread .github/workflows/nightly_quality.yml Outdated

# Restore KPI history from the previous gh-pages deployment so the
# dashboard can show delta badges and trend sparklines across runs.
- name: Restore KPI history
Contributor

I thought we agreed that we do not want history at the moment?

Contributor Author

removed

Comment thread .github/workflows/nightly_quality.yml Outdated
# Deploy to GitHub Pages
# ------------------------------------------------------------------
- name: Deploy quality reports to GitHub Pages
uses: peaceiris/actions-gh-pages@v4
Contributor

This way it is not integrated into our Sphinx build. Maybe you can talk with Jochen about that.

Contributor Author

Yes, that step deployed the quality reports directly to gh-pages as a separate, uncoordinated publish. I changed nightly_quality.yml to upload the quality reports as an artifact instead of deploying them, and docs.yml now triggers on its completion and deploys everything in one shot.

@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch 3 times, most recently from 364db19 to bff61c4 Compare May 19, 2026 12:28
… only

- Use subprocess.run instead of os.system for all CodeQL commands so
  errors are properly captured and logged.
- Add --ram 5000 --timeout 20 -j 2 to database analyze to prevent
  OOM and hung queries on GitHub runners.
- Remove CSV output; SARIF is sufficient for the CI quality report.
- Add FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var.
- Add 'conclusion' output (success/failure) to match the interface of
  clang_tidy.yml and codeql.yml.
- Add id and continue-on-error to the bazel coverage step so the job
  can report a conclusion even on test failures.
- Gate genhtml and archive steps on run-coverage outcome so they are
  skipped cleanly when coverage fails.
- Fix cache-save condition to also fire on scheduled (nightly) runs.
- Include raw LCOV .dat file in the artifact so the quality dashboard
  can read coverage percentages without re-running genhtml.
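
The first bullet above (subprocess.run instead of os.system) could look roughly like the following. This is a minimal sketch under assumptions, not the code from codeql_lint.py: the helper name `run_logged` and the logger name are hypothetical.

```python
import logging
import subprocess
import sys
from typing import List

logger = logging.getLogger("codeql_lint")  # hypothetical logger name


def run_logged(command: List[str]) -> int:
    """Run a command without a shell, log its output, and return the exit code."""
    # check=False: the caller decides what to do with a non-zero exit code.
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    if result.stdout:
        logger.info(result.stdout.rstrip())
    if result.returncode != 0:
        logger.error(
            "command %s failed with exit code %d: %s",
            command,
            result.returncode,
            result.stderr.rstrip(),
        )
    return result.returncode


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Harmless stand-in for a CodeQL invocation.
    run_logged([sys.executable, "-c", "print('ok')"])
```

Unlike os.system, the argument list is passed without a shell, and stderr is available for the log when the command fails.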
Both workflows are triggered only via workflow_call from nightly_quality.yml.
Each exposes artifact-name and conclusion outputs so the caller can
conditionally download reports and build a unified dashboard.

clang_tidy.yml:
- Runs 'bazel test --config=clang-tidy //...' with continue-on-error.
- Collects per-target *.AspectRulesLintClangTidy.out files and generates
  an HTML summary with error/warning counts and a findings table.

codeql.yml:
- Runs 'bazel run --config=codeql //quality/static_analysis:codeql_lint'.
- Collects SARIF output and generates an HTML summary from it.
- Sets a 180-minute job timeout to guard against hung analyses.
Runs every night at midnight UTC (and on workflow_dispatch).
Executes coverage, codeql, and clang-tidy in parallel as reusable
workflow calls, then deploys all reports plus a unified KPI dashboard
to GitHub Pages.
quality/dashboard/generate_dashboard.py:
- Parses CodeQL SARIF files, clang-tidy *.AspectRulesLintClangTidy.out
  files, and LCOV .dat data into a single Jinja2-rendered HTML page.
- Maintains a quality_history.json for KPI trend tracking across runs.
- Writes a GitHub Actions step summary with markdown KPI tables.
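
As a rough illustration of the parsing described above, the sketch below counts SARIF results per level and sums LCOV line totals. It is not taken from generate_dashboard.py: both function names are hypothetical, and the SARIF level fallback is a deliberate simplification.

```python
import json
from collections import Counter
from typing import Dict, Tuple


def count_sarif_levels(sarif_text: str) -> Dict[str, int]:
    """Count SARIF results per level across all runs."""
    sarif = json.loads(sarif_text)
    counts: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Simplification: fall back to "warning" when no level is given.
            # The full SARIF rules also consult the rule's default configuration.
            counts[result.get("level", "warning")] += 1
    return dict(counts)


def lcov_line_coverage(lcov_text: str) -> Tuple[int, int, float]:
    """Sum LF (lines found) and LH (lines hit) records of an LCOV tracefile."""
    found = 0
    hit = 0
    for line in lcov_text.splitlines():
        if line.startswith("LF:"):
            found += int(line[3:])
        elif line.startswith("LH:"):
            hit += int(line[3:])
    percent = 100.0 * hit / found if found else 0.0
    return found, hit, percent
```

Reading the LF/LH records directly is what makes the raw .dat file sufficient for the coverage percentage, without re-running genhtml.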

quality/dashboard/dashboard.html.j2:
- Dark-themed single-page dashboard with tabbed panels for CodeQL,
  Clang-Tidy and Coverage.
- Sortable/filterable findings tables, coverage progress bars, and
  a run-history table with trend indicators.
@ahmed0mousa ahmed0mousa force-pushed the ahmo_add_nightly_quality_kpi branch from bff61c4 to bb988f3 Compare May 19, 2026 14:36
Contributor

Instead of this Jinja template, maybe generate an RST file that we can include in the Sphinx build?
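
A minimal sketch of what such an RST generator could look like, assuming the KPIs are already available as name/value pairs. The function name `render_kpi_rst` and the sample KPI are hypothetical placeholders, not results from this PR; the output uses the standard `list-table` directive.

```python
from typing import Dict


def render_kpi_rst(title: str, kpis: Dict[str, str]) -> str:
    """Render KPI name/value pairs as an RST section containing a list-table."""
    lines = [
        title,
        "=" * len(title),  # section underline must be as long as the title
        "",
        ".. list-table::",
        "   :header-rows: 1",
        "",
        "   * - KPI",
        "     - Value",
    ]
    for name, value in kpis.items():
        lines.append(f"   * - {name}")
        lines.append(f"     - {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder value for illustration only.
    print(render_kpi_rst("Quality KPIs", {"Line coverage": "87.5 %"}))
```

The generated file could then be pulled into the existing documentation with an `include` directive or a toctree entry, so the theme and navigation come from Sphinx rather than from a separate HTML page.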

f"{code_ql_path} database analyze"
f"{_analyze_flags}"
f" {database_location}"
f" --format=sarifv2.1.0 --output={output_base}/codeql.sarif",
Contributor

Should we also upload the SARIF file, since we already have it? It could be used for local development.

env:
  DOCS_VERSION: ${{ steps.vars.outputs.version }}
  DOCS_BASE_URL: ${{ steps.vars.outputs.base_url }}
run: sphinx-build docs/sphinx _sphinx_output
Contributor

Is there a reason why you run without bazel?

touch _deploy/.nojekyll

# Generate versions.json for the pydata-sphinx-theme version switcher
python3 docs/sphinx/utils/update_versions_json.py \
Contributor

This again mixes Python and Bazel.
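
For context, the pydata-sphinx-theme version switcher reads a JSON list whose entries carry a `version` and a `url`, plus an optional `name` and `preferred` flag. The sketch below only shows that file shape. It is not the content of update_versions_json.py: the function name, its parameters, and the example URL are assumptions.

```python
import json
from typing import Dict, List


def build_versions_json(base_url: str, versions: List[str], preferred: str) -> str:
    """Build the JSON text for a pydata-sphinx-theme version switcher."""
    entries: List[Dict[str, object]] = []
    for version in versions:
        entry: Dict[str, object] = {
            "name": version,
            "version": version,
            "url": f"{base_url.rstrip('/')}/{version}/",
        }
        if version == preferred:
            # Marks the entry the theme should treat as the recommended version.
            entry["preferred"] = True
        entries.append(entry)
    return json.dumps(entries, indent=2)


if __name__ == "__main__":
    # Hypothetical base URL and version names.
    print(build_versions_json("https://example.invalid/docs", ["latest", "v1.0"], "latest"))
```

Whichever tool ends up producing the file, Bazel target or plain Python, the output would need to keep this shape for the switcher to work.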
