Closed
90 commits
a10fcca
Bump rollup from 4.55.2 to 4.59.0 in /helm-frontend (#4094)
dependabot[bot] Mar 4, 2026
a92f286
Bump the npm group in /helm-frontend with 2 updates (#4093)
dependabot[bot] Mar 4, 2026
30cd74c
Add Mistral Large 3, Mistral Medium 3.1, Mistral Small 3.2, and Minis…
yifanmai Mar 5, 2026
5e5c010
Add GPT-5.4 (#4099)
yifanmai Mar 6, 2026
c61a7ef
Add model_deployment_generator decorator (#4100)
yifanmai Mar 6, 2026
a3be634
Bump the npm group in /helm-frontend with 2 updates (#4097)
dependabot[bot] Mar 6, 2026
c5bf714
Update Arabic content generation and finance scenarios (#4095)
yifanmai Mar 9, 2026
9de0cbb
Temporary experimental changes to Arabic Enterprise (#4102)
yifanmai Mar 9, 2026
1e56c94
Remove the human-evaluation optional dependency (#4101)
yifanmai Mar 11, 2026
c69baf7
Modularize model deployment generators (#4103)
yifanmai Mar 11, 2026
932245a
Bump cryptography from 44.0.1 to 46.0.5 (#4105)
dependabot[bot] Mar 11, 2026
0f124a6
Switch llama-4-scout-17b-16e-instruct to use Vertex AI instead of Tog…
yifanmai Mar 11, 2026
34ea961
Allow using OpenAI models without configuration (#4107)
yifanmai Mar 11, 2026
aff1a1a
Use TogetherChatClient instead of TogetherClient for generated model …
yifanmai Mar 11, 2026
6f4d25c
Remove tokenization from AnthropicMessagesClient (#4108)
yifanmai Mar 11, 2026
b80479f
Allow using Anthropic models without configuration (#4109)
yifanmai Mar 11, 2026
83c92df
Bump @types/node from 25.3.3 to 25.4.0 in /helm-frontend in the npm g…
dependabot[bot] Mar 13, 2026
91d2e5d
Rename GitHub Action workflows (#4111)
yifanmai Mar 13, 2026
785d256
Remove tokenization from OpenAIClient (#4113)
yifanmai Mar 13, 2026
00e9ad4
Remove completed TODOs in pyproject.toml (#4112)
yifanmai Mar 13, 2026
1178c79
Allow using OpenRouter models without configuration (#4114)
yifanmai Mar 13, 2026
e2db9e7
Bump pyasn1 from 0.6.2 to 0.6.3 (#4116)
dependabot[bot] Mar 17, 2026
1e08ddf
Modularize tokenizer config generators (#4117)
yifanmai Mar 17, 2026
d639d03
Add Mistral model deployments (#4118)
yifanmai Mar 17, 2026
92c0216
Add Writer model deployments (#4119)
yifanmai Mar 17, 2026
c8f6561
Add model deployments for xAI (#4120)
yifanmai Mar 18, 2026
579f617
Bump @types/node from 25.4.0 to 25.5.0 in /helm-frontend in the npm g…
dependabot[bot] Mar 19, 2026
eefc96b
Add Arabic legal scenario (#4122)
yifanmai Mar 19, 2026
7c2f30b
Add metric for Arabic legal scenarios (#4123)
yifanmai Mar 19, 2026
5f21a0c
Bump flatted from 3.2.9 to 3.4.2 in /helm-frontend (#4124)
dependabot[bot] Mar 23, 2026
5b08a86
Automatically import model deployment and tokenizer config generators…
yifanmai Mar 23, 2026
ffa6bee
Allow using auto-generated model deployments for models with two-part…
yifanmai Mar 23, 2026
63631d2
Add Cohere model deployment generator (#4130)
yifanmai Mar 23, 2026
4b73429
Fix link for MedHELM paper (#4133)
yifanmai Mar 24, 2026
1409293
Add Mistral Small 4 (#4134)
yifanmai Mar 24, 2026
1433de0
Use GPT-5.4 for annotators for Arabic Enterprise scenarios (#4136)
yifanmai Mar 25, 2026
0135737
Fix MadinahQA scenario to include Context field for reading comprehen…
aaabulkhair Mar 25, 2026
2fd9c30
Update documentation for Efficient Benchmarking paper (#4132)
yifanmai Mar 25, 2026
db12909
Add model deployments for Google Vertex AI and Gemini API (#4131)
yifanmai Mar 26, 2026
d97df41
Bump cryptography from 46.0.5 to 46.0.6 (#4137)
dependabot[bot] Mar 26, 2026
13650f7
Bump requests from 2.32.5 to 2.33.0 (#4138)
dependabot[bot] Mar 26, 2026
c158ab9
Update MedHELM paper title (#4139)
yifanmai Mar 26, 2026
96eadb2
Install dependencies for Read the Docs with uv (#4140)
yifanmai Mar 26, 2026
6f49fd3
Remove requirements.txt and install-dev.sh (#4142)
yifanmai Mar 26, 2026
781c08b
Update changelog (#4135)
yifanmai Mar 27, 2026
b0739ed
Release v0.5.14 (#4143)
yifanmai Mar 27, 2026
99be551
Bump requests from 2.33.0 to 2.33.1 (#4146)
dependabot[bot] Mar 31, 2026
87131cf
Add GPT-5.4 mini and nano (#4145)
yifanmai Mar 31, 2026
d53b084
Minor fix to changelog (#4144)
yifanmai Mar 31, 2026
ff98626
Add HELM Arabic Enterprise landing (#4147)
yifanmai Mar 31, 2026
0b373b7
Suppress duplicate warnings from truncate_sequence (#4151)
yifanmai Apr 1, 2026
7234e64
Add Llama 4 Maverick on Vertex AI Llama 4 API Service (#4150)
yifanmai Apr 1, 2026
bddb97e
Add Qwen3.5 models on Together (#4152)
yifanmai Apr 1, 2026
91ca75a
Remove requirements.txt (#4154)
yifanmai Apr 1, 2026
53625a3
Add pkg_resources to tool.uv.extra-build-dependencies for openai-whis…
yifanmai Apr 3, 2026
12a8ba3
Bump lodash from 4.17.23 to 4.18.1 in /helm-frontend (#4158)
dependabot[bot] Apr 3, 2026
0b69500
Set extra-build-dependencies for openai-whisper (#4159)
yifanmai Apr 3, 2026
9a9eaec
Default suite to "default" in helm-run and helm-summarize (#4155)
yifanmai Apr 3, 2026
c3d6c18
Ignore type errors caused by upgrading transformers (#4160)
yifanmai Apr 3, 2026
561be34
Upgrade dependencies (#4162)
github-actions[bot] Apr 3, 2026
1120f90
Ignore type errors in model_summac (#4171)
yifanmai Apr 6, 2026
f189d51
Lazily load model in HuggingFaceClient (#4172)
yifanmai Apr 6, 2026
d76090e
Parse Arabic content generation annotator using a more permissive reg…
yifanmai Apr 7, 2026
987e96c
Parse int in Arabic content generation annotator (#4174)
yifanmai Apr 7, 2026
133edde
Add metadata for HEIM scenarios (#4169)
yifanmai Apr 7, 2026
0107488
Add arabic_legal_rag and arabic_legal_qa to Arabic Enterprise run ent…
yifanmai Apr 7, 2026
220fd69
Add metadata for Arabic scenarios (#4170)
yifanmai Apr 7, 2026
1c451fb
Add metadata to mmlu_clinical_afr_scenario and winogrande_afr_scenari…
yifanmai Apr 9, 2026
f842f19
Switch documentation to use gcloud cp instead of gcloud rsync (#4188)
yifanmai Apr 10, 2026
5fbdab8
Change Dependabot schedule interval to monthly (#4190)
yifanmai Apr 14, 2026
b4c9d73
Bump mkdocs-include-markdown-plugin from 4.0.0 to 7.1.8 (#4163)
dependabot[bot] Apr 15, 2026
5ba8a5b
Bump vite from 6.4.1 to 6.4.2 in /helm-frontend (#4175)
dependabot[bot] Apr 15, 2026
742c4e8
Add metadata for Ewok scenario (#4179)
yifanmai Apr 15, 2026
1071710
Bump cryptography from 46.0.6 to 46.0.7 (#4180)
dependabot[bot] Apr 15, 2026
5f5f975
Bump gdown from 5.2.1 to 5.2.2 (#4195)
dependabot[bot] Apr 15, 2026
db78ee4
Add metadata for audio scenarios (#4185)
yifanmai Apr 15, 2026
094d12b
Bump pytest from 7.2.2 to 9.0.3 (#4194)
dependabot[bot] Apr 15, 2026
5dede25
Fix _apply_output_mapping_pattern returning wrong match results (#4192)
Chessing234 Apr 15, 2026
bfd9b96
Add link to per_instance_stats.json on frontend (#4166)
Chessing234 Apr 15, 2026
d4ba787
Fix print_summary() showing relevance scores under COHERENCE (#4183)
Chessing234 Apr 15, 2026
10a23d2
Add metadata for VHELM scenarios (#4189)
yifanmai Apr 15, 2026
36e3c5c
Bump the npm group across 1 directory with 4 updates (#4200)
dependabot[bot] Apr 15, 2026
f8341fe
Build frontend (#4148)
github-actions[bot] Apr 15, 2026
83eabd3
Fix run_executable returning 0 for failed compilations in CodeInsight…
Chessing234 Apr 15, 2026
3971bb1
Fix BBQ metrics in HELM Safety schema (#4202)
yifanmai Apr 15, 2026
caa7eb6
Add Gemini 3 Flash and Gemini 3.1 Flash-Lite (#4204)
yifanmai Apr 16, 2026
94a7eab
Fix PrivacyMetric leakage_email_domain_rate using local_correct_count…
Chessing234 Apr 16, 2026
56a5d63
Add metadata for MELT scenarios (#4205)
yifanmai Apr 20, 2026
83bde5c
Update dataset for Arabic Finance scenario (#4220)
yifanmai Apr 21, 2026
a978225
Merge upstream HELM main: sync 89 commits from stanford-crfm/helm:main
iulianigas Apr 22, 2026
4 changes: 2 additions & 2 deletions .github/dependabot.yml
@@ -3,7 +3,7 @@ updates:
   - package-ecosystem: "uv"
     directory: "/"
     schedule:
-      interval: "weekly"
+      interval: "monthly"
     ignore:
       - dependency-name: "*"
         update-types: ["version-update:semver-major"]
@@ -14,7 +14,7 @@ updates:
   - package-ecosystem: "npm"
     directory: "/helm-frontend/"
     schedule:
-      interval: "weekly"
+      interval: "monthly"
     ignore:
       - dependency-name: "*"
         update-types: ["version-update:semver-major"]
@@ -1,47 +1,13 @@
name: Frontend
name: Deploy Frontend

on:
push:
branches:
- '*'
paths:
- 'helm-frontend/**'
pull_request:
branches:
- '*'
paths:
- 'helm-frontend/**'
on: workflow_dispatch

jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Use Node.js
uses: actions/setup-node@v4
with:
node-version: '18'
- name: Install Yarn
run: npm install --global yarn
- name: Install dependencies
working-directory: ./helm-frontend
run: yarn install
- name: Run pre-commit
run: ./pre-commit-frontend.sh
- name: Build
working-directory: ./helm-frontend
run: yarn build
- name: Run tests
working-directory: ./helm-frontend
run: yarn test

build:
runs-on: ubuntu-latest
# Deploy to only run on pushes to master
# if: github.event_name == 'push' && github.ref == 'refs/heads/main'
if: github.event_name == 'push' && github.ref == 'refs/heads/react_frontend'
needs: test
environment:
name: github-pages
env:
48 changes: 48 additions & 0 deletions .github/workflows/publish-pypi.yml
@@ -0,0 +1,48 @@
+# This workflow will upload a Python Package using Twine when a release is created
+# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#publishing-to-package-registries
+
+name: Publish Python package to PyPI
+
+on:
+  release:
+    types: [published]
+
+permissions:
+  contents: read
+
+jobs:
+  pypi-publish:
+    name: Publish Python package to PyPI
+    runs-on: ubuntu-latest
+    environment:
+      name: pypi
+      url: https://pypi.org/p/crfm-helm
+    permissions:
+      id-token: write
+    steps:
+      - name: Check out repository
+        uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install uv
+        uses: astral-sh/setup-uv@v6
+        with:
+          version: "0.9.4"
+      - name: Build
+        run: uv build
+      - name: helm-run (wheel)
+        run: uv run --isolated --no-project --with dist/*.whl helm-run --run-entries simple1:model=simple/model1 --max-eval-instances 10 --suite test
+      - name: helm-summarize (wheel)
+        run: uv run --isolated --no-project --with dist/*.whl helm-summarize --suite test
+      - name: helm-server (wheel)
+        run: uv run --isolated --no-project --with dist/*.whl helm-server --help
+      - name: helm-run (source distribution)
+        run: uv run --isolated --no-project --with dist/*.tar.gz helm-run --run-entries simple1:model=simple/model1 --max-eval-instances 10 --suite test
+      - name: helm-summarize (source distribution)
+        run: uv run --isolated --no-project --with dist/*.tar.gz helm-summarize --suite test
+      - name: helm-server (source distribution)
+        run: uv run --isolated --no-project --with dist/*.tar.gz helm-server --help
+      - name: Publish package
+        uses: pypa/gh-action-pypi-publish@release/v1
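The smoke-test steps in publish-pypi.yml run the same three commands against the wheel and then against the source distribution. A minimal dry-run sketch of that pattern follows, assuming `uv build` has already populated `dist/`. The function name `smoke_commands` is hypothetical and not part of the repository; it only prints the commands so they can be reviewed before being run by hand.

```shell
# Dry-run sketch: print the smoke-test commands the workflow runs for one artifact.
# smoke_commands is a hypothetical helper name, not something defined in HELM.
smoke_commands() {
  artifact="$1"
  prefix="uv run --isolated --no-project --with $artifact"
  echo "$prefix helm-run --run-entries simple1:model=simple/model1 --max-eval-instances 10 --suite test"
  echo "$prefix helm-summarize --suite test"
  echo "$prefix helm-server --help"
}

# Same order as the workflow: wheel first, then source distribution.
for artifact in 'dist/*.whl' 'dist/*.tar.gz'; do
  smoke_commands "$artifact"
done
```

Piping the output to `sh` would execute the commands; the sketch deliberately stops at printing.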
37 changes: 37 additions & 0 deletions .github/workflows/test-frontend.yml
@@ -0,0 +1,37 @@
+name: Test Frontend
+
+on:
+  push:
+    branches:
+      - '*'
+    paths:
+      - 'helm-frontend/**'
+  pull_request:
+    branches:
+      - '*'
+    paths:
+      - 'helm-frontend/**'
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+      - name: Use Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: '18'
+      - name: Install Yarn
+        run: npm install --global yarn
+      - name: Install dependencies
+        working-directory: ./helm-frontend
+        run: yarn install
+      - name: Run pre-commit
+        run: ./pre-commit-frontend.sh
+      - name: Build
+        working-directory: ./helm-frontend
+        run: yarn build
+      - name: Run tests
+        working-directory: ./helm-frontend
+        run: yarn test
@@ -1,4 +1,4 @@
-name: Test
+name: Test Python
 on:
   push:
     branches: [ main ]
@@ -1,4 +1,4 @@
-name: Scenario tests
+name: Test Scenarios
 on:
   push:
     branches: [ main ]
@@ -2,11 +2,7 @@
 # for all Python dependencies (including transitive dependencies)
 # whenever pyproject.toml is modified or the workflow is manually triggered.

-name: Update requirements
-
-permissions:
-  contents: write
-  pull-requests: write
+name: Update dependencies

 on:
   push:
@@ -15,16 +11,13 @@ on:
     paths:
       - "pyproject.toml"
       - "constraints.txt"
-      - '.github/workflows/update-requirements.yml'
+      - '.github/workflows/update-dependencies.yml'
   workflow_dispatch:

 jobs:
-  update-requirements:
-    name: Update requirements
+  update-dependencies:
+    name: Update dependencies
     runs-on: ubuntu-latest
-    permissions:
-      contents: write
-      pull-requests: write
     strategy:
       matrix:
         python-version: ["3.10"]
@@ -39,17 +32,10 @@ jobs:
         uses: astral-sh/setup-uv@v6
         with:
           version: "0.9.4"
-          enable-cache: true
-      - name: Create build env with setuptools (for openai-whisper)
-        run: |
-          python -m venv .build-venv
-          .build-venv/bin/pip install setuptools wheel --quiet
       - name: Lock dependencies
-        run: UV_PYTHON=".build-venv/bin/python" uv lock --no-build-isolation-package openai-whisper
+        run: uv lock
       - name: Install dependencies
-        run: UV_PYTHON=".build-venv/bin/python" uv sync --extra ci --no-build-isolation-package openai-whisper
-      - name: Write requirements.txt
-        run: uv pip freeze --exclude-editable > requirements.txt
+        run: uv sync --extra ci
       # Need to manually run tests here because the pull request opened later will not
       # run the test workflow.
       #
@@ -66,14 +52,11 @@ jobs:
         run: uv run pytest
       - name: Run helm-run
         run: uv run helm-run --suite test --run-entries simple1:model=simple/model1 --max-eval-instances 10 --exit-on-error
-      - name: Remove build venv before PR (do not commit)
-        run: rm -rf .build-venv
       - name: Create pull request
         uses: peter-evans/create-pull-request@v6
         with:
-          token: ${{ secrets.REPO_TOKEN || secrets.GITHUB_TOKEN }}
-          commit-message: Update requirements
-          title: "Update requirements"
-          branch: actions/update-requirements
+          commit-message: Update dependencies
+          title: "Update dependencies"
+          branch: actions/update-dependencies
           delete-branch: true
           body: Auto-generated from GitHub Actions.
@@ -2,16 +2,16 @@
 # for all Python dependencies (including transitive dependencies)
 # every Monday or whenever the workflow is manually triggered.

-name: Upgrade requirements
+name: Upgrade dependencies

 on:
   # schedule:
   #   - cron: "30 15 * * 1"
   workflow_dispatch:

 jobs:
-  upgrade-requirements:
-    name: Upgrade requirements
+  upgrade-dependencies:
+    name: Upgrade dependencies
     runs-on: ubuntu-latest
     strategy:
       matrix:
@@ -27,17 +27,10 @@ jobs:
         uses: astral-sh/setup-uv@v6
         with:
           version: "0.9.4"
-          enable-cache: true
-      - name: Create build env with setuptools (for openai-whisper)
-        run: |
-          python -m venv .build-venv
-          .build-venv/bin/pip install setuptools wheel --quiet
       - name: Upgrade dependencies
-        run: UV_PYTHON=".build-venv/bin/python" uv lock --upgrade --no-build-isolation-package openai-whisper
+        run: uv lock --upgrade
       - name: Install dependencies
-        run: UV_PYTHON=".build-venv/bin/python" uv sync --extra ci --no-build-isolation-package openai-whisper
-      - name: Write requirements.txt
-        run: uv pip freeze --exclude-editable > requirements.txt
+        run: uv sync --extra ci
       # Need to manually run tests here because the pull request opened later will not
       # run the test workflow.
       #
@@ -57,8 +50,8 @@ jobs:
       - name: Create pull request
         uses: peter-evans/create-pull-request@v6
         with:
-          commit-message: Upgrade requirements
-          title: "Upgrade requirements"
-          branch: actions/upgrade-requirements
+          commit-message: Upgrade dependencies
+          title: "Upgrade dependencies"
+          branch: actions/upgrade-dependencies
           delete-branch: true
           body: Auto-generated from GitHub Actions.
15 changes: 9 additions & 6 deletions .readthedocs.yaml
@@ -6,9 +6,12 @@ build:
   os: "ubuntu-20.04"
   tools:
     python: "3.10"
-
-python:
-  install:
-    - requirements: docs/requirements.txt
-    - method: pip
-      path: .
+  jobs:
+    pre_create_environment:
+      - asdf plugin add uv
+      - asdf install uv latest
+      - asdf global uv latest
+    create_environment:
+      - uv venv "${READTHEDOCS_VIRTUALENV_PATH}"
+    install:
+      - UV_PROJECT_ENVIRONMENT="${READTHEDOCS_VIRTUALENV_PATH}" uv sync --frozen --group docs
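The new .readthedocs.yaml jobs create a virtual environment with uv and then sync the `docs` dependency group into it. The sketch below prints a rough local equivalent of those two steps, assuming uv is already installed, so the asdf steps are skipped. `docs_env_commands` and the default path `.venv-docs` are hypothetical stand-ins for illustration; on Read the Docs the path comes from `READTHEDOCS_VIRTUALENV_PATH`.

```shell
# Dry-run sketch: print local equivalents of the create_environment and install jobs.
# docs_env_commands and .venv-docs are hypothetical names, not part of the repository.
docs_env_commands() {
  venv_path="${1:-.venv-docs}"
  echo "uv venv \"$venv_path\""
  echo "UV_PROJECT_ENVIRONMENT=\"$venv_path\" uv sync --frozen --group docs"
}

docs_env_commands "${VENV_PATH:-.venv-docs}"
```

`--frozen` makes `uv sync` install from the existing uv.lock without updating it, which matches the configuration in the diff above.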