Replace `glob` with `compare-changes` by anttiharju · Pull Request #113 · mpalmer/action-validator

anttiharju · 2025-12-01T09:50:27Z

What / Why

Fix Performance issue in repositories with lots of gitignored files #105
and paths wildcard validation seems off #27
(probably also path glob weird bug!? #77?)

How

I had some free time on my hands so I spent Nov 21 to Nov 30 on building a Rust library called compare-changes that is a reimplementation of GitHub's workflow syntax.

While setting up a trivial parser/pattern matcher is easy enough, one will soon start to learn about things like memoization, backtracking, pathological patterns, automatas and such, which are interesting, but would take me ages to use / account for, so instead the library relies on chumsky and regex for the difficult runtime bits.

Overview of compare-changes

It is a library intended to replace glob for action-validator
It is a CLI / a GitHub Action that replaces my previous (poorly implemented) Python-based compare-changes. It is not that Python is bad (arguably better when targeting GitHub Actions), it is that it was a quick hack that worked well enough that until now wasn't worth improving.

and how the library works:

lib.rs defines the public API, and does some input normalization on the provided path string [L41-45] to match GitHub's behavior.
path/mod.rs uses chumsky to parse the input strings into Rust structs/enums so the data is easier to work with down the line. It also performs some validations on e.g. brackets and such. An alternative would have been nom but I understood chumsky's a bit better at emitting errors. Most of the integration tests focus on validation errors.

Chumsky is a parser library for Rust that makes writing expressive, high-performance parsers easy.

convert/mod.rs ingests the Rust structs/enums to build regex patterns to match the Github Actions workflow syntax filter patterns spec. It has (imho) lots of unit tests at convert/test.rs and it even performs some validation on the patterns by emitting errors on invalid see L300-324.

While GitHub's Documentation is helpful, it doesn't cover various edge cases of how pattern validation should behave. Therefore I've used various temporary repositories to run dummy workflows to see the actual behavior and eventually settled on the use of anttiharju/tmp for the purpose.

Now while I am quite happy with the state of the library, it of course could be that I've missed some edge cases and such, even if I am aware of a decent chunk of them. In case people encounter further issues with action-validator in regards to path patterns, feel free to point them to open an issue in anttiharju/compare-changes. These are fairly easy to provide repros for and it should be easy enough for me to update the library.

Testing

Library has plenty of automated tests. Also tested manually:

Usage as a Git hook (via Lefthook) still catches glob issues in pre-commit
Works well in a monorepo (nixpkgs). Since the difficult runtime bits are handled by other crates I didn't expect issues here, but a hand-written implementation could have struggled.

Benchmarks

Main motivator for me here was fixing the slow Git hook so here are some numbers from a repo that uses a lot of glob patterns (anttiharju/relcheck@v1.8.13).

worst case: from 7.883 s to 118.5 ms, a 66x improvement (7883/118=66,8050847458)
best case: from 92.4 ms to 119.0 ms, a 26.6 ms overhead

the multiplier may change a lot depending on the repo picked for running the benchmarks. In the performance issue discussion the multiplier was 19x [comment] because the worst case was not as bad. I did check that the numbers here were from 0.8.0 and not 0.6.0, with 0.6.0 worst case went up to ~10s.

Benchmarks ran on a 2021 14" MBP using hyperfine:

hyperfine --warmup 3 "git ls-files -z '.github/workflows/*.yml' '*/action.yml' | xargs -0 action-validator --verbose"

details

worst-case

with a gitignored .direnv from nix-direnv

`action-validator@27-fix-wildcards-105-fix-gitignore-perf`

  Time (mean ± σ):     118.5 ms ±   0.7 ms    [User: 80.5 ms, System: 36.4 ms]
  Range (min … max):   117.3 ms … 120.2 ms    23 runs

`action-validator@v0.8.0`

  Time (mean ± σ):      7.883 s ±  0.137 s    [User: 0.814 s, System: 7.062 s]
  Range (min … max):    7.717 s …  8.195 s    10 runs

best case

no gitignored files ensured with

git reset --hard && git clean -dfx

`action-validator@27-fix-wildcards-105-fix-gitignore-perf`

  Time (mean ± σ):     119.0 ms ±   1.0 ms    [User: 80.6 ms, System: 36.4 ms]
  Range (min … max):   117.8 ms … 121.3 ms    24 runs

`action-validator@v0.8.0`

  Time (mean ± σ):      92.4 ms ±   0.8 ms    [User: 66.6 ms, System: 26.2 ms]
  Range (min … max):    91.3 ms …  94.3 ms    30 runs

Anything else

Noticed that some tests (tests/fixtures/012_github_glob_syntax/glob.yml) did not match GitHub's behavior, so went ahead and fixed them.
action-validator now shells out to git for ls-files to solve the performance issue when a repository has a lot of gitignored files.

Comments on the code in the library (lib.rs, path/mod.rs, convert/mod.rs) are also welcome, but not expected.

anttiharju · 2025-12-24T18:06:27Z

Broken cache setup is causing the CI failures, it is referring to output of a step that hasn't been executed yet, defaulting to empty. See:

action-validator/.github/workflows/release.yml

Line 41 in c708997

    
                     key: ${{ runner.os }}-${{ steps.rust-install.outputs.cachekey }}-${{ matrix.rust_target }}-binary-release

and Linux--build, meaning, ${{ steps.rust-install.outputs.cachekey }} is empty

Run actions/cache@v4
Cache hit for: Linux--build
Received 25165824 of 303862892 (8.3%), 24.0 MBs/sec
Received 150994944 of 303862892 (49.7%), 71.9 MBs/sec
Received 299668588 of 303862892 (98.6%), 95.2 MBs/sec
Received 303862892 of 303862892 (100.0%), 94.2 MBs/sec
Cache Size: ~290 MB (303862892 B)
/usr/bin/tar -xf /home/runner/work/_temp/6dc3820f-ec58-409f-b2ad-c40a4dc4eb50/cache.tzst -P -C /home/runner/work/action-validator/action-validator --use-compress-program unzstd
Cache restored successfully
Cache restored from key: Linux--build

I would just manually delete the caches at https://github.com/mpalmer/action-validator/actions/caches and rerun the ci, but I can't.

For a proper fix something more involved is required and I don't have easy answers for that.

CI passes fine in my fork:

This unlocks helpful syntax in compare-changes edition 2024 mandates minimum 1.86 as well, but had some CI issues so bumping directly to 1.92

anttiharju · 2026-01-01T10:10:34Z

Fixed CI with 0de554d, it may need further work later but for now should be ok

anttiharju · 2026-01-01T10:13:50Z

@mpalmer happy new year, this PR is ready for review

The library hasn't really changed, just packaging & other updates.

mpalmer · 2026-04-04T06:26:07Z

OK, back in the saddle... thoughts:

Good catch on the ordering problems with the CI jobs leaving the cachekey blank. I'm not a fan of hashing Cargo.* to make a cache key, as that (a) doesn't capture when the Rust version changes, and (b) switches to a new cache every time dependencies change. Instead, I've just pushed a change that corrects the step ordering, and now things seem to work better.
Your cargo fmt commit is very strange, as it makes a change that my own cargo fmt runs revert back to how it is now. I suspect one of us has a rustfmt.toml in a parent directory that contains options that are causing problems.
Can you give more details as to what the "helpful syntax" is that switching action-validator to edition 2024 unlocks? My understanding is that edition is a per-crate setting, and so changing it in one crate shouldn't change anything anywhere else.

After those observations, the important news: the code that switches from glob to compare-changes itself seems entirely fine and reasonable, so I've cherry-picked that into main. If you want any of the changes from the other commits to go in, it's probably best to open a new PR.

anttiharju · 2026-04-04T20:29:54Z

Thanks Matt! Very happy to see the important change merged. Hope it also gets released 😄:

Release latest on npm #119

I have no strong opinions on the rest, and actually learned a few things from your comment! For the sake of completeness:

Good catch on the ordering problems with the CI jobs leaving the cachekey blank. I'm not a fan of hashing Cargo.* to make a cache key, as that (a) doesn't capture when the Rust version changes,

I thought Rust version would be covered by:

action-validator/Cargo.toml

Line 18 in 4249884

rust-version = "1.84.0"

and (b) switches to a new cache every time dependencies change. Instead, I've just pushed a change that corrects the step ordering, and now things seem to work better.

This is indeed quite cumbersome. Glad you found a better fix!

Your cargo fmt commit is very strange, as it makes a change that my own cargo fmt runs revert back to how it is now. I suspect one of us has a rustfmt.toml in a parent directory that contains options that are causing problems.

I think it is related to the attempted Rust edition update. You can reproduce 9055611 by having Cargo.toml rust-version set to 1.94.0, edition to 2024 and running cargo fmt.

Can you give more details as to what the "helpful syntax" is that switching action-validator to edition 2024 unlocks?

With edition 2024 the mod.rs of my library could have had a little bit less nesting, see here, but thankfully as you say these are indeed per-crate so I can do it and action-validator can stay on 2021.

My understanding is that edition is a per-crate setting, and so changing it in one crate shouldn't change anything anywhere else.

So it was just confusion on my part, sorry. Still a bit new to Rust 😄 To avoid the crates.io publish loop for every little change, I had copied over the files locally, which of course makes them part of the same crate and breaks the build due to incompatible syntax. Should have used staging.crates.io to do it properly.

mpalmer · 2026-04-04T23:20:46Z

I thought Rust version would be covered by:

A change to the MSRV (Minimum Supported Rust Version) would be covered by a change to Cargo.toml, but the Rust version actually used by the build run wouldn't be picked up -- a lot of CI jobs just say "use nightly" or "use latest stable", and that'll change on a regular cadence determined by Rust releases, not changes that I make.

I think it is related to the attempted Rust edition update.

Oooh, yeah, that'd do it. I didn't look at the ordering of the commits.

To avoid the crates.io publish loop for every little change, I had copied over the files locally, which of course makes them part of the same crate and breaks the build due to incompatible syntax. Should have used staging.crates.io to do it properly.

Alternately, you could temporarily use a local override to point at a local copy of compare-changes, by adding something like this to action-validator's Cargo.toml:

[patch.crates-io]
# point to a pushed branch, but which is as-yet unreleased
compare-changes = { git = "https://github.com/...", branch = "some-new-feature" }
# or else even point to uncommitted changes
compare-changes = { path = "../compare-changes" }

I use that approach all the time when I'm testing out changes to crates for features and bugfixes that I'll use elsewhere, it's fantastic.

anttiharju · 2026-04-05T09:24:08Z

To avoid the crates.io publish loop for every little change, I had copied over the files locally, which of course makes them part of the same crate and breaks the build due to incompatible syntax. Should have used staging.crates.io to do it properly.

Alternately, you could temporarily use a local override to point at a local copy of compare-changes, by adding something like this to action-validator's Cargo.toml:
[patch.crates-io]
# point to a pushed branch, but which is as-yet unreleased
compare-changes = { git = "https://github.com/...", branch = "some-new-feature" }
# or else even point to uncommitted changes
compare-changes = { path = "../compare-changes" }
I use that approach all the time when I'm testing out changes to crates for features and bugfixes that I'll use elsewhere, it's fantastic.

Oh this one is really cool. Somehow missed it when trying to Google about how to do it 😄 Thanks!!

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 379fdd3 to 3010328 Compare December 1, 2025 09:51

This was referenced Dec 1, 2025

Performance issue in repositories with lots of gitignored files #105

Closed

paths wildcard validation seems off #27

Closed

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 9f1c3e0 to a830265 Compare December 5, 2025 21:27

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch 4 times, most recently from 5fbdf60 to a830265 Compare December 24, 2025 17:53

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from a830265 to 5fbdf60 Compare December 24, 2025 18:07

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 5fbdf60 to c0a9294 Compare January 1, 2026 09:58

anttiharju added 4 commits January 1, 2026 12:03

Bump edition to 2024

90f991e

This unlocks helpful syntax in compare-changes edition 2024 mandates minimum 1.86 as well, but had some CI issues so bumping directly to 1.92

Replace broken cachekey with basic hashFiles setup

0de554d

Replace glob with compare-changes

018ef0c

cargo fmt

9055611

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 6793c47 to 9055611 Compare January 1, 2026 10:06

Bump compare-changes

5f2453d

The library hasn't really changed, just packaging & other updates.

anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from be69f1b to 5f2453d Compare February 6, 2026 05:47

mpalmer closed this Apr 4, 2026

mpalmer mentioned this pull request Apr 4, 2026

path glob weird bug!? #77

Closed

This was referenced Apr 5, 2026

Dependabot config #122

Closed

node/npm: drop fast-glob in favor of compare-changes #123

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Replace `glob` with `compare-changes`#113

Replace `glob` with `compare-changes`#113
anttiharju wants to merge 5 commits intompalmer:mainfrom
anttiharju:27-fix-wildcards-105-fix-gitignore-perf

anttiharju commented Dec 1, 2025 •

edited

Loading

Uh oh!

anttiharju commented Dec 24, 2025 •

edited

Loading

Uh oh!

anttiharju commented Jan 1, 2026

Uh oh!

anttiharju commented Jan 1, 2026

Uh oh!

mpalmer commented Apr 4, 2026

Uh oh!

anttiharju commented Apr 4, 2026

Uh oh!

mpalmer commented Apr 4, 2026

Uh oh!

anttiharju commented Apr 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

anttiharju commented Dec 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What / Why

How

Testing

Benchmarks

worst-case

action-validator@27-fix-wildcards-105-fix-gitignore-perf

action-validator@v0.8.0

best case

action-validator@27-fix-wildcards-105-fix-gitignore-perf

action-validator@v0.8.0

Anything else

Uh oh!

anttiharju commented Dec 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

anttiharju commented Jan 1, 2026

Uh oh!

anttiharju commented Jan 1, 2026

Uh oh!

mpalmer commented Apr 4, 2026

Uh oh!

anttiharju commented Apr 4, 2026

Uh oh!

mpalmer commented Apr 4, 2026

Uh oh!

anttiharju commented Apr 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

anttiharju commented Dec 1, 2025 •

edited

Loading

`action-validator@27-fix-wildcards-105-fix-gitignore-perf`

`action-validator@v0.8.0`

`action-validator@27-fix-wildcards-105-fix-gitignore-perf`

`action-validator@v0.8.0`

anttiharju commented Dec 24, 2025 •

edited

Loading