Skip to content

Replace glob with compare-changes#113

Closed
anttiharju wants to merge 5 commits intompalmer:mainfrom
anttiharju:27-fix-wildcards-105-fix-gitignore-perf
Closed

Replace glob with compare-changes#113
anttiharju wants to merge 5 commits intompalmer:mainfrom
anttiharju:27-fix-wildcards-105-fix-gitignore-perf

Conversation

@anttiharju
Copy link
Copy Markdown
Contributor

@anttiharju anttiharju commented Dec 1, 2025

What / Why

How

I had some free time on my hands so I spent Nov 21 to Nov 30 on building a Rust library called compare-changes that is a reimplementation of GitHub's workflow syntax.

While setting up a trivial parser/pattern matcher is easy enough, one will soon start to learn about things like memoization, backtracking, pathological patterns, automatas and such, which are interesting, but would take me ages to use / account for, so instead the library relies on chumsky and regex for the difficult runtime bits.

Overview of compare-changes
  1. It is a library intended to replace glob for action-validator
  2. It is a CLI / a GitHub Action that replaces my previous (poorly implemented) Python-based compare-changes. It is not that Python is bad (arguably better when targeting GitHub Actions), it is that it was a quick hack that worked well enough that until now wasn't worth improving.

and how the library works:

  • lib.rs defines the public API, and does some input normalization on the provided path string [L41-45] to match GitHub's behavior.
  • path/mod.rs uses chumsky to parse the input strings into Rust structs/enums so the data is easier to work with down the line. It also performs some validations on e.g. brackets and such. An alternative would have been nom but I understood chumsky's a bit better at emitting errors. Most of the integration tests focus on validation errors.

Chumsky is a parser library for Rust that makes writing expressive, high-performance parsers easy.

  • convert/mod.rs ingests the Rust structs/enums to build regex patterns to match the Github Actions workflow syntax filter patterns spec. It has (imho) lots of unit tests at convert/test.rs and it even performs some validation on the patterns by emitting errors on invalid see L300-324.

While GitHub's Documentation is helpful, it doesn't cover various edge cases of how pattern validation should behave. Therefore I've used various temporary repositories to run dummy workflows to see the actual behavior and eventually settled on the use of anttiharju/tmp for the purpose.

Now while I am quite happy with the state of the library, it of course could be that I've missed some edge cases and such, even if I am aware of a decent chunk of them. In case people encounter further issues with action-validator in regards to path patterns, feel free to point them to open an issue in anttiharju/compare-changes. These are fairly easy to provide repros for and it should be easy enough for me to update the library.

Testing

Library has plenty of automated tests. Also tested manually:

  • Usage as a Git hook (via Lefthook) still catches glob issues in pre-commit
  • Works well in a monorepo (nixpkgs). Since the difficult runtime bits are handled by other crates I didn't expect issues here, but a hand-written implementation could have struggled.

Benchmarks

Main motivator for me here was fixing the slow Git hook so here are some numbers from a repo that uses a lot of glob patterns (anttiharju/relcheck@v1.8.13).

worst case: from 7.883 s to 118.5 ms, a 66x improvement (7883/118=66,8050847458)
best case: from 92.4 ms to 119.0 ms, a 26.6 ms overhead

the multiplier may change a lot depending on the repo picked for running the benchmarks. In the performance issue discussion the multiplier was 19x [comment] because the worst case was not as bad. I did check that the numbers here were from 0.8.0 and not 0.6.0, with 0.6.0 worst case went up to ~10s.

Benchmarks ran on a 2021 14" MBP using hyperfine:

hyperfine --warmup 3 "git ls-files -z '.github/workflows/*.yml' '*/action.yml' | xargs -0 action-validator --verbose"
details

worst-case

with a gitignored .direnv from nix-direnv

action-validator@27-fix-wildcards-105-fix-gitignore-perf

  Time (mean ± σ):     118.5 ms ±   0.7 ms    [User: 80.5 ms, System: 36.4 ms]
  Range (min … max):   117.3 ms … 120.2 ms    23 runs

action-validator@v0.8.0

  Time (mean ± σ):      7.883 s ±  0.137 s    [User: 0.814 s, System: 7.062 s]
  Range (min … max):    7.717 s …  8.195 s    10 runs

best case

no gitignored files ensured with

git reset --hard && git clean -dfx

action-validator@27-fix-wildcards-105-fix-gitignore-perf

  Time (mean ± σ):     119.0 ms ±   1.0 ms    [User: 80.6 ms, System: 36.4 ms]
  Range (min … max):   117.8 ms … 121.3 ms    24 runs

action-validator@v0.8.0

  Time (mean ± σ):      92.4 ms ±   0.8 ms    [User: 66.6 ms, System: 26.2 ms]
  Range (min … max):    91.3 ms …  94.3 ms    30 runs

Anything else

  • Noticed that some tests (tests/fixtures/012_github_glob_syntax/glob.yml) did not match GitHub's behavior, so went ahead and fixed them.
  • action-validator now shells out to git for ls-files to solve the performance issue when a repository has a lot of gitignored files.

Comments on the code in the library (lib.rs, path/mod.rs, convert/mod.rs) are also welcome, but not expected.

@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 379fdd3 to 3010328 Compare December 1, 2025 09:51
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 9f1c3e0 to a830265 Compare December 5, 2025 21:27
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch 4 times, most recently from 5fbdf60 to a830265 Compare December 24, 2025 17:53
@anttiharju
Copy link
Copy Markdown
Contributor Author

anttiharju commented Dec 24, 2025

Broken cache setup is causing the CI failures, it is referring to output of a step that hasn't been executed yet, defaulting to empty. See:

key: ${{ runner.os }}-${{ steps.rust-install.outputs.cachekey }}-${{ matrix.rust_target }}-binary-release

and Linux--build, meaning, ${{ steps.rust-install.outputs.cachekey }} is empty

Run actions/cache@v4
Cache hit for: Linux--build
Received 25165824 of 303862892 (8.3%), 24.0 MBs/sec
Received 150994944 of 303862892 (49.7%), 71.9 MBs/sec
Received 299668588 of 303862892 (98.6%), 95.2 MBs/sec
Received 303862892 of 303862892 (100.0%), 94.2 MBs/sec
Cache Size: ~290 MB (303862892 B)
/usr/bin/tar -xf /home/runner/work/_temp/6dc3820f-ec58-409f-b2ad-c40a4dc4eb50/cache.tzst -P -C /home/runner/work/action-validator/action-validator --use-compress-program unzstd
Cache restored successfully
Cache restored from key: Linux--build

I would just manually delete the caches at https://github.com/mpalmer/action-validator/actions/caches and rerun the ci, but I can't.

For a proper fix something more involved is required and I don't have easy answers for that.

CI passes fine in my fork:

image

@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from a830265 to 5fbdf60 Compare December 24, 2025 18:07
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 5fbdf60 to c0a9294 Compare January 1, 2026 09:58
This unlocks helpful syntax in compare-changes

edition 2024 mandates minimum 1.86 as well, but
had some CI issues so bumping directly to 1.92
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from 6793c47 to 9055611 Compare January 1, 2026 10:06
@anttiharju
Copy link
Copy Markdown
Contributor Author

Fixed CI with 0de554d, it may need further work later but for now should be ok

@anttiharju
Copy link
Copy Markdown
Contributor Author

@mpalmer happy new year, this PR is ready for review

The library hasn't really changed, just packaging & other updates.
@anttiharju anttiharju force-pushed the 27-fix-wildcards-105-fix-gitignore-perf branch from be69f1b to 5f2453d Compare February 6, 2026 05:47
@mpalmer
Copy link
Copy Markdown
Owner

mpalmer commented Apr 4, 2026

OK, back in the saddle... thoughts:

  • Good catch on the ordering problems with the CI jobs leaving the cachekey blank. I'm not a fan of hashing Cargo.* to make a cache key, as that (a) doesn't capture when the Rust version changes, and (b) switches to a new cache every time dependencies change. Instead, I've just pushed a change that corrects the step ordering, and now things seem to work better.
  • Your cargo fmt commit is very strange, as it makes a change that my own cargo fmt runs revert back to how it is now. I suspect one of us has a rustfmt.toml in a parent directory that contains options that are causing problems.
  • Can you give more details as to what the "helpful syntax" is that switching action-validator to edition 2024 unlocks? My understanding is that edition is a per-crate setting, and so changing it in one crate shouldn't change anything anywhere else.

After those observations, the important news: the code that switches from glob to compare-changes itself seems entirely fine and reasonable, so I've cherry-picked that into main. If you want any of the changes from the other commits to go in, it's probably best to open a new PR.

@mpalmer mpalmer closed this Apr 4, 2026
@mpalmer mpalmer mentioned this pull request Apr 4, 2026
@anttiharju
Copy link
Copy Markdown
Contributor Author

Thanks Matt! Very happy to see the important change merged. Hope it also gets released 😄:

I have no strong opinions on the rest, and actually learned a few things from your comment! For the sake of completeness:

Good catch on the ordering problems with the CI jobs leaving the cachekey blank. I'm not a fan of hashing Cargo.* to make a cache key, as that (a) doesn't capture when the Rust version changes,

I thought Rust version would be covered by:

rust-version = "1.84.0"

and (b) switches to a new cache every time dependencies change. Instead, I've just pushed a change that corrects the step ordering, and now things seem to work better.

This is indeed quite cumbersome. Glad you found a better fix!

Your cargo fmt commit is very strange, as it makes a change that my own cargo fmt runs revert back to how it is now. I suspect one of us has a rustfmt.toml in a parent directory that contains options that are causing problems.

I think it is related to the attempted Rust edition update. You can reproduce 9055611 by having Cargo.toml rust-version set to 1.94.0, edition to 2024 and running cargo fmt.

Can you give more details as to what the "helpful syntax" is that switching action-validator to edition 2024 unlocks?

With edition 2024 the mod.rs of my library could have had a little bit less nesting, see here, but thankfully as you say these are indeed per-crate so I can do it and action-validator can stay on 2021.

My understanding is that edition is a per-crate setting, and so changing it in one crate shouldn't change anything anywhere else.

So it was just confusion on my part, sorry. Still a bit new to Rust 😄 To avoid the crates.io publish loop for every little change, I had copied over the files locally, which of course makes them part of the same crate and breaks the build due to incompatible syntax. Should have used staging.crates.io to do it properly.

@mpalmer
Copy link
Copy Markdown
Owner

mpalmer commented Apr 4, 2026

I thought Rust version would be covered by:

A change to the MSRV (Minimum Supported Rust Version) would be covered by a change to Cargo.toml, but the Rust version actually used by the build run wouldn't be picked up -- a lot of CI jobs just say "use nightly" or "use latest stable", and that'll change on a regular cadence determined by Rust releases, not changes that I make.

I think it is related to the attempted Rust edition update.

Oooh, yeah, that'd do it. I didn't look at the ordering of the commits.

To avoid the crates.io publish loop for every little change, I had copied over the files locally, which of course makes them part of the same crate and breaks the build due to incompatible syntax. Should have used staging.crates.io to do it properly.

Alternately, you could temporarily use a local override to point at a local copy of compare-changes, by adding something like this to action-validator's Cargo.toml:

[patch.crates-io]
# point to a pushed branch, but which is as-yet unreleased
compare-changes = { git = "https://github.com/...", branch = "some-new-feature" }
# or else even point to uncommitted changes
compare-changes = { path = "../compare-changes" }

I use that approach all the time when I'm testing out changes to crates for features and bugfixes that I'll use elsewhere, it's fantastic.

@anttiharju
Copy link
Copy Markdown
Contributor Author

To avoid the crates.io publish loop for every little change, I had copied over the files locally, which of course makes them part of the same crate and breaks the build due to incompatible syntax. Should have used staging.crates.io to do it properly.

Alternately, you could temporarily use a local override to point at a local copy of compare-changes, by adding something like this to action-validator's Cargo.toml:

[patch.crates-io]
# point to a pushed branch, but which is as-yet unreleased
compare-changes = { git = "https://github.com/...", branch = "some-new-feature" }
# or else even point to uncommitted changes
compare-changes = { path = "../compare-changes" }

I use that approach all the time when I'm testing out changes to crates for features and bugfixes that I'll use elsewhere, it's fantastic.

Oh this one is really cool. Somehow missed it when trying to Google about how to do it 😄 Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance issue in repositories with lots of gitignored files

2 participants