Skip to content

Use AbstractPPL evaluator interface#255

Open
yebai wants to merge 28 commits into
mainfrom
abstractppl-dynamicppl-migration
Open

Use AbstractPPL evaluator interface#255
yebai wants to merge 28 commits into
mainfrom
abstractppl-dynamicppl-migration

Conversation

@yebai
Copy link
Copy Markdown
Member

@yebai yebai commented Apr 24, 2026

AdvancedVI's AD plumbing now sits on the AbstractPPL evaluator interface. The change mirrors DynamicPPL's recent migration (TuringLang/DynamicPPL.jl#1363), so the whole TuringLang stack funnels AD preparation through a single seam: AbstractPPL.prepare(adtype, f, x; context=...) and AbstractPPL.value_and_gradient!!(prep, x).

Hessian support is preserved via AbstractPPL's order=2 prepare path, falling back to gradient-only on MethodError from backends that don't implement second order.

Compat moves to AbstractPPL 0.15 and DynamicPPL 0.42; the latter is source-pinned to DynamicPPL/main until 0.42 is tagged.

Enzyme failures are not related to this PR — the same failures reproduce on main.

Replacing #249

Replace `DifferentiationInterface` with `AbstractPPL.prepare` /
`AbstractPPL.value_and_gradient` throughout. Key changes:

- `_prepare_gradient` / `_value_and_gradient!` now wrap an `AbstractPPL`
  prepared evaluator in a `_VIGradPrep` struct that holds an `aux_ref` so
  auxiliary inputs can be swapped without re-preparing.
- `DynamicPPLModelLogDensityFunction` stores `model_ref` and
  `loglikeadj_ref` as `Ref`s; `subsample` mutates them in-place instead
  of creating a new struct via `@set`, keeping the prepared evaluator valid
  across subsampling steps.
- Drop second-order (Hessian) support; `use_hessian=true` now warns and is
  ignored.
- Pin dev branches via `[sources]`: `AbstractPPL@evaluator-interface` and
  `DynamicPPL@adproblems-interface`.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yebai yebai force-pushed the abstractppl-dynamicppl-migration branch from 639e031 to f689bd8 Compare April 24, 2026 15:36
yebai and others added 5 commits May 19, 2026 18:02
- Switch `value_and_gradient` → `value_and_gradient!!` per AbstractPPL 0.15.
- In AdvancedVI core, give every `AbstractPPL.Prepared{<:AbstractADType,<:VectorEvaluator}`
  a `LogDensityProblems` interface (order-1 fallback) so any AD-backed prep
  acts as a `LogDensityProblem` without backend-specific wiring.
- In `AdvancedVIMooncakeExt`, promote Mooncake-prepared evaluators to order-2
  and add `logdensity_gradient_and_hessian` via forward-over-reverse Mooncake.
- Simplify `DynamicPPLModelLogDensityFunction` to delegate LDP calls to its
  inner prep; `capabilities` reads off the prep's own capability so the Hessian
  branch is exposed exactly when the backend supports it.
- Bump compat: AbstractPPL 0.15, DynamicPPL 0.42 (with branch pin until
  release); add Bijectors branch pin in `test/` for the same reason.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`git diff main --stat` was full of `return` keyword and `*` spacing
adjustments unrelated to the AbstractPPL 0.15 switch. Restoring those
files to `main`'s state shrinks the review surface to just the
load-bearing changes (AbstractPPL/DynamicPPL bump, LDP+Hessian wiring,
stale-DI comment fixes).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keeps this branch focused on the AbstractPPL evaluator-interface migration:
the LDP fallback methods on `Prepared`, the Mooncake forward-over-reverse
Hessian, and the DynamicPPLModelLogDensityFunction LDP-delegation refactor
have been moved to the `ldp-on-prepared-hessian` branch (branched from
`main` and carrying both the migration and the feature work).

What stays here:
- Switch DI → AbstractPPL.prepare / value_and_gradient!! in core.
- DynamicPPLModelLogDensityFunction uses AbstractPPL.prepare instead of
  DI.prepare_gradient (Hessian path is dropped as in the original
  migration commit; `use_hessian=true` warns and is ignored).
- Compat bumps: AbstractPPL@0.15, DynamicPPL@0.42, plus Bijectors source
  pin in `test/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pins AbstractPPL to `hg/hessian-order`, which adds `prepare(adtype, f, x;
order=2)` and `value_gradient_and_hessian!!`. `DynamicPPLModelLogDensityFunction`
goes back to main's `prep_grad` + `prep_hess` shape (default `use_hessian=true`),
so the diff reads as a clean DI → AbstractPPL swap rather than a redesign.
`use_hessian` falls back to gradient-only when the AD backend refuses `order=2`
(MethodError only; other errors propagate).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrapping the prepared evaluator and the aux `Ref` in a struct was just
working around the lack of mutable state on `AbstractPPL.Prepared`.
AbstractPPL 0.15's `prepare(...; context=tuple)` keeps the tuple on the
evaluator and threads it through every call, so passing `Ref(aux)` in
the context stores the ref inside the prep itself; callers mutate
`prep.evaluator.context[1][]` before each evaluation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`loglikeadj_ref` now tracks the input numeric type via a `LogLikeAdj<:Real`
struct parameter, restoring `Float32`/`BigFloat`/dual support that the
hard-pinned `Ref{Float64}` quietly dropped. `subsample` computes the
adjustment in the field's eltype to preserve precision across minibatches.

The block comment now also flags why `model_ref::Ref{Any}` cannot be
tightened (subsample-widened `defaults` NamedTuple), so a future reader
doesn't try.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai marked this pull request as ready for review May 20, 2026 13:12
yebai and others added 3 commits May 20, 2026 14:12
AbstractPPL 0.15.1 and Bijectors 0.16.0 are now released. The
DynamicPPL `adproblems-interface` branch has been updated to require
Bijectors 0.16, so keeping the DynamicPPL source pin no longer pulls
in an older Bijectors. Restrict the package and test envs to
Bijectors 0.16; relax docs to "0.15, 0.16" since NormalizingFlows
0.2.2 still requires 0.15.x.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai
Copy link
Copy Markdown
Member Author

yebai commented May 25, 2026

@sunxd3 @Red-Portal, this PR is ready for a look.

yebai and others added 4 commits May 25, 2026 20:31
The adproblems-interface branch was merged into DynamicPPL main (#1363)
and deleted, breaking CI. Track main until DynamicPPL 0.42 is released.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pkg's [sources] field is only honoured from Julia 1.11 onward, so the
LTS (1.10) jobs can't resolve the DynamicPPL main source pin. Move the
benchmark and docs workflows to Julia 1, and add Julia 1 alongside 1.10
in the Buildkite CUDA matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbstractPPL's DI extension routes compiled-tape ReverseDiff through a
closure so the tape can be reused — but that closure bakes `aref[]` in
at preparation time, so mutating the context Ref between iterations
silently feeds the original `aux` (stale `q_stop`, etc.) into AD and
diverges the optimization.

Return `nothing` from `_prepare_gradient` for `AutoReverseDiff{true}`
and re-prepare in `_value_and_gradient!` so the fresh `aux` reaches AD
each call without mutating shared state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AbstractPPL's DI extension provides the AD-aware `prepare` method only
for backends backed by DifferentiationInterface, but doesn't ship a
Zygote path. Drop Zygote from the benchmark matrix and add
DifferentiationInterface so the remaining ReverseDiff/Mooncake rows
resolve. Add `using DifferentiationInterface` to the @setup blocks in
families.md and klminrepgraddescent.md — these blocks call AD-prepared
gradients before any tutorial page loads DI.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benchmark Results

Details
Benchmark suite Current: daf504e Previous: d8d2ee9 Ratio
normal/RepGradELBO/fullrank/Mooncake 400574944 ns 295358754 ns 1.36
normal/RepGradELBO/fullrank/ReverseDiff 1087096733 ns 1691083911 ns 0.64
normal/RepGradELBO/meanfield/Mooncake 108040221 ns 83113204 ns 1.30
normal/RepGradELBO/meanfield/ReverseDiff 532684414 ns 1004964089 ns 0.53
normal/RepGradELBO + STL/fullrank/Mooncake 471804144 ns 350958027.5 ns 1.34
normal/RepGradELBO + STL/fullrank/ReverseDiff 2365658912 ns 4098872113.5 ns 0.58
normal/RepGradELBO + STL/meanfield/Mooncake 160803518.5 ns 129223657.5 ns 1.24
normal/RepGradELBO + STL/meanfield/ReverseDiff 1157682030 ns 2042112606 ns 0.57

This comment was automatically generated by workflow using github-action-benchmark.

yebai and others added 2 commits May 25, 2026 22:03
- Compress the compile-tape ReverseDiff rationale at the dispatch site;
  drop the cross-reference duplicate on `_prepare_gradient`.
- Drop the stale "unlike DI's Constant" parenthetical from the
  DynamicPPL ext struct comment.
- Add a brief WHY for the `copy(grad)` / `copy(H)` calls in
  `logdensity_and_gradient` / `logdensity_gradient_and_hessian` — the
  arrays may alias the prep's internal buffers under AbstractPPL's `!!`
  contract.
- Use `!==` instead of `!=` when comparing type parameters in
  `LogDensityProblems.capabilities`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mooncake covers the reverse-mode benchmark slot and is the backend we
actually want to track; ReverseDiff adds noise without informing the
gradient-time numbers we report.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

AdvancedVI.jl documentation for PR #255 is available at:
https://TuringLang.github.io/AdvancedVI.jl/previews/PR255/

@yebai yebai force-pushed the abstractppl-dynamicppl-migration branch from 7265e2e to 61cee75 Compare May 25, 2026 21:10
yebai and others added 2 commits May 25, 2026 22:14
Collapse the single-row AD list onto the opening bracket per the
formatter check on PR #255.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DynamicPPL has a new release covering the AbstractPPL/DynamicPPL migration,
so the source override is no longer needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread bench/benchmarks.jl
Comment thread docs/src/klminrepgraddescent.md Outdated
Comment thread docs/src/klminrepgraddescent.md Outdated
Comment thread docs/src/tutorials/basic.md Outdated
Comment thread docs/src/tutorials/basic.md Outdated
Comment thread docs/src/tutorials/basic.md Outdated
@Red-Portal
Copy link
Copy Markdown
Member

Thanks let me take a look in the coming days.

- Drop DifferentiationInterface from bench and docs (DI is unused in
  AdvancedVI itself, which routes AD through AbstractPPL). Keep it in
  test/ where it remains load-bearing as the trigger for DI's Enzyme
  extension.
- Replace ReverseDiff with Mooncake everywhere in docs and benchmarks;
  use the no-arg `AutoMooncake()` form consistently.
- In `MvLocationScaleLowRank.entropy`, swap `Hermitian` for `Symmetric`
  so that Mooncake's `logdet ∘ Symmetric` rule (chalk-lab/Mooncake.jl
  PR #1055) fires — `Hermitian` is not covered by that rule and falls
  through to a missing `dsytrf` foreigncall. The two wrappers are
  semantically identical for real element types.
- In the QMC tutorial section, declare `QuasiMonteCarlo.sample` as
  non-differentiable via `Mooncake.@zero_derivative` so Mooncake skips
  the Sobol/Owen bit-twiddling that would otherwise trigger its
  bitcast safety check.
- Add `ENV["GKSwstype"] = "100"` at the top of `docs/make.jl` so GR
  uses the no-display workstation; `savefig` still writes files but
  no interactive plot window is opened during the build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yebai and others added 4 commits May 26, 2026 13:58
Use the shared TuringLang Format action (JuliaFormatter v1, default
paths, suggestions disabled) instead of the custom reviewdog pipeline.
Matches the AbstractPPL.jl workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`LogDensityProblemsAD.ADgradient` has no method for `AutoMooncake`, so
the previous wrapping pattern broke once the docs migrated to Mooncake.
`AdvancedVI.optimize` already differentiates through
`LogDensityProblems.logdensity` using the algorithm's `adtype`, so the
wrapper is no longer needed:

- basic, flows, subsampling: pass `prob_trans`/`prob` directly to
  `AdvancedVI.optimize`. In subsampling, specialize `AdvancedVI.subsample`
  on `LogReg` rather than on the (now-removed) `ADgradient` wrapper.
- constrained: `FisherMinBatchMatch` still requires order-1 capability,
  so give `TransformedLogDensityProblem` its own `logdensity_and_gradient`
  via `AbstractPPL.prepare(AutoForwardDiff(), ...)` and bump
  `capabilities` to `LogDensityOrder{1}`.
- docs/Project.toml: drop `LogDensityProblemsAD`, add `AbstractPPL`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Loading `OpenML` transitively brings in `CategoricalArrays`, which adds
`Base.similar` methods with unbound type parameters. Those break
Mooncake's compilation of `collect(::Base.Generator)` — used by
`Bijectors.Stacked`'s `_with_logabsdet_jacobian` via `map(zip(...)) do ...`.

Place the `Mooncake.@mooncake_overlay` workaround in basic.md (the first
tutorial that triggers the path) inside a `@setup` block so it runs
invisibly during the build, and surface the explanation + listing in a
collapsible `<details>` block so readers can see what it does without
the noise being inline.

The overlay extends `Mooncake.mooncake_method_table` globally, so once
basic.md runs the override is in effect for all subsequent tutorials.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`AbstractPPL.prepare` only ships an `AutoMooncake` extension, so the
previous `AutoForwardDiff()` call threw `MethodError` during
`FisherMinBatchMatch`. Switch the import and the `prepare` call to
`Mooncake`; the `Bijectors.Stacked` overlay defined in basic.md's
`@setup` is already in effect by the time constrained.md runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai force-pushed the abstractppl-dynamicppl-migration branch from f30f697 to 2fbbcdc Compare May 26, 2026 12:59
yebai added 2 commits May 26, 2026 14:13
# Conflicts:
#	docs/src/tutorials/constrained.md
#	docs/src/tutorials/flows.md
#	docs/src/tutorials/subsampling.md
Comment thread ext/AdvancedVIDynamicPPLExt.jl Outdated
Comment thread ext/AdvancedVIDynamicPPLExt.jl
Comment thread ext/AdvancedVIDynamicPPLExt.jl Outdated
Comment thread docs/src/tutorials/basic.md Outdated
Comment thread ext/AdvancedVIDynamicPPLExt.jl Outdated
yebai and others added 2 commits May 28, 2026 19:47
…marks

- ext: switch from deprecated `evaluate!!` to `DynamicPPL.logdensity_internal`,
  using a `WeightedLogJoint` getlogdensity that reads `loglikeadj` through the
  closure's Ref so subsample updates are observed without rebuilding AD prep.
- ext: store the original (unsubsampled) model so repeated `subsample` calls
  compute `n_datapoints` from the full dataset, not the previous batch.
- ext: broaden `capabilities` where-clause to dispatch on `adtype=nothing`.
- ext: drop redundant `varinfo`/`accs` fields; store `dim::Int` directly.
- docs/basic.md: dedent the FullRankGaussian intro so it is body text, not a
  `[^KW2014]` footnote continuation.
- bench: re-add `AutoReverseDiff(compile=true)` to the AD list; pull
  `DifferentiationInterface` into bench deps so non-Mooncake, non-ForwardDiff
  backends find `AbstractPPL.prepare`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim the struct-level WHY block and the inline `prob.model` comment to keep
just the rationale (full-dataset invariant, `Ref{Any}` choice, compiled-tape
caveat) without restating control flow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai
Copy link
Copy Markdown
Member Author

yebai commented May 28, 2026

All fixed @sunxd3

JuliaFormatter v1 with `format_markdown=true` re-indented the
`FullRankGaussian` intro line into the `[^KW2014]` footnote because
the footnotes were sandwiched between body text. Moving them to the
end of the document removes the ambiguity, so both Xianda's review
fix and the formatter check are satisfied.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@yebai yebai force-pushed the abstractppl-dynamicppl-migration branch from d8d2ee9 to daf504e Compare May 28, 2026 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants