Conversation
There was a problem hiding this comment.
[Edit: This was generated by an experimental agent command.]
Reviewed locally.
Verdict: No blocking issues. A few callouts:
-
create_dsp() / prewarm removal (get_dsp.cpp): Clients that call
get_dsp()and process without callingReset()first will see transient output. Worth documenting thatReset()(orResetAndPrewarm()) must be called before firstprocess(). -
Generic inline GEMM (dsp.cpp ~727): As noted in the PR, a size threshold for switching back to Eigen on larger matrices might be worth benchmarking.
-
film.h: The 3-channel FiLM optimizations are a nice addition; could be mentioned in the PR description for completeness.
Otherwise the optimizations look correct: weight layout and bias fusion logic are consistent with existing paths, and _skip_head_copy correctly handles the no-head1x1 / no-gating case.
sdatkinson
left a comment
There was a problem hiding this comment.
Just if you can get that prewarm crit that's the biggest one for me.
The second is the fallback to normal eigen. The best would be if you can do some profiling to refine the guess, but if you're strapped for time then an "8" and a comment to flag that it's not optimized is ok in a pinch.
| { | ||
| // Fall back to Eigen for larger matrices where it's more efficient | ||
| _output.leftCols(num_frames).noalias() = this->_weight * input.leftCols(num_frames); | ||
| // Generic inline GEMM for any matrix size (avoids Eigen overhead for small matrices) |
There was a problem hiding this comment.
Yeah if you want to maybe make some configurable threshold for going back to Eigen and default it so that maybe 8 starts using Eigen? then that seems reasonable
[I also wonder if this is us just re-inventing Eigen ;) ]
| // "pre-warm" the model to settle initial conditions | ||
| // Can this be removed now that it's part of Reset()? | ||
| out->prewarm(); | ||
| // Prewarm is left to the caller so it can call SetMaxBufferSize() first. |
There was a problem hiding this comment.
[C] For backward compatibility, can you instead make the "4096" configurable at compilation (and keep 4096 the default)?
| std::memcpy(this->_output_head.data(), this->_z.data(), total * sizeof(float)); | ||
| } | ||
| else | ||
| // When _skip_head_copy is true, GetOutputHead() returns _z directly, so no copy needed. |
There was a problem hiding this comment.
Yeah, we can tighten that sort of stuff up...thanks :)
Following previous updates for specific model sizes, this PR optimizes blocks that are in the hot path for models with 3 channels in/3 channels out:
- Added fused k=6 3×3 path: reads all 6 input taps and computes full convolution in one pass per frame. Eliminates setZero + 6 separate ring
buffer reads + 6 GEMM accumulations (reduces 7 passes to 1)
- Added _skip_head_copy flag (set when no head1x1 and no gating)
- GetOutputHead() returns _z directly when flag is set
- Skips 15 memcpys of 576 bytes each per block
Of all of the optimizations, the only one that might not be worth adding is replacing the Eigen fallback for any size of GEMM. We might want to have some kind of threshold where we actually use Eigen. Perhaps worth some more benchmarking.
Additionally, this removes the prewarm call from create_dsp, as that was not ideal for embedded targets. This gives the client code a chance to change MaxBufferSize before calling
reset()(which then callsprewarm().Developed with support and sponsorship from TONE3000