Hey! First off, really appreciate this repo: it's a great resource for
anyone trying to understand modern ML architectures from scratch.
I was going through the Speculative Decoding notebook and noticed that
while the implementation covers the core algorithm well, it doesn't
include any timing benchmarks, which are at the heart of the paper
(Leviathan et al., 2023). The whole motivation behind speculative decoding
is faster inference, so without measuring the actual speedup, it's hard
to appreciate why this technique matters.
Proposed fix: Add a small benchmark cell at the end of the notebook that:
- Generates N tokens using standard autoregressive decoding and records the time
- Generates N tokens using speculative decoding and records the time
- Prints the speedup ratio
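A minimal sketch of what that cell could look like. The two decode functions below are placeholder stubs (simulated with `time.sleep`) standing in for the notebook's actual autoregressive and speculative loops; only the timing harness itself is meant to carry over as-is:

```python
import time

def benchmark(generate_fn, n_tokens, n_runs=3):
    """Time a token-generation function over several runs; return the best wall-clock time."""
    best = float("inf")
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn(n_tokens)
        best = min(best, time.perf_counter() - start)
    return best

# Placeholder: one simulated full-model forward pass per token.
def autoregressive_decode(n_tokens):
    for _ in range(n_tokens):
        time.sleep(0.001)

# Placeholder: one simulated verification pass accepts ~gamma draft tokens.
def speculative_decode(n_tokens, gamma=4):
    produced = 0
    while produced < n_tokens:
        time.sleep(0.001)
        produced += gamma

N = 64
t_ar = benchmark(autoregressive_decode, N)
t_spec = benchmark(speculative_decode, N)
print(f"autoregressive: {t_ar:.3f}s  speculative: {t_spec:.3f}s  "
      f"speedup: {t_ar / t_spec:.2f}x")
```

Using `time.perf_counter` and taking the best of a few runs keeps the measurement robust to warm-up and scheduler noise; with real models you'd also want to synchronize the GPU (e.g. `torch.cuda.synchronize()`) before reading the clock.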
This would take the notebook from "here's how it works" to "here's
why it actually matters", a much more complete learning experience.
Happy to open a PR for this if you're open to it!