Hey! First off, really appreciate this repo: it's a great resource for
anyone trying to understand modern ML architectures from scratch.
I was going through the Speculative Decoding notebook and noticed that
while the implementation covers the core algorithm well, it doesn't
include any timing benchmarks, which are at the heart of the paper
(Leviathan et al., 2023). The whole motivation behind speculative decoding
is faster inference, so without measuring the actual speedup, it's hard
to appreciate why this technique matters.
Proposed fix: Add a small benchmark cell at the end of the notebook that:
- Generates N tokens using standard autoregressive decoding and records the time
- Generates N tokens using speculative decoding and records the time
- Prints the speedup ratio
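A minimal sketch of what that cell could look like. The two decode functions below are placeholder stubs (simulated with `time.sleep`) standing in for the notebook's actual autoregressive and speculative loops; only the timing harness itself is meant to carry over as-is:

```python
import time

def benchmark(generate_fn, n_tokens, n_runs=3):
    """Time a token-generation function over several runs; return the best wall-clock time."""
    best = float("inf")
    for _ in range(n_runs):
        start = time.perf_counter()
        generate_fn(n_tokens)
        best = min(best, time.perf_counter() - start)
    return best

# Placeholder: one simulated full-model forward pass per token.
def autoregressive_decode(n_tokens):
    for _ in range(n_tokens):
        time.sleep(0.001)

# Placeholder: one simulated verification pass accepts ~gamma draft tokens.
def speculative_decode(n_tokens, gamma=4):
    produced = 0
    while produced < n_tokens:
        time.sleep(0.001)
        produced += gamma

N = 64
t_ar = benchmark(autoregressive_decode, N)
t_spec = benchmark(speculative_decode, N)
print(f"autoregressive: {t_ar:.3f}s  speculative: {t_spec:.3f}s  "
      f"speedup: {t_ar / t_spec:.2f}x")
```

Using `time.perf_counter` and taking the best of a few runs keeps the measurement robust to warm-up and scheduler noise; with real models you'd also want to synchronize the GPU (e.g. `torch.cuda.synchronize()`) before reading the clock.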
This would take the notebook from "here's how it works" to "here's
why it actually matters", a much more complete learning experience.
Happy to open a PR for this if you're open to it!