Add timing benchmarks to Speculative Decoding #5

@ShivenduShivu

Hey! First off, really appreciate this repo. It's a great resource for
anyone trying to understand modern ML architectures from scratch. 🙌

I was going through the Speculative Decoding notebook and noticed that
while the implementation covers the core algorithm well, it doesn't
include any timing benchmarks, which is kind of the heart of the paper
(Leviathan et al., 2023). The whole motivation behind speculative decoding
is faster inference, so without measuring the actual speedup, it's hard
to appreciate why this technique matters.

Proposed fix: Add a small benchmark cell at the end of the notebook that:

  • Generates N tokens using standard autoregressive decoding → records time
  • Generates N tokens using speculative decoding → records time
  • Prints the speedup ratio
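
A rough sketch of what that benchmark cell could look like. The names `time_generation`, `benchmark`, `fake_autoregressive`, and `fake_speculative` are made up for illustration and are not taken from the notebook. The two `fake_*` generators are toy stand-ins that only sleep to mimic target-model forward passes, so the numbers they produce say nothing about real speedups. In the actual cell they would be swapped for the notebook's own decoding functions.

```python
import time
from typing import Callable, Dict, List


def time_generation(generate: Callable[[int], List[int]],
                    n_tokens: int, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        tokens = generate(n_tokens)
        elapsed = time.perf_counter() - start
        if len(tokens) != n_tokens:
            raise ValueError("generator returned the wrong number of tokens")
        best = min(best, elapsed)
    return best


def benchmark(autoregressive: Callable[[int], List[int]],
              speculative: Callable[[int], List[int]],
              n_tokens: int = 50, repeats: int = 3) -> Dict[str, float]:
    """Time both decoding functions on N tokens and compute the speedup ratio."""
    t_ar = time_generation(autoregressive, n_tokens, repeats)
    t_sd = time_generation(speculative, n_tokens, repeats)
    return {"autoregressive_s": t_ar, "speculative_s": t_sd,
            "speedup": t_ar / t_sd}


# --- Toy stand-ins (assumptions, NOT the notebook's implementation) ---

def fake_autoregressive(n_tokens: int) -> List[int]:
    """Pretend each token costs one target-model forward pass."""
    out: List[int] = []
    for i in range(n_tokens):
        time.sleep(0.001)
        out.append(i)
    return out


def fake_speculative(n_tokens: int, gamma: int = 4) -> List[int]:
    """Pretend one verification pass yields up to `gamma` tokens.

    Idealized: every drafted token is accepted and the draft model is free.
    """
    out: List[int] = []
    while len(out) < n_tokens:
        time.sleep(0.001)
        out.extend(range(len(out), min(len(out) + gamma, n_tokens)))
    return out


if __name__ == "__main__":
    result = benchmark(fake_autoregressive, fake_speculative, n_tokens=50)
    print(f"Autoregressive: {result['autoregressive_s']:.4f} s")
    print(f"Speculative:    {result['speculative_s']:.4f} s")
    print(f"Speedup:        {result['speedup']:.2f}x")
```

Taking the best of a few runs is just one way to reduce noise from warm-up and background load. If the notebook's models run on a GPU with PyTorch, calling `torch.cuda.synchronize()` before reading the clock would likely be needed as well, since CUDA kernels launch asynchronously.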

This would make the notebook go from "here's how it works" to "here's
why it actually matters", a much more complete learning experience.

Happy to open a PR for this if you're open to it! 🚀
