Aussie AI Blog
What's New in Speculative Decoding?
March 3rd, 2025
by David Spuler, Ph.D.
Speculative decoding is one of the earliest LLM efficiency improvements, parallelizing the otherwise sequential decoding phase by verifying several cheaply drafted tokens in a single step of the big model. And yet, there seems to be a never-ending supply of research papers on the topic of speculative decoding.
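As a refresher, the core draft-then-verify loop can be sketched in a few lines. This is a toy greedy variant with exact-match acceptance; real implementations verify all draft tokens in one batched forward pass of the target model (that is where the speedup comes from), and sampling-based variants use rejection sampling for acceptance. The function and toy "model" interfaces here are illustrative, not from any particular library.

```python
def speculative_decode_step(target_model, draft_model, context, k=4):
    """One round of greedy speculative decoding (toy sketch).

    The cheap draft model proposes k tokens autoregressively; the target
    model then checks each proposal, and we keep the longest accepted
    prefix plus one token from the target model itself.
    """
    # 1. Draft phase: the small model proposes k tokens, one at a time.
    draft = []
    ctx = list(context)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. Verify phase: the target model scores the same positions.
    accepted = []
    ctx = list(context)
    for t in draft:
        target_token = target_model(ctx)
        if target_token == t:
            accepted.append(t)             # draft matches target: accepted "for free"
            ctx.append(t)
        else:
            accepted.append(target_token)  # first mismatch: take the target's token, stop
            break
    else:
        # All k draft tokens accepted; the target contributes one bonus token.
        accepted.append(target_model(ctx))
    return accepted
```

In the best case each round emits k+1 tokens for one (batched) target-model pass; in the worst case it still emits one correct token, so greedy output is unchanged versus plain decoding.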
So, what's new? Here are some of the more recent research areas:
- Draft model accuracy — more papers on this, as always; forgive me if I yawn!
- Multiple parallel draft models — ongoing improvements to this idea.
- Multi-query prompt lookup decoding — this generalizes prompt lookup decoding to scour not only the current prompt context, but also any previous queries in the history.
- Distributed speculative decoding — optimal use of speculative decoding when inference processing is distributed over multiple GPUs or multiple servers.
- Long context speculative decoding — optimizations specific to applying speculative decoding over long or ultra-long contexts.
- Vision and multimodal speculative decoding — visual tokenization is very different.
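Of these, multi-query prompt lookup decoding is easy to sketch. Standard prompt lookup decoding searches the current prompt for the most recent n-gram of generated text and, on a hit, proposes the tokens that followed it as the draft; the multi-query variant simply widens the search to earlier queries in the session history. The function and parameter names below are illustrative assumptions, not any particular library's API.

```python
def prompt_lookup_draft(generated, sources, ngram=2, k=4):
    """Draft tokens via n-gram lookup over multiple token sources.

    sources is a list of token sequences to search, e.g. the current
    prompt first, then earlier queries from the conversation history.
    Returns up to k draft tokens, or [] if no n-gram match is found
    (in which case decoding falls back to the ordinary one-token step).
    """
    if len(generated) < ngram:
        return []
    key = generated[-ngram:]  # the most recent n-gram of generated text
    for source in sources:    # current prompt first, then older history
        # Scan from the end so the most recent match wins.
        for i in range(len(source) - ngram, -1, -1):
            if source[i:i + ngram] == key:
                return source[i + ngram:i + ngram + k]
    return []
```

Because the "draft model" is just a string search, the drafts are nearly free; they pay off on repetitive text such as code, quotations, or answers that reuse wording from the prompt or from earlier turns.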
Read more about types of speculative decoding.