Aussie AI Blog

What's New in Speculative Decoding?

  • March 3rd, 2025
  • by David Spuler, Ph.D.

Speculative decoding was one of the earliest LLM efficiency techniques to parallelize multiple decoding steps. And yet there seems to be a never-ending supply of research papers on the topic.
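For readers new to the idea, here is a minimal sketch of the basic draft-and-verify loop, assuming greedy decoding. The names `speculative_step`, `target_next`, and `draft_next` are hypothetical, and the "models" are toy functions over integer tokens. A real engine would verify all drafted positions in one batched forward pass of the large model, which is where the parallelism comes from; the loop below only simulates that.

```python
from typing import Callable, List

# Hypothetical type: a greedy "model" mapping a token sequence to its next token.
NextToken = Callable[[List[int]], int]

def speculative_step(target_next: NextToken, draft_next: NextToken,
                     tokens: List[int], k: int = 4) -> List[int]:
    """One draft-and-verify round; returns the extended token list."""
    # 1. The small draft model proposes k tokens autoregressively (cheap).
    draft: List[int] = []
    for _ in range(k):
        draft.append(draft_next(tokens + draft))

    # 2. The large target model checks each drafted position.
    #    (Simulated sequentially here; a real engine batches these checks.)
    accepted: List[int] = []
    for i in range(k):
        expected = target_next(tokens + draft[:i])
        if draft[i] != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return tokens + accepted
        accepted.append(draft[i])

    # 3. Every draft token matched, so the target's next token comes for free.
    accepted.append(target_next(tokens + draft))
    return tokens + accepted

# Toy models (assumptions for illustration only).
def toy_target(t: List[int]) -> int:
    return (t[-1] + 1) % 10

def toy_draft(t: List[int]) -> int:
    # Agrees with the target except after token 3.
    return 0 if t[-1] == 3 else (t[-1] + 1) % 10

if __name__ == "__main__":
    print(speculative_step(toy_target, toy_draft, [0], k=4))
```

Under greedy decoding the result matches what the target model would have produced on its own; the draft model only changes how many target passes are needed.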

So, what's new? Here are some of the more recent research areas:

  • Draft model accuracy — more papers on this, as always; forgive me if I yawn!
  • Multiple parallel draft models — ongoing improvements to this idea.
  • Multi-query prompt lookup decoding — this generalizes prompt lookup decoding to scour not only the current prompt context, but also any previous queries in the history.
  • Distributed speculative decoding — optimal use of speculative decoding when inference processing is distributed over multiple GPUs or multiple servers.
  • Long context speculative decoding — examination of particular optimizations when applying speculative decoding to long or ultralong contexts.
  • Vision and multimodal speculative decoding — visual tokenization is very different from text tokenization.
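To make the multi-query prompt lookup idea from the list above a little more concrete, here is a rough sketch under my own assumptions, not any particular paper's method. It takes the last few tokens of the current context as an n-gram key, looks for an earlier occurrence in the current context first, and then falls back to previous queries in the history. Whatever followed the match becomes the draft, which the main model would then verify as in ordinary speculative decoding. The names `lookup_draft`, `ngram`, and `max_draft` are hypothetical.

```python
from typing import List, Sequence

def _find_continuation(source: Sequence[int], key: List[int],
                       max_draft: int) -> List[int]:
    """Return the tokens following the most recent match of key in source."""
    n = len(key)
    # Scan right to left; stop one short of the end so that a match always
    # has at least one following token (this also skips the trailing key itself).
    for i in range(len(source) - n - 1, -1, -1):
        if list(source[i:i + n]) == key:
            return list(source[i + n:i + n + max_draft])
    return []

def lookup_draft(context: Sequence[int], history: Sequence[Sequence[int]],
                 ngram: int = 3, max_draft: int = 5) -> List[int]:
    """Propose draft tokens by n-gram lookup in the context, then the history."""
    if len(context) < ngram:
        return []
    key = list(context[-ngram:])
    # Current context first, then earlier queries from newest to oldest.
    for source in [context, *reversed(history)]:
        draft = _find_continuation(source, key, max_draft)
        if draft:
            return draft
    return []  # no match: fall back to normal decoding

if __name__ == "__main__":
    past_queries = [[1, 2, 3, 4, 5, 6, 7]]
    print(lookup_draft([9, 8, 2, 3, 4], past_queries))
```

No draft model is needed here at all, which is the appeal: the lookup is just string matching over token IDs, so a failed draft costs almost nothing.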

Read more about types of speculative decoding.

More AI Research Topics

Read more about: