Anyone tried using the new (ish) Gemma diffusion model as a speculative model?
It seems that MTP (multi-token prediction) is the gold standard for speedup, but it still forces a choice between autoregressive and parallel drafters, each with its own trade-offs. Wouldn't using a diffusion model be a go
Event timeline (2 reports)
- 2026-07-03 22:53, Reddit r/LocalLLaMA: Anyone tried using the new (ish) Gemma diffusion model as a speculative model?
- 2026-06-11 00:24, Google DeepMind: DiffusionGemma: 4x faster text generation