Supercharging LLM inference on Google TPUs: Achieving 3X speedups with
diffusion-style speculative decoding
-
Researchers at UCSD have implemented DFlash, a block-diffusion
speculative decoding method, on Google TPUs to bypass the sequential
bottleneck...
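To make the idea concrete, here is a minimal toy sketch of block-wise speculative decoding, the general technique the headline refers to. This is not the DFlash implementation or its TPU kernels; `draft_model` and `target_model` are hypothetical stand-ins that illustrate the draft-then-verify loop: a cheap drafter proposes a block of tokens, the target verifies the whole block in one pass, and the longest agreeing prefix is accepted.

```python
# Toy sketch of block-wise speculative decoding (hypothetical models,
# NOT the DFlash method described in the article).

def draft_model(ctx, block):
    # Hypothetical fast drafter: next token = (last + 1) % 10.
    out, last = [], ctx[-1]
    for _ in range(block):
        last = (last + 1) % 10
        out.append(last)
    return out

def target_model(ctx, proposed):
    # Hypothetical target model: agrees with the drafter except it
    # caps tokens at 7, so it rejects any proposed token above 7.
    verified, last = [], ctx[-1]
    for _ in proposed:
        last = min((last + 1) % 10, 7)
        verified.append(last)
    return verified

def speculative_step(ctx, block=4):
    # One decoding step: draft a block, verify it in a single target
    # pass, accept the longest matching prefix plus the target's
    # correction at the first mismatch.
    proposed = draft_model(ctx, block)
    verified = target_model(ctx, proposed)
    accepted = []
    for p, v in zip(proposed, verified):
        if p != v:
            accepted.append(v)  # take the target's token and stop
            break
        accepted.append(p)
    return ctx + accepted

print(speculative_step([0, 1, 2], block=4))  # all 4 drafted tokens accepted
print(speculative_step([6], block=4))        # mismatch after the first token
```

When draft and target agree, a single verification pass yields several tokens instead of one, which is where the claimed speedup over strictly sequential token-by-token decoding comes from.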