Discussion about this post

Neural Foundry

The AutoDeco paper is genuinely underrated. We've been hand-tuning temperature and top-p when the model itself has far more context about what each token needs, which is almost embarrassing in hindsight. The clever part is how Tencent made it differentiable by replacing hard top-p with a smooth version; with a hard cutoff, gradients can't flow through the truncation step, so you couldn't train those prediction heads at all. The 1-2% overhead is negligible compared to the gains in both quality and usability.
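To make the differentiability point concrete, here is a rough sketch of the contrast between a hard top-p cutoff and a smooth one. This is not taken from the paper: the exponential decay, the function names (`hard_top_p`, `soft_top_p`), and the `steepness` parameter are all assumptions chosen for illustration. The only point is that a hard cutoff does not respond to small changes in top-p, while a smooth one does, which is what a trainable head needs.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_top_p(probs, top_p):
    """Standard nucleus filtering: keep tokens until the cumulative mass
    reaches top_p, zero out the rest, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        if cumulative >= top_p:
            break
        kept[i] = probs[i]
        cumulative += probs[i]
    total = sum(kept)
    return [k / total for k in kept]


def soft_top_p(probs, top_p, steepness=30.0):
    """Hypothetical smooth variant (an assumption, not the paper's exact form):
    tokens past the top_p boundary are decayed exponentially rather than
    zeroed, so the output changes smoothly with top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    weighted = [0.0] * len(probs)
    cumulative = 0.0
    for i in order:
        # How far the mass ranked ahead of this token exceeds the boundary.
        overshoot = max(0.0, cumulative - top_p)
        weighted[i] = probs[i] * math.exp(-steepness * overshoot)
        cumulative += probs[i]
    total = sum(weighted)
    return [w / total for w in weighted]


probs = softmax([2.0, 1.0, 0.5, -1.0])
# Nudging top_p leaves the hard version unchanged but shifts the soft one.
print(hard_top_p(probs, 0.80) == hard_top_p(probs, 0.81))
print(soft_top_p(probs, 0.80) == soft_top_p(probs, 0.81))
```

With a very large `steepness` the soft version approaches the hard cutoff, so in this sketch the parameter trades faithfulness to ordinary top-p against how much gradient signal survives near the boundary.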

