A Visual Guide to Attention Variants in Modern LLMs
23 points
2 days ago
by Anon84
Comments
nv2156
Great read on the technical evidence for the shift from better attention to better model serving. Just came across a companion piece on this: https://news.ycombinator.com/item?id=47388676