-
Model Merging — a biased overview
A friendly tour of model merging, suspiciously aligned with my own research.
-
Attention sinks from the graph perspective
Why causal transformers naturally concentrate attention on their earliest tokens.