- Beyond Attention as a Graph
  Higher-order (n-simplicial) attention as topology-driven message passing beyond graphs.
- Model Merging — a biased overview
  A friendly tour of model merging, suspiciously aligned with my own research.
- Attention sinks from the graph perspective
  Why causal transformers naturally concentrate attention on their earliest tokens.