Attention sinks and compression valleys in LLMs are two sides of the same coin

Queipo-de-Llano, E; Arroyo, A; Barbero, F; Dong, X; Bronstein, M; LeCun, Y; Shwartz-Ziv, R

AI Collection

Conference item : Poster

Abstract:: Attention sinks and compression valleys have attracted significant attention as two puzzling phenomena in large language models, but have been studied in isolation. In this work, we present a surprising connection between attention sinks and compression valleys, tracing both to the formation of massive activations in the residual stream. We prove theoretically that massive activations necessarily produce representational compression and establish bounds on the resulting entropy reduction. Through experiments across several models (410M–120B parameters), we confirm that when the beginning-of-sequence token develops extreme activation norms in the middle layers, both compression valleys and attention sinks emerge simultaneously. Targeted ablation validates our theoretical predictions. This unified view motivates us to propose the Mix-Compress-Refine theory of information flow, as an attempt to explain how LLMs organize their computation in depth by controlling attention and representational compression via massive activations. Specifically, we posit that Transformer-based LLMs process tokens in three distinct phases: (1) broad mixing in the early layers, (2) compressed computation with limited mixing in the middle layers, and (3) selective refinement in the late layers. Our framework helps explain why embedding tasks perform best at intermediate layers, whereas generation tasks benefit from full-depth processing, clarifying differences in task-dependent representations.
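
As a rough illustration of the abstract's central claim, the Python sketch below (not taken from the paper) shows how inflating the norm of a single token's activation lowers a spectral entropy of the representation matrix. The entropy definition used here (Shannon entropy of the normalized squared singular values), the matrix shape, and the scale factor of 1000 are all assumptions chosen for illustration; the paper's own measure and bounds may differ.

```python
import numpy as np


def spectral_entropy(X: np.ndarray) -> float:
    """Shannon entropy of the normalized squared singular values of X.

    This is an illustrative proxy for representational compression,
    assumed here rather than taken from the paper.
    """
    s = np.linalg.svd(X, compute_uv=False)
    p = s**2 / np.sum(s**2)
    p = p[p > 0]  # drop exact zeros so log() is well defined
    return float(-np.sum(p * np.log(p)))


rng = np.random.default_rng(0)

# Hypothetical residual-stream snapshot: 32 tokens, hidden size 64.
X = rng.standard_normal((32, 64))

# Mimic a massive activation on the first (beginning-of-sequence) token.
X_massive = X.copy()
X_massive[0] *= 1000.0

h_base = spectral_entropy(X)
h_massive = spectral_entropy(X_massive)

print(f"entropy without massive activation: {h_base:.4f}")
print(f"entropy with massive activation:    {h_massive:.4f}")
```

In this toy setting one singular direction dominates the spectrum once the first row is scaled up, so the entropy falls sharply, which is the qualitative effect the abstract describes.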

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Queipo-de-Llano, E., Arroyo, A., Barbero, F., Dong, X., Bronstein, M., LeCun, Y., & Shwartz-Ziv, R. (2026). Attention sinks and compression valleys in LLMs are two sides of the same coin. 14th International Conference on Learning Representations (ICLR 2026).

MLA Style

Queipo-de-Llano, E, et al. “Attention Sinks and Compression Valleys in LLMs Are Two Sides of the Same Coin.” 14th International Conference on Learning Representations (ICLR 2026), 2026.

Chicago Style

Queipo-de-Llano, E, A Arroyo, F Barbero, X Dong, M Bronstein, Y LeCun, and R Shwartz-Ziv. 2026. “Attention Sinks and Compression Valleys in LLMs Are Two Sides of the Same Coin.” In 14th International Conference on Learning Representations (ICLR 2026).

Access Document

Files:: Queipo-de-Llano_et_al_2026_Attention_sinks_and.pdf

(Preview, Version of record, pdf, 2.6MB, Terms of use)

Publication website:: https://openreview.net/forum?id=c5TFhCJ6fs

Authors

Queipo-de-Llano, E

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Arroyo, A

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Barbero, F

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Dong, X

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Oxford college:: Lady Margaret Hall
Role:: Author
ORCID:: 0000-0002-1143-9786

Bronstein, M

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

More authors...

Host title:: Proceedings of the 14th International Conference on Learning Representations (ICLR 2026)
Article number:: 23824
Publication date:: 2026-01-26
Acceptance date:: 2026-01-26
Event title:: 14th International Conference on Learning Representations (ICLR 2026)
Event location:: Rio de Janeiro, Brazil
Event website:: https://iclr.cc/Conferences/2026
Event start date:: 2026-04-23
Event end date:: 2026-04-27

Language:: English
Keywords:: deep transformer-based LLMs, attention sinks, compression valleys
Subtype:: Poster
Pubs id:: 2426916
Local pid:: pubs:2426916
Deposit date:: 2026-05-30
ARK identifier:: ark:/29072/ora_85d87e043fca4860a1e715adef016701

Terms of use

Copyright holder:: Queipo-de-Llano et al
Rights statement:: ©2026 The Authors. This paper is an open access article distributed under the terms of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/)

Licence:: CC Attribution (CC BY)
