Conference item

Pause tokens strictly increase the expressivity of constant-depth transformers

Abstract:
Pause tokens, simple filler symbols such as “...”, consistently improve Transformer performance on both language and mathematical tasks, yet their theoretical effect remains unexplained. We provide the first formal separation result, proving that adding pause tokens to constant-depth, logarithmic-width Transformers strictly increases their computational expressivity. With bounded-precision activations, Transformers without pause tokens compute only a strict subset of AC0 functions, while adding a polynomial number of pause tokens allows them to express the entire class. For logarithmic-precision Transformers, we show that adding pause tokens achieves expressivity equivalent to TC0, matching known upper bounds. Empirically, we demonstrate that two-layer causally masked Transformers can learn parity when supplied with pause tokens, a function that they appear unable to learn without them. Our results provide a rigorous theoretical explanation for prior empirical findings, clarify how pause tokens interact with width, depth, and numeric precision, and position them as a distinct mechanism, complementary to chain-of-thought prompting, for enhancing Transformer reasoning.
Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://neurips.cc/virtual/2025/loc/san-diego/poster/116941

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author
ORCID:
0000-0002-2300-4819


Funder identifier:
https://ror.org/0439y7842
Funding agency for:
London, C
Grant:
EP/W524311/1


Publisher:
NeurIPS
Article number:
116941
Publication date:
2025-12-05
Acceptance date:
2025-09-17
Event title:
39th Conference on Neural Information Processing Systems (NeurIPS 2025)
Event location:
San Diego, CA, USA
Event website:
https://neurips.cc/Conferences/2025
Event start date:
2025-12-02
Event end date:
2025-12-07


Language:
English
Pubs id:
2320325
Local pid:
pubs:2320325
Deposit date:
2025-11-09
