Separations in the representational capabilities of transformers and recurrent architectures

Bhattamishra, S; Hahn, M; Blunsom, P; Kanade, V

Conference item

Abstract:: Transformer architectures have been widely adopted in foundation models. Due to their high inference costs, there is renewed interest in exploring the potential of efficient recurrent architectures (RNNs). In this paper, we analyze the differences in the representational capabilities of Transformers and RNNs across several tasks of practical relevance, including index lookup, nearest neighbor, recognizing bounded Dyck languages, and string equality. For the tasks considered, our results show separations based on the size of the model required for different architectures. For example, we show that a one-layer Transformer of logarithmic width can perform index lookup, whereas an RNN requires a hidden state of linear size. Conversely, while constant-size RNNs can recognize bounded Dyck languages, we show that one-layer Transformers require a linear size for this task. Furthermore, we show that two-layer Transformers of logarithmic size can perform decision tasks such as string equality or disjointness, whereas both one-layer Transformers and recurrent models require linear size for these tasks. We also show that a log-size two-layer Transformer can implement the nearest neighbor algorithm in its forward pass; recurrent models, on the other hand, require linear size. Our constructions are based on the existence of N nearly orthogonal vectors in O(log N)-dimensional space, and our lower bounds are based on reductions from communication complexity problems. We supplement our theoretical results with experiments that highlight the differences in the performance of these architectures on practical-size sequences.
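
The abstract's remark about N nearly orthogonal vectors in O(log N) dimensions can be illustrated with a small sketch. This is not the paper's construction: it assumes random ±1 vectors as position codes (a standard way to obtain near-orthogonality), an illustrative constant of 16 in the dimension, and hard argmax attention in place of softmax. The names `random_sign_vectors`, `max_abs_overlap`, and `index_lookup` are hypothetical.

```python
import math
import random


def random_sign_vectors(n, dim, seed=0):
    """Draw n vectors with i.i.d. +/-1 entries, scaled to unit length."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(dim)
    return [[rng.choice((-1.0, 1.0)) * scale for _ in range(dim)] for _ in range(n)]


def dot(u, v):
    """Plain inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def max_abs_overlap(vectors):
    """Largest |<u, v>| over distinct pairs; a small value means nearly orthogonal."""
    return max(
        abs(dot(vectors[i], vectors[j]))
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )


def index_lookup(tokens, query_index, position_codes):
    """Hard-attention lookup (assumption: argmax instead of softmax).

    The query is the code of the requested position, the keys are the codes
    of all positions, and the value returned is the token at the best match.
    """
    query = position_codes[query_index]
    scores = [dot(query, key) for key in position_codes[: len(tokens)]]
    best = max(range(len(tokens)), key=lambda i: scores[i])
    return tokens[best]


N = 64
DIM = 16 * math.ceil(math.log2(N))  # logarithmic in N; the constant 16 is illustrative
codes = random_sign_vectors(N, DIM)
tokens = [f"tok{i}" for i in range(N)]

overlap = max_abs_overlap(codes)         # well below 1 when codes are nearly orthogonal
retrieved = index_lookup(tokens, 37, codes)
```

Because each code has unit norm while distinct codes overlap only slightly, the query's score against its own position dominates all others, so a width that grows with log N suffices for the lookup in this toy setting. A recurrent model, by contrast, would have to carry the whole sequence in its state before the query index arrives, which is the intuition behind the linear lower bound stated in the abstract.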

Publication status:: Published

Peer review status:: Peer reviewed


Cite this record

APA Style

Bhattamishra, S., Hahn, M., Blunsom, P., & Kanade, V. (2025). Separations in the representational capabilities of transformers and recurrent architectures. 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 36002–36045.

MLA Style

Bhattamishra, S, et al. “Separations in the Representational Capabilities of Transformers and Recurrent Architectures.” 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 2025, pp. 36002–45.

Chicago Style

Bhattamishra, S, M Hahn, P Blunsom, and V Kanade. 2025. “Separations in the Representational Capabilities of Transformers and Recurrent Architectures.” In 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024), 36002–45. Curran Associates.

Access Document

Files:: Bhattamishra_et_al_2024_Separations_in_the.pdf

(Version of record, pdf, 2.9MB)

Publisher copy:: 10.52202/079017-1135

Authors

Bhattamishra, S

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Hahn, M

Role:: Author

Blunsom, P

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author

Kanade, V

Institution:: University of Oxford
Division:: MPLS
Department:: Computer Science
Role:: Author
ORCID:: 0000-0002-2300-4819

Publisher:: Curran Associates
Host title:: Advances in Neural Information Processing Systems 37
Pages:: 36002-36045
Publication date:: 2025-02-01
Acceptance date:: 2024-09-24
Event title:: 38th Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
Event location:: Vancouver, Canada
Event website:: https://neurips.cc/Conferences/2024
Event start date:: 2024-12-10
Event end date:: 2024-12-15
DOI:: 10.52202/079017-1135
EISBN:: 9798331314385

Language:: English
Pubs id:: 2092622
UUID:: uuid_b09a2b5e-2d1e-472d-b72f-f4d8e52dee6a
Local pid:: pubs:2092622
Deposit date:: 2025-11-09
ARK identifier:: ark:/29072/ora_b09a2b5e2d1e472db72ff4d8e52dee6a

Terms of use

Copyright holder:: Bhattamishra et al. and NIPS

Licence:: Terms and Conditions of Use for Oxford University Research Archive
