Conference item

Simplicity bias in transformers and their ability to learn sparse boolean functions

Abstract:
Despite the widespread success of Transformers on NLP tasks, recent works have found that they struggle to model several formal languages when compared to recurrent models. This raises the question of why Transformers perform well in practice and whether they have any properties that enable them to generalize better than recurrent models. In this work, we conduct an extensive empirical study on Boolean functions to demonstrate the following: (i) Random Transformers are relatively more biased towards functions of low sensitivity. (ii) When trained on Boolean functions, both Transformers and LSTMs prioritize learning functions of low sensitivity, with Transformers ultimately converging to functions of lower sensitivity. (iii) On sparse Boolean functions, which have low sensitivity, we find that Transformers generalize near perfectly even in the presence of noisy labels, whereas LSTMs overfit and achieve poor generalization accuracy. Overall, our results provide strong quantifiable evidence of differences in the inductive biases of Transformers and recurrent models, which may help explain Transformers' effective generalization performance despite relatively limited expressiveness.
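
The central quantity in the abstract is the sensitivity of a Boolean function: the number of input bits whose flip changes the output, averaged over all inputs. A k-sparse function, one that depends on only k of its n input bits, has average sensitivity at most k, which is why sparse functions count as "low sensitivity". The short Python sketch below is not taken from the paper; it is a brute-force illustration of this point, and the particular functions (full parity versus a 3-sparse parity) and bit indices are illustrative assumptions.

from itertools import product

def avg_sensitivity(f, n):
    # Average sensitivity of f: {0,1}^n -> {0,1}: the expected number of
    # coordinates whose flip changes f(x), for a uniformly random input x.
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            y = list(x)
            y[i] ^= 1                      # flip the i-th bit
            total += f(x) != f(tuple(y))   # count if the output changes
    return total / 2 ** n

n = 10
full_parity = lambda x: sum(x) % 2                  # depends on all n bits
sparse_parity = lambda x: (x[0] + x[3] + x[7]) % 2  # depends on only 3 bits (hypothetical choice)

print(avg_sensitivity(full_parity, n))    # 10.0: maximal average sensitivity
print(avg_sensitivity(sparse_parity, n))  # 3.0: sparsity bounds the sensitivity

Functions like the 3-sparse parity sit at the low end of this measure, which is the regime where the abstract reports Transformers generalizing well even under label noise.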
Publication status:
Accepted
Peer review status:
Peer reviewed

Access Document


Publisher copy:
10.18653/v1/2023.acl-long.317

Authors


Author 1
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author

Author 2
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author
ORCID:
0000-0002-2300-4819

Author 3
Institution:
University of Oxford
Division:
MPLS
Department:
Computer Science
Role:
Author
ORCID:
0000-0003-4558-2457


Publisher:
ACL Anthology
Publication date:
2023-08-05
Acceptance date:
2023-05-01
Event title:
61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)
Event location:
Toronto, Canada
Event website:
https://2023.aclweb.org/
Event start date:
2023-07-09
Event end date:
2023-07-14
DOI:
10.18653/v1/2023.acl-long.317

Language:
English
Keywords:
Pubs id:
1488975
Local pid:
pubs:1488975
Deposit date:
2023-06-30
