Preprint
Higher-order transformer derivative estimates for explicit pathwise learning guarantees
- Abstract:
-
An inherent challenge in computing fully-explicit generalization bounds for transformers involves obtaining covering number estimates for the given transformer class T. Crude estimates rely on a uniform upper bound on the local-Lipschitz constants of transformers in T, and finer estimates require an analysis of their higher-order partial derivatives. Unfortunately, these precise higher-order derivative estimates for (realistic) transformer models are not currently available in the literature as they are combinatorially delicate due to the intricate compositional structure of transformer blocks.
This paper fills this gap by precisely estimating the derivatives of all orders for the transformer model. We consider realistic transformers with multiple (non-linearized) attention heads per block and layer normalization. We obtain fully-explicit estimates of all constants in terms of the number of attention heads, the depth and width of each transformer block, and the number of normalization layers. Further, we explicitly analyze the impact of various standard activation function choices (e.g. SWISH and GeLU). As an application, we obtain explicit pathwise generalization bounds for transformers on a single trajectory of an exponentially-ergodic Markov process valid at a fixed future time horizon. We conclude that real-world transformers can learn from N (non-i.i.d.) samples of a single Markov process’s trajectory at a rate of O(polylog(N)/√N).
- Publication status:
- Published
- Peer review status:
- Not peer reviewed
- Preprint server copy:
- 10.48550/arxiv.2405.16563
Authors
- Funder:
- Natural Sciences and Engineering Research Council of Canada
- Funder identifier:
- https://ror.org/01h531d29
- Funding agency for:
- Kratsios, A
- Saqur, R
- Grant:
- RGPIN-2023-04482
- Preprint server:
- arXiv
- Publication date:
- 2024-05-26
- DOI:
- Language:
- English
- Pubs id:
- 2282237
- UUID:
- uuid_324ed2aa-0221-4bc3-b898-ffda49822b62
- Local pid:
- pubs:2282237
- Source identifiers:
- W4399115775
- Deposit date:
- 2026-01-23
- ARK identifier:
Terms of use
- Copyright holder:
- Limmer et al.
- Copyright date:
- 2024
- Rights statement:
- © The Author(s) 2024. This work is made available under the Creative Commons Attribution 4.0 License.
- Licence:
- CC Attribution (CC BY)