Conference item
When do prompting and prefix-tuning work? A theory of capabilities and limitations
- Abstract:
- Context-based fine-tuning methods, including prompting, in-context learning, soft prompting (also known as prompt tuning), and prefix-tuning, have gained popularity because they can often match the performance of full fine-tuning with a fraction of the parameters. Despite their empirical successes, there is little theoretical understanding of how these techniques influence the internal computation of the model or of the limits of their expressiveness. We show that, although the continuous embedding space is more expressive than the discrete token space, soft prompting and prefix-tuning are strictly less expressive than full fine-tuning, even with the same number of learnable parameters. Concretely, context-based fine-tuning cannot change the relative attention pattern over the content and can only bias the outputs of an attention layer in a fixed direction. This suggests that while techniques like prompting, in-context learning, soft prompting, and prefix-tuning can effectively elicit skills present in the pretrained model, they cannot learn novel tasks that require new attention patterns.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
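The abstract's central claim, that a prefix cannot change the relative attention pattern over the content and can only bias an attention layer's output in a fixed direction, can be checked numerically on a toy case. The sketch below is an illustration under stated assumptions, not the authors' code or experiments: it models one query in a single softmax attention head, and every number and name in it (`content_scores`, `prefix_value`, and so on) is hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(scores, values):
    # Single-query softmax attention: returns the attention weights
    # and the weighted sum of the value vectors.
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, out

# Hypothetical toy numbers: one query attending over three content tokens.
content_scores = [1.0, 0.2, -0.5]
content_values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical learned prefix: one extra position with its own score and value.
prefix_score = 2.0
prefix_value = [-3.0, 4.0]

base_weights, base_out = attend(content_scores, content_values)
pt_weights, pt_out = attend([prefix_score] + content_scores,
                            [prefix_value] + content_values)

alpha = pt_weights[0]  # attention mass captured by the prefix position

# Attention over the content, renormalized to exclude the prefix position.
renormalized = [w / (1.0 - alpha) for w in pt_weights[1:]]

# Predicted output: the original output pulled toward the prefix value.
predicted = [alpha * p + (1.0 - alpha) * b
             for p, b in zip(prefix_value, base_out)]

print("content pattern unchanged:",
      all(abs(a - b) < 1e-9 for a, b in zip(renormalized, base_weights)))
print("output is a fixed-direction bias:",
      all(abs(a - b) < 1e-9 for a, b in zip(pt_out, predicted)))
```

In this toy setting the prefix only scales down the attention paid to the content as a whole, so the ratios between content positions stay the same, and the layer output becomes a mix of the original output and the prefix's value vector. In a real model the prefix score depends on the query, so the mixing weight `alpha` varies by position, but the value vector it mixes in does not depend on the content.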
Access Document
- Publication website:
- https://openreview.net/forum?id=JewzobRhay
Authors
- Publisher:
- OpenReview
- Host title:
- Proceedings of the 12th International Conference on Learning Representations (ICLR 2024)
- Publication date:
- 2024-01-16
- Acceptance date:
- 2024-01-15
- Event title:
- 12th International Conference on Learning Representations (ICLR 2024)
- Event location:
- Vienna, Austria
- Event website:
- https://iclr.cc/Conferences/2024
- Event start date:
- 2024-05-07
- Event end date:
- 2024-05-11
- Language:
- English
- Pubs id:
- 1838323
- Local pid:
- pubs:1838323
- Deposit date:
- 2024-03-18
Terms of use
- Copyright date:
- 2024
- Notes:
- This paper was presented at the 12th International Conference on Learning Representations (ICLR 2024), 7-11 May 2024, Vienna, Austria.