Conference item
Understanding reasoning in thinking language models via steering vectors
- Abstract:
- Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, the underlying mechanisms enabling their reasoning capabilities remain poorly understood. This work studies the particular reasoning processes of thinking LLMs by analyzing DeepSeek-R1-Distill models and comparing them with non-thinking models like GPT-4o. Through a systematic experiment on 300 tasks across 10 diverse categories, we identify key behavioral patterns that characterize thinking models, including expressing their own uncertainty, coming up with examples for validating their working hypothesis, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model's activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model's reasoning process, such as its tendency to backtrack or express uncertainty. Our findings not only advance the understanding of how thinking models reason but also offer practical tools for steering their reasoning processes in a controlled and interpretable manner. We validate our approach using two DeepSeek-R1-Distill models, showing consistent results across different model architectures.
- Publication status:
- Published
- Peer review status:
- Peer reviewed
Access Document
- Files:
- Version of record (pdf, 876.8KB)
- Publication website:
- https://openreview.net/forum?id=OwhVWNOBcz
Authors
- Publisher:
- OpenReview
- Host title:
- Proceedings of the 13th International Conference on Learning Representations (ICLR 2025)
- Article number:
- 170
- Publication date:
- 2025-03-05
- Acceptance date:
- 2025-01-22
- Event title:
- 13th International Conference on Learning Representations (ICLR 2025)
- Event location:
- Singapore
- Event website:
- https://iclr.cc/Conferences/2025
- Event start date:
- 2025-04-24
- Event end date:
- 2025-04-28
- Language:
- English
- Pubs id:
- 2433106
- Local pid:
- pubs:2433106
- Deposit date:
- 2026-06-12
- ARK identifier:
Terms of use
- Copyright holder:
- Venhoff et al.
- Copyright date:
- 2025
- Rights statement:
- Copyright © 2025 The Author(s). This is an open access article published under CC BY 4.0.
- Licence:
- CC Attribution (CC BY)