
Conference item

Understanding reasoning in thinking language models via steering vectors

Abstract:

Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, the underlying mechanisms enabling their reasoning capabilities remain poorly understood. This work studies the particular reasoning processes of thinking LLMs by analyzing DeepSeek-R1-Distill models and comparing them with non-thinking models like GPT-4o. Through a systematic experiment on 300 tasks across 10 diverse categories, we identify key behavioral patterns that characterize thinking models, including expressing their own uncertainty, coming up with examples for validating their working hypothesis, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model's activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model's reasoning process, such as its tendency to backtrack or express uncertainty. Our findings not only advance the understanding of how thinking models reason but also offer practical tools for steering their reasoning processes in a controlled and interpretable manner. We validate our approach using two DeepSeek-R1-Distill models, showing consistent results across different model architectures.
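The abstract says that behaviors such as backtracking and expressing uncertainty correspond to linear directions in activation space, and that these directions can be added to the activations to steer the model. It does not say how the vectors are extracted. The sketch below is only a rough illustration. It assumes a difference-of-means construction, which is a common way to obtain steering vectors but not necessarily the authors' procedure. The helper names (`mean_difference_vector`, `steer`, `projection`) are hypothetical. The activations are synthetic NumPy arrays, not hidden states from a DeepSeek-R1-Distill model.

```python
import numpy as np


def mean_difference_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Hypothetical helper: difference-of-means steering vector (an assumed method).

    pos_acts: (n_pos, d) activations taken where a behavior (e.g. backtracking) occurs.
    neg_acts: (n_neg, d) activations taken where it does not.
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)


def steer(hidden: np.ndarray, vector: np.ndarray, alpha: float) -> np.ndarray:
    """Add alpha * vector to every position of a (seq_len, d) hidden-state array.

    alpha > 0 pushes activations toward the behavior; alpha < 0 pushes away from it.
    """
    return hidden + alpha * vector


def projection(hidden: np.ndarray, vector: np.ndarray) -> np.ndarray:
    """Scalar projection of each position onto the unit-normalized direction."""
    unit = vector / np.linalg.norm(vector)
    return hidden @ unit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 16
    true_dir = rng.normal(size=d)
    # Synthetic stand-ins for real activations: "positive" rows are shifted along true_dir.
    neg = rng.normal(size=(200, d))
    pos = rng.normal(size=(200, d)) + 3.0 * true_dir
    v = mean_difference_vector(pos, neg)
    cos = v @ true_dir / (np.linalg.norm(v) * np.linalg.norm(true_dir))
    print(f"cosine(v, true_dir) = {cos:.3f}")
    # Steering with alpha = 1 raises each position's projection by exactly ||v||.
    h = rng.normal(size=(5, d))
    print(projection(steer(h, v, 1.0), v) - projection(h, v))
```

In a real model, the vector would typically be added to the residual stream at one chosen layer, for example through a forward hook. The sign and size of `alpha` would then set how strongly the targeted behavior is encouraged or suppressed. Which layer and scale work best is an empirical question that this sketch does not address.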

Publication status:
Published
Peer review status:
Peer reviewed

Publication website:
https://openreview.net/forum?id=OwhVWNOBcz

Authors

Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
Institution:
University of Oxford
Division:
MPLS
Department:
Engineering Science
Role:
Author
ORCID:
0009-0006-0259-5732


Publisher:
OpenReview
Host title:
Proceedings of the 13th International Conference on Learning Representations (ICLR 2025)
Article number:
170
Publication date:
2025-03-05
Acceptance date:
2025-01-22
Event title:
13th International Conference on Learning Representations (ICLR 2025)
Event location:
Singapore
Event website:
https://iclr.cc/Conferences/2025
Event start date:
2025-04-24
Event end date:
2025-04-28


Language:
English
Pubs id:
2433106
Local pid:
pubs:2433106
Deposit date:
2026-06-12
ARK identifier:
