Understanding reasoning in thinking language models via steering vectors

Venhoff, C; Arcuschin, I; Torr, P; Conmy, A; Nanda, N

AI Collection

Conference item

Abstract:: Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, the underlying mechanisms enabling their reasoning capabilities remain poorly understood. This work studies the particular reasoning processes of thinking LLMs by analyzing DeepSeek-R1-Distill models and comparing them with non-thinking models like GPT-4o. Through a systematic experiment on 300 tasks across 10 diverse categories, we identify key behavioral patterns that characterize thinking models, including expressing their own uncertainty, coming up with examples for validating their working hypothesis, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model's activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model's reasoning process, such as its tendency to backtrack or express uncertainty. Our findings not only advance the understanding of how thinking models reason but also offer practical tools for steering their reasoning processes in a controlled and interpretable manner. We validate our approach using two DeepSeek-R1-Distill models, showing consistent results across different model architectures.
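
Illustrative note: the abstract says that reasoning behaviors correspond to linear directions in activation space and can be modulated with steering vectors, but it does not state how the vectors are extracted. The sketch below assumes one common choice, the difference between mean activations on examples with and without a behavior, and uses toy three-dimensional lists in place of real model activations. All function names (`mean_vector`, `steering_vector`, `apply_steering`, `projection`) and all numbers are hypothetical and are not taken from the paper.

```python
def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(with_behavior, without_behavior):
    """Assumed extraction rule: mean activation with the behavior
    minus mean activation without it (difference of means)."""
    mu_pos = mean_vector(with_behavior)
    mu_neg = mean_vector(without_behavior)
    return [p - q for p, q in zip(mu_pos, mu_neg)]


def apply_steering(activation, vector, alpha):
    """Add alpha * vector to an activation.
    alpha > 0 pushes toward the behavior, alpha < 0 pushes away from it."""
    return [a + alpha * v for a, v in zip(activation, vector)]


def projection(activation, vector):
    """Dot product of an activation with the steering direction."""
    return sum(a * v for a, v in zip(activation, vector))


# Toy activations (hypothetical): rows stand in for hidden states collected
# at positions where a behavior such as backtracking does or does not occur.
with_behavior = [[2.0, 0.0, 1.0], [4.0, 2.0, 1.0]]
without_behavior = [[0.0, 0.0, 1.0], [2.0, 2.0, 1.0]]

vector = steering_vector(with_behavior, without_behavior)

hidden = [1.0, 1.0, 1.0]
boosted = apply_steering(hidden, vector, alpha=1.0)
suppressed = apply_steering(hidden, vector, alpha=-1.0)

print("steering vector:", vector)
print("projection (suppressed, original, boosted):",
      projection(suppressed, vector),
      projection(hidden, vector),
      projection(boosted, vector))
```

In a real model the same addition would be applied to hidden states at a chosen layer during generation; which layer and which scale work is an empirical question that this sketch does not address.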

Publication status:: Published

Peer review status:: Peer reviewed


Cite this record

APA Style

Venhoff, C., Arcuschin, I., Torr, P., Conmy, A., & Nanda, N. (2025). Understanding reasoning in thinking language models via steering vectors. 13th International Conference on Learning Representations (ICLR 2025).

MLA Style

Venhoff, C, et al. “Understanding Reasoning in Thinking Language Models via Steering Vectors.” 13th International Conference on Learning Representations (ICLR 2025), 2025.

Chicago Style

Venhoff, C, I Arcuschin, P Torr, A Conmy, and N Nanda. 2025. “Understanding Reasoning in Thinking Language Models via Steering Vectors.” In 13th International Conference on Learning Representations (ICLR 2025). OpenReview.

Access Document

Files:: Venhoff_et_al_2025_Understanding_reasoning_in.pdf

(Version of record, pdf, 876.8KB)

Publication website:: https://openreview.net/forum?id=OwhVWNOBcz

Authors

Venhoff, C

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author

Arcuschin, I

Role:: Author

Torr, P

Institution:: University of Oxford
Division:: MPLS
Department:: Engineering Science
Role:: Author
ORCID:: 0009-0006-0259-5732

Conmy, A

Role:: Author

Nanda, N

Role:: Author

Publisher:: OpenReview
Host title:: Proceedings of the 13th International Conference on Learning Representations (ICLR 2025)
Article number:: 170
Publication date:: 2025-03-05
Acceptance date:: 2025-01-22
Event title:: 13th International Conference on Learning Representations (ICLR 2025)
Event location:: Singapore
Event website:: https://iclr.cc/Conferences/2025
Event start date:: 2025-04-24
Event end date:: 2025-04-28

Language:: English
Pubs id:: 2433106
Local pid:: pubs:2433106
Deposit date:: 2026-06-12
ARK identifier:: ark:/29072/ora_8a72942ded874a45b4c4f2053c746230

Terms of use

Copyright holder:: Venhoff et al.

Licence:: CC Attribution (CC BY)
