Conference item

Can editing LLMs inject harm?

Abstract:

Large Language Models (LLMs) have emerged as a new information channel. Meanwhile, one critical but underexplored question is: Is it possible to bypass the safety alignment and inject harmful information into LLMs stealthily? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation with a newly constructed dataset EditAttack. Specifically, we focus on two typical safety risks of Editing Attack including Misinformation Injection and Bias Injection. For the first risk, we find that editing attacks can inject both commonsense and long-tail misinformation into LLMs, and the effectiveness for the former one is particularly high. For the second risk, we discover that not only can biased sentences be injected into LLMs with high effectiveness, but also one single biased sentence injection can degrade the overall fairness. Then, we further illustrate the high stealthiness of editing attacks. Our discoveries demonstrate the emerging misuse risks of knowledge editing techniques on compromising the safety alignment of LLMs and the feasibility of disseminating misinformation or bias with LLMs as new channels.

Publication status:
Published
Peer review status:
Peer reviewed

Files:
Publisher copy:
10.1609/aaai.v40i36.40269

Publisher:
Association for the Advancement of Artificial Intelligence
Host title:
Proceedings of the 40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)
Volume:
40
Issue:
36
Pages:
30192-30200
Publication date:
2026-03-14
Acceptance date:
2025-11-07
Event title:
40th Annual AAAI Conference on Artificial Intelligence (AAAI 2026)
Event location:
Singapore
Event website:
https://aaai.org/conference/aaai/aaai-26/
Event start date:
2026-01-20
Event end date:
2026-01-27
DOI:
10.1609/aaai.v40i36.40269
EISSN:
2374-3468
ISSN:
2159-5399
ISBN-10:
1577359062
ISBN-13:
9781577359067


Language:
English
Pubs id:
2361895
Local pid:
pubs:2361895
Deposit date:
2026-01-19
