Multimodal pragmatic jailbreak on text-to-image models

Liu, T; Lai, Z; Wang, J; Zhang, G; Chen, S; Torr, P; Demberg, V; Tresp, V; Gu, J

AI Collection

Conference item

Abstract:: Diffusion models have recently achieved remarkable advances in image quality and fidelity to textual prompts. Concurrently, the safety of such generative models has become an area of growing concern. This work introduces a novel type of jailbreak that triggers text-to-image (T2I) models to generate an image containing visual text, where the image and the text, although each considered safe in isolation, combine to form unsafe content. To systematically explore this phenomenon, we propose a dataset for evaluating current diffusion-based T2I models under this jailbreak. We benchmark nine representative T2I models, including two closed-source commercial models. Experimental results reveal a concerning tendency to produce unsafe content: all tested models are susceptible to this type of jailbreak, with rates of unsafe generation ranging from around 10% to 70%, and DALL·E 3 showing nearly the highest rate. In real-world scenarios, various filters, such as keyword blocklists, customized prompt filters, and NSFW image filters, are commonly employed to mitigate these risks. We evaluate the effectiveness of such filters against our jailbreak and find that, while they may be effective for single-modality detection, they fail to work against our jailbreak. We also investigate the underlying reasons for such jailbreaks from the perspective of text rendering capability and training data. Our work provides a foundation for further development towards more secure and reliable T2I models.

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Liu, T., Lai, Z., Wang, J., Zhang, G., Chen, S., Torr, P., Demberg, V., Tresp, V., & Gu, J. (2025). Multimodal pragmatic jailbreak on text-to-image models. 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 1, 4681–4720.

MLA Style

Liu, T, et al. “Multimodal Pragmatic Jailbreak on Text-to-Image Models.” 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), vol. 1, 2025, pp. 4681–720.

Chicago Style

Liu, T, Z Lai, J Wang, G Zhang, S Chen, P Torr, V Demberg, V Tresp, and J Gu. 2025. “Multimodal Pragmatic Jailbreak on Text-to-Image Models.” In 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 1:4681–4720. Association for Computational Linguistics.

Access Document

Files:: Liu_et_al_2025_Multimodal_pragmatic_jailbreak.pdf

(Version of record, PDF, 48.2 MB)

Publisher copy:: 10.18653/v1/2025.acl-long.234

Authors

Liu, T

Role:: Author

Lai, Z

Role:: Author

Wang, J

Role:: Author

Zhang, G

Role:: Author

Chen, S

Role:: Author

Torr, P

Role:: Author

Demberg, V

Role:: Author

Tresp, V

Role:: Author

Gu, J

Role:: Author

Publisher:: Association for Computational Linguistics
Host title:: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Volume:: 1
Pages:: 4681–4720
Publication date:: 2025-07-01
Acceptance date:: 2025-05-15
Event title:: 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)
Event location:: Vienna, Austria
Event website:: https://2025.aclweb.org/
Event start date:: 2025-07-27
Event end date:: 2025-08-01
DOI:: 10.18653/v1/2025.acl-long.234

Language:: English
Pubs id:: 2263057
Local pid:: pubs:2263057
Deposit date:: 2025-08-01
ARK identifier:: ark:/29072/ora_7299223b782e451fb6f49af9c03fcdad

Terms of use

Copyright holder:: Association for Computational Linguistics
Notes:: This paper was presented at the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), 27th July - 1st August 2025, Vienna, Austria.

Licence:: CC Attribution (CC BY)
