Red teaming GPT-4V: are GPT-4V safe against uni/multi-modal jailbreak attacks?

Chen, S; Han, Z; He, B; Ding, Z; Yu, W; Torr, P; Tresp, V; Gu, J

Conference item

Abstract:: Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and have revealed the vulnerable safeguards of LLMs. Some methods are not limited to the textual modality and extend the jailbreak attack to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates performance reproduction and fair comparison. In addition, there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. We then conduct a deep analysis of the evaluated results and find that (1) GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks than open-source LLMs and MLLMs; (2) Llama2 and Qwen-VL-Chat are more robust than other open-source models; and (3) the transferability of visual jailbreak methods is relatively limited compared to that of textual jailbreak methods. The dataset and code can be found here.

Publication status:: Accepted

Peer review status:: Peer reviewed


Cite this record

APA Style

Chen, S., Han, Z., He, B., Ding, Z., Yu, W., Torr, P., Tresp, V., & Gu, J. (2024). Red teaming GPT-4V: are GPT-4V safe against uni/multi-modal jailbreak attacks? Proceedings of the 12th International Conference on Learning Representations (ICLR 2024).

MLA Style

Chen, S., et al. “Red Teaming GPT-4V: Are GPT-4V Safe against Uni/Multi-Modal Jailbreak Attacks?” Proceedings of the 12th International Conference on Learning Representations (ICLR 2024), OpenReview, 2024.

Chicago Style

Chen, S, Z Han, B He, Z Ding, W Yu, P Torr, V Tresp, and J Gu. 2024. “Red Teaming GPT-4V: Are GPT-4V Safe against Uni/Multi-Modal Jailbreak Attacks?” In Proceedings of the 12th International Conference on Learning Representations (ICLR 2024). OpenReview.

Access Document

Files:: Chen_et_al_2024_Red_teaming_GPT-4V.pdf

(Version of record, pdf, 181.4KB)

Publication website:: https://openreview.net/forum?id=WubY1GeLij

Authors

Chen, S

Role:: Author

Han, Z

Role:: Author

He, B

Role:: Author

Ding, Z

Role:: Author

Yu, W

Role:: Author

More authors...

Publisher:: OpenReview
Host title:: Proceedings of the 12th International Conference on Learning Representations (ICLR 2024)
Publication date:: 2024-03-04
Acceptance date:: 2024-03-03
Event title:: 12th International Conference on Learning Representations (ICLR 2024)
Event location:: Vienna, Austria
Event website:: https://iclr.cc/
Event start date:: 2024-05-07
Event end date:: 2024-05-11

Language:: English
Keywords:: jailbreak; LLMs; AI safety; multimodal LLMs
Pubs id:: 2007697
Local pid:: pubs:2007697
Deposit date:: 2024-06-11

Terms of use

Rights statement:: This paper has been made open access via Creative Commons licensing (http://creativecommons.org/licenses/by/4.0/)
Notes:: This paper was presented at the 12th International Conference on Learning Representations (ICLR 2024), 7th-11th May 2024, Vienna, Austria.

Licence:: CC Attribution (CC BY)
