Foundational challenges in assuring alignment and safety of large language models

Anwar, U; Saparov, A; Rando, J; Paleka, D; Turpin, M; Hase, P; Lubana, ES; Jenner, E; Casper, S; Sourbut, O; Edelman, BL; Zhang, Z; Günther, M; Korinek, A; Hernandez-Orallo, J; Hammond, L; Bigelow, E; Pan, A; Langosco, L; Korbak, T; Zhang, H; Zhong, R; Héigeartaigh, S; Recchia, G; Corsi, G; Chan, A; Anderljung, M; Edwards, L; Petrov, A; Schroeder de Witt, C; Motwani, SR; Bengio, Y; Chen, D; Torr, PHS; Albanie, S; Maharaj, T; Foerster, J; Tramer, F; He, H; Kasirzade, A; Choi, Y; Krueger, D

AI Collection

Journal article

Abstract:: This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+ concrete research questions.

Publication status:: Published

Peer review status:: Peer reviewed

Cite this record

APA Style

Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., Lubana, E. S., Jenner, E., Casper, S., Sourbut, O., Edelman, B. L., Zhang, Z., Günther, M., Korinek, A., Hernandez-Orallo, J., Hammond, L., Bigelow, E., Pan, A., Langosco, L., … Krueger, D. (2024). Foundational challenges in assuring alignment and safety of large language models. Transactions on Machine Learning Research, 2024.

MLA Style

Anwar, U, et al. “Foundational Challenges in Assuring Alignment and Safety of Large Language Models.” Transactions on Machine Learning Research, vol. 2024, 2024.

Chicago Style

Anwar, U, A Saparov, J Rando, et al. 2024. “Foundational Challenges in Assuring Alignment and Safety of Large Language Models.” Transactions on Machine Learning Research 2024.

Access Document

Files:: Anwar_et_al_2025_Foundational_challenges_in.pdf

(Version of record, pdf, 1.7MB)

Publication website:: https://openreview.net/forum?id=oVTkOs8Pka

Authors

Anwar, U

Role:: Author

Saparov, A

Role:: Author

Rando, J

Role:: Author

Paleka, D

Role:: Author

Turpin, M

Role:: Author

Publisher:: Journal of Machine Learning Research
Journal:: Transactions on Machine Learning Research
Volume:: 2024
Publication date:: 2024-09-17
Acceptance date:: 2024-09-02
EISSN:: 2835-8856

Language:: English
Pubs id:: 2102156
Local pid:: pubs:2102156
Deposit date:: 2025-04-09
ARK identifier:: ark:/29072/ora_538a634c21204406b86da6e3a9bd4338

Terms of use

Copyright holder:: Anwar et al
Rights statement:: © 2025 The Authors. This paper is an open access article distributed under the terms of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/)

Licence:: CC Attribution (CC BY)
