Mindbench.ai: an actionable platform to evaluate the profile and performance of large language models in a mental healthcare context

Dwyer, B; Flathers, M; Sano, A; Dempsey, A; Cipriani, A; Gazi, AH; Hill, B; Gorban, C; Rodriguez, CI; Stromeyer, C; King, D; Rozenblit, E; Strudwick, G; Linardon, J; Cheong, J; Firth, J; Herpertz, J; Schwarz, J; Truong, K; Emerson, M; Paulus, MP; Patriquin, M; Hua, Y; Choudhary, S

Journal article

Abstract: Individuals are increasingly utilizing large language model (LLM)-based tools for mental health guidance and crisis support in place of human experts. While AI technology has great potential to improve health outcomes, insufficient empirical evidence exists to suggest that AI technology can be deployed as a clinical replacement; thus, there is an urgent need to assess and regulate such tools. Regulatory efforts have been made and multiple evaluation frameworks have been proposed; however, field-wide assessment metrics have yet to be formally integrated. In this paper, we introduce a comprehensive online platform that aggregates evaluation approaches and serves as a dynamic online resource to simplify LLM and LLM-based tool assessment: MindBench.ai. At its core, MindBench.ai is designed to provide easily accessible and interpretable information for diverse stakeholders (patients, clinicians, developers, regulators, etc.). To create MindBench.ai, we built on our work developing MINDapps.org to support informed decision-making around smartphone app use for mental health, and expanded the technical MINDapps.org framework to encompass novel LLM functionalities through benchmarking approaches. The MindBench.ai platform is designed as a partnership with the National Alliance on Mental Illness (NAMI) to provide assessment tools that systematically evaluate LLMs and LLM-based tools with objective and transparent criteria from a healthcare standpoint, assessing both profile (i.e., technical features, privacy protections, and conversational style) and performance characteristics (i.e., clinical reasoning skills).
With infrastructure designed to scale through community and expert contributions, along with adapting to technological advances, this platform establishes a critical foundation for the dynamic, empirical evaluation of LLM-based mental health tools—transforming assessment into a living, continuously evolving resource rather than a static snapshot.

Publication status: Published

Peer review status: Peer reviewed

Cite this record

APA Style

Dwyer, B., Flathers, M., Sano, A., Dempsey, A., Cipriani, A., Gazi, A. H., Hill, B., Gorban, C., Rodriguez, C. I., Stromeyer, C., King, D., Rozenblit, E., Strudwick, G., Linardon, J., Cheong, J., Firth, J., Herpertz, J., Schwarz, J., Truong, K., … Choudhary, S. (2025). Mindbench.ai: an actionable platform to evaluate the profile and performance of large language models in a mental healthcare context. NPP—Digital Psychiatry and Neuroscience, 3(1).

MLA Style

Dwyer, B, et al. “Mindbench.ai: an Actionable Platform to Evaluate the Profile and Performance of Large Language Models in a Mental Healthcare Context.” NPP—Digital Psychiatry and Neuroscience, vol. 3, no. 1, 2025.

Chicago Style

Dwyer, B, M Flathers, A Sano, et al. 2025. “Mindbench.ai: an Actionable Platform to Evaluate the Profile and Performance of Large Language Models in a Mental Healthcare Context.” NPP—Digital Psychiatry and Neuroscience 3 (1).