Large language models could change the future of behavioral healthcare: a proposal for responsible development and evaluation

Npj Ment Health Res. 2024 Apr 2;3(1):12. doi: 10.1038/s44184-024-00056-z.

Elizabeth C Stade^{1,2,3}, Shannon Wiltsey Stirman^{4,5}, Lyle H Ungar⁶, Cody L Boland⁴, H Andrew Schwartz⁷, David B Yaden⁸, João Sedoc⁹, Robert J DeRubeis¹⁰, Robb Willer¹¹, Johannes C Eichstaedt¹²

Affiliations

¹ Dissemination and Training Division, National Center for PTSD, VA Palo Alto Health Care System, Palo Alto, CA, USA. betsystade@stanford.edu.
² Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA. betsystade@stanford.edu.
³ Institute for Human-Centered Artificial Intelligence & Department of Psychology, Stanford University, Stanford, CA, USA. betsystade@stanford.edu.
⁴ Dissemination and Training Division, National Center for PTSD, VA Palo Alto Health Care System, Palo Alto, CA, USA.
⁵ Department of Psychiatry and Behavioral Sciences, Stanford University, Stanford, CA, USA.
⁶ Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA, USA.
⁷ Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.
⁸ Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, MD, USA.
⁹ Department of Technology, Operations, and Statistics, New York University, New York, NY, USA.
¹⁰ Department of Psychology, University of Pennsylvania, Philadelphia, PA, USA.
¹¹ Department of Sociology, Stanford University, Stanford, CA, USA.
¹² Institute for Human-Centered Artificial Intelligence & Department of Psychology, Stanford University, Stanford, CA, USA. johannes.stanford@gmail.com.

PMID: 38609507
PMCID: PMC10987499
DOI: 10.1038/s44184-024-00056-z

Elizabeth C Stade et al. Npj Ment Health Res. 2024.

Abstract

Large language models (LLMs) such as OpenAI's GPT-4 (which powers ChatGPT) and Google's Gemini, built on artificial intelligence, hold immense potential to support, augment, or even eventually automate psychotherapy. Enthusiasm about such applications is mounting in the field as well as in industry. These developments promise to address insufficient mental healthcare system capacity and to scale individual access to personalized treatments. However, clinical psychology is an uncommonly high-stakes application domain for AI systems, as responsible and evidence-based therapy requires nuanced expertise. This paper provides a roadmap for the ambitious yet responsible application of clinical LLMs in psychotherapy. First, a technical overview of clinical LLMs is presented. Second, the stages of integration of LLMs into psychotherapy are discussed while highlighting parallels to the development of autonomous vehicle technology. Third, potential applications of LLMs in clinical care, training, and research are discussed, highlighting areas of risk given the complex nature of psychotherapy. Fourth, recommendations for the responsible development and evaluation of clinical LLMs are provided, which include centering clinical science, involving robust interdisciplinary collaboration, and attending to issues like assessment, risk detection, transparency, and bias. Lastly, a vision is outlined for how LLMs might enable a new generation of studies of evidence-based interventions at scale, and how these studies may challenge assumptions about psychotherapy.


Conflict of interest statement

The authors declare the following competing interests: receiving consultation fees from Jimini Health (E.C.S., L.H.U., H.A.S., and J.C.E.).

Figures

**Fig. 1. Methods for tailoring clinical large language models.**
Figure was designed using image components from Flaticon.com.

**Fig. 2. Example clinical skills of large language models.**
*Note*. Figure was designed using image components from Flaticon.com.

**Fig. 3. Stages of integrating large language models into psychotherapy.**
Figure was designed using image components from Flaticon.com.


Cited by

- "It happened to be the perfect thing": experiences of generative AI chatbots for mental health. Siddals S, Torous J, Coxon A. Npj Ment Health Res. 2024 Oct 27;3(1):48. doi: 10.1038/s44184-024-00097-4. PMID: 39465310.
- Large Language Models for Mental Health Applications: Systematic Review. Guo Z, Lai A, Thygesen JH, Farrington J, Keen T, Li K. JMIR Ment Health. 2024 Oct 18;11:e57400. doi: 10.2196/57400. PMID: 39423368.
- Describing the Framework for AI Tool Assessment in Mental Health and Applying It to a Generative AI Obsessive-Compulsive Disorder Platform: Tutorial. Golden A, Aboujaoude E. JMIR Form Res. 2024 Oct 18;8:e62963. doi: 10.2196/62963. PMID: 39423001. Review.
- Assessing the Impact of ChatGPT in Dermatology: A Comprehensive Rapid Review. Goktas P, Grzybowski A. J Clin Med. 2024 Oct 3;13(19):5909. doi: 10.3390/jcm13195909. PMID: 39407969. Review.
- A Novel Cognitive Behavioral Therapy-Based Generative AI Tool (Socrates 2.0) to Facilitate Socratic Dialogue: Protocol for a Mixed Methods Feasibility Study. Held P, Pridgen SA, Chen Y, Akhtar Z, Amin D, Pohorence S. JMIR Res Protoc. 2024 Oct 10;13:e58195. doi: 10.2196/58195. PMID: 39388255.




