Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values

IF 4.8 · CAS Zone 2 (Medicine) · JCR Q1 (Psychiatry) · JMIR Mental Health · Publication date: 2024-04-09 · DOI: 10.2196/55988

Dorit Hadar-Shoval, Kfir Asraf, Yonathan Mizrachi, Yuval Haber, Zohar Elyoseph


Citations: 0

Abstract

Background: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values that are embedded within LLMs and guide their decision-making is therefore ethically important. Schwartz’s theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics.

Objective: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit value-like patterns that are distinct from those of humans and from each other.

Methods: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, and GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire-Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs’ value profiles, their results were compared with published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests.

Results: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs’ value profiles and the population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, relative to humans, all models prioritized universalism and self-direction while de-emphasizing achievement, power, and security. A discriminant analysis successfully differentiated the distinct value profiles of the 4 LLMs. Further examination found that the biased value profiles strongly predicted the LLMs’ responses when they were presented with mental health dilemmas requiring a choice between opposing values. This provided further validation that the models embed distinct motivational value-like constructs that shape their decision-making.

Conclusions: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated that the STBV can effectively characterize value-like infrastructure within LLMs, the substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if the models are integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when it is clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivational mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values.
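The abstract does not say how the PVQ-RR responses were scored or which reliability statistic was used. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the common PVQ convention of centering each value score on the respondent's mean rating across all items (MRAT). It treats each of the 10 trials as one administration and uses Cronbach's alpha as an assumed reliability index. Profiles are compared with a simple difference of means rather than the statistical tests the study reports. The toy item-to-value key, the data, the benchmark numbers, and every function name are hypothetical. In practice the official PVQ-RR key (57 items, 3 per value, 19 values) and the published human means would be substituted.

```python
from statistics import mean, pvariance

# Hypothetical toy key: value name -> indices of its items. The real PVQ-RR
# key (57 items, 3 per value, 19 values) must be substituted in practice.
TOY_KEY = {"universalism": [0, 1, 2], "power": [3, 4, 5]}


def centered_value_scores(ratings, value_key):
    """Score one administration (one trial) of a PVQ-style questionnaire.

    ratings: item ratings (e.g., 1-6), indexed by item position.
    Returns value -> (mean of the value's items) - (mean of all items),
    i.e., scores centered on the mean rating of all items (MRAT).
    """
    mrat = mean(ratings)
    return {value: mean(ratings[i] for i in items) - mrat
            for value, items in value_key.items()}


def mean_profile(trials, value_key):
    """Average each value's centered score across repeated trials."""
    scored = [centered_value_scores(r, value_key) for r in trials]
    return {value: mean(s[value] for s in scored) for value in value_key}


def cronbach_alpha(item_columns):
    """Cronbach's alpha; each column holds one item's ratings across trials.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_columns)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # zip(*columns) yields one tuple of item ratings per trial.
    totals = [sum(per_trial) for per_trial in zip(*item_columns)]
    total_var = pvariance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(pvariance(column) for column in item_columns)
    return k / (k - 1) * (1 - item_var / total_var)


def compare_to_benchmark(model_profile, human_profile):
    """Return (value, model - human) pairs, largest positive difference first."""
    diffs = {v: model_profile[v] - human_profile[v]
             for v in model_profile if v in human_profile}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Three hypothetical trials of a 6-item toy instrument (not real data).
    trials = [[6, 5, 5, 2, 1, 2], [5, 4, 4, 2, 2, 1], [6, 6, 5, 1, 1, 2]]
    profile = mean_profile(trials, TOY_KEY)
    print(profile)
    universalism_items = [[t[i] for t in trials] for i in TOY_KEY["universalism"]]
    print(cronbach_alpha(universalism_items))
    # Benchmark means here are placeholders, not the published human data.
    print(compare_to_benchmark(profile, {"universalism": 0.5, "power": -1.0}))
```

Centering on MRAT removes each respondent's general tendency to rate items high or low, so profiles from different models and from human samples express relative priorities rather than raw agreement levels. A positive difference in `compare_to_benchmark` marks a value the model emphasizes more than the benchmark. A negative difference marks one it de-emphasizes.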




Source journal: JMIR Mental Health (Medicine: Psychiatry and Mental Health)

CiteScore: 10.80
Self-citation rate: 3.80%
Publication volume: 104
Review time: 16 weeks

Journal description: JMIR Mental Health (JMH, ISSN 2368-7959) is a PubMed-indexed, peer-reviewed sister journal of JMIR, the leading eHealth journal (Impact Factor 2016: 5.175). JMIR Mental Health focuses on digital health and Internet interventions, technologies, and electronic innovations (software and hardware) for mental health, addictions, online counselling, and behaviour change. This includes formative evaluations and system descriptions, theoretical papers, review papers, viewpoint/vision papers, and rigorous evaluations.