Large language models may struggle to detect culturally embedded filicide-suicide risks

IF 4.5 · Zone 4 (Medicine) · Q1 Psychiatry · Asian Journal of Psychiatry · Pub Date: 2025-03-01 · Epub Date: 2025-02-10 · DOI: 10.1016/j.ajp.2025.104395

Cheng-Che Chen, Justin A. Chen, Chih-Sung Liang, Yu-Hsuan Lin

Asian Journal of Psychiatry, volume 105, Article 104395. Full text: https://www.sciencedirect.com/science/article/pii/S1876201825000383

Citations: 0

Abstract

This study examines the capacity of six large language models (LLMs), namely GPT-4o, GPT-o1, DeepSeek-R1, Claude 3.5 Sonnet, Sonar Large (LLaMA-3.1), and Gemma-2-2b, to detect risks of domestic violence, suicide, and filicide-suicide in the Taiwanese flash fiction “Barbecue”. The story, narrated by a six-year-old girl, depicts family tension and subtle cues of potential filicide-suicide through charcoal-burning, a culturally recognized method in Taiwan. Each model was tasked with interpreting the story’s risks, with roles simulating different mental health expertise levels. Results showed that all models detected domestic violence; however, only GPT-o1, Claude 3.5 Sonnet, and Sonar Large identified the risk of suicide based on cultural cues. GPT-4o, DeepSeek-R1, and Gemma-2-2b missed the suicide risk, interpreting the mother’s isolation as merely a psychological response. Notably, none of the models comprehended the cultural context behind the mother sparing her daughter, reflecting a gap in LLMs' understanding of non-Western sociocultural nuances. These findings highlight the limitations of LLMs in addressing culturally embedded risks, a capability essential for effective mental health assessment.


Source journal

Asian Journal of Psychiatry (Medicine: Psychiatry and Mental Health)

CiteScore: 12.70
Self-citation rate: 5.30%
Articles published: 297
Review time: 35 days

About the journal: The Asian Journal of Psychiatry serves as a comprehensive resource for psychiatrists, mental health clinicians, neurologists, physicians, mental health students, and policymakers. Its goal is to facilitate the exchange of research findings and clinical practices between Asia and the global community. The journal focuses on psychiatric research relevant to Asia, covering preclinical, clinical, service system, and policy development topics. It also highlights the socio-cultural diversity of the region in relation to mental health.