Comparative Efficacy of AI LLMs in Clinical Social Work: ChatGPT-4, Gemini, Copilot
Authors: Hacer Taşkıran Tepe, Hüsnünur Aslantürk
Journal: Research on Social Work Practice (impact factor 1.7; JCR Q1, Social Work)
Publication date: 2025-01-17
Publication type: Journal Article
DOI: https://doi.org/10.1177/10497315241313071
Citations: 0
Abstract
Purpose: This study examines the comparative efficacy of three AI large language models (LLMs) in clinical social work: ChatGPT-4, Gemini, and Microsoft Copilot.
Method: By presenting scenarios of varying complexity, the study assessed the models' performance using the Ateşman Readability Index and a Likert-type accuracy scale.
Results: Gemini had the highest accuracy, while Microsoft Copilot excelled in readability. Significant differences were found in accuracy scores (p = .003), although readability differences were not statistically significant (p = .054). No correlation was found between case complexity and either accuracy or readability.
Discussion: Despite the differences, none of the models fully met all accuracy standards, indicating areas for further improvement. The findings suggest that while LLMs offer promise in social work, they require refinement to better meet the field's needs.
About the journal:
Research on Social Work Practice, sponsored by the Society for Social Work and Research, is a disciplinary journal devoted to the publication of empirical research concerning the methods and outcomes of social work practice. Social work practice is broadly interpreted to refer to the application of intentionally designed social work intervention programs to problems of societal and/or interpersonal importance, including behavior analysis or psychotherapy involving individuals; case management; practice involving couples, families, and small groups; community practice education; and the development, implementation, and evaluation of social policies.