Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language.

IF 3.5 · CAS Zone 4 (Computer Science) · Q2 Computer Science, Artificial Intelligence · PeerJ Computer Science · Pub Date: 2024-11-28 · eCollection Date: 2024-01-01 · DOI: 10.7717/peerj-cs.2395
Sergei Koltcov, Anton Surkov, Olessia Koltsova, Vera Ignatenko
{"title":"Using large language models for extracting and pre-annotating texts on mental health from noisy data in a low-resource language.","authors":"Sergei Koltcov, Anton Surkov, Olessia Koltsova, Vera Ignatenko","doi":"10.7717/peerj-cs.2395","DOIUrl":null,"url":null,"abstract":"<p><p>Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that utilize extensive public domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, necessitating the adaptation of LLMs for fully automatic cleaning and pre-classification to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot technology slightly improves classification, LLM fine-tuning makes a far larger contribution to its quality. Both standard and natural language inference (NLI) modes of fine-tuning increase classification accuracy by more than three times compared to non-fine-tuned training with preliminarily filtered data. Although NLI fine-tuning achieves slightly higher accuracy (0.64) than the standard approach, it is six times slower, indicating a need for further experimentation with NLI hypothesis engineering. Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.</p>","PeriodicalId":54224,"journal":{"name":"PeerJ Computer Science","volume":"10 ","pages":"e2395"},"PeriodicalIF":3.5000,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623104/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"PeerJ Computer Science","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.7717/peerj-cs.2395","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Recent advancements in large language models (LLMs) have opened new possibilities for developing conversational agents (CAs) in various subfields of mental healthcare. However, this progress is hindered by limited access to high-quality training data, often due to privacy concerns and high annotation costs for low-resource languages. A potential solution is to create human-AI annotation systems that utilize extensive public domain user-to-user and user-to-professional discussions on social media. These discussions, however, are extremely noisy, necessitating the adaptation of LLMs for fully automatic cleaning and pre-classification to reduce human annotation effort. To date, research on LLM-based annotation in the mental health domain is extremely scarce. In this article, we explore the potential of zero-shot classification using four LLMs to select and pre-classify texts into topics representing psychiatric disorders, in order to facilitate the future development of CAs for disorder-specific counseling. We use 64,404 Russian-language texts from online discussion threads labeled with the seven most commonly discussed disorders: depression, neurosis, paranoia, anxiety disorder, bipolar disorder, obsessive-compulsive disorder, and borderline personality disorder. Our research shows that while preliminary data filtering using zero-shot classification slightly improves results, LLM fine-tuning makes a far larger contribution to classification quality. Both standard and natural language inference (NLI) modes of fine-tuning increase classification accuracy more than threefold compared to non-fine-tuned training with preliminarily filtered data. Although NLI fine-tuning achieves slightly higher accuracy (0.64) than the standard approach, it is six times slower, indicating a need for further experimentation with NLI hypothesis engineering. Additionally, we demonstrate that lemmatization does not affect classification quality and that multilingual models using texts in their original language perform slightly better than English-only models using automatically translated texts. Finally, we introduce our dataset and model as the first openly available Russian-language resource for developing conversational agents in the domain of mental health counseling.
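As a concrete illustration of the zero-shot pre-classification step the abstract describes, the sketch below uses the Hugging Face transformers zero-shot-classification pipeline, which is built on an NLI formulation: each candidate label is inserted into a hypothesis template and the model scores entailment against the input text. The specific checkpoint, hypothesis template, and example post are assumptions chosen for illustration; the paper's actual models and prompts may differ.

```python
# Minimal sketch of zero-shot pre-classification of forum posts into disorder
# topics via an NLI-based pipeline.
# Assumptions: the checkpoint, hypothesis template, and sample text are
# illustrative only and are not taken from the paper.
from transformers import pipeline

# A multilingual NLI checkpoint that can handle Russian input; any comparable
# model could be substituted.
classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
)

# The seven disorder topics from the abstract, used as candidate labels
# (the label strings could equally be given in Russian).
labels = [
    "depression",
    "neurosis",
    "paranoia",
    "anxiety disorder",
    "bipolar disorder",
    "obsessive-compulsive disorder",
    "borderline personality disorder",
]

# A hypothetical forum post (placeholder text, not from the dataset).
text = "Последнее время я постоянно тревожусь и почти не сплю по ночам."

result = classifier(
    text,
    candidate_labels=labels,
    hypothesis_template="This text is about {}.",
    multi_label=False,
)
print(result["labels"][0], round(result["scores"][0], 3))
```

In a filtering setting, posts whose top label score falls below a chosen threshold could be discarded as noise before human annotation, which is the kind of reduction in annotation effort the abstract argues for.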

Source journal
PeerJ Computer Science (Computer Science: General Computer Science)
CiteScore: 6.10
Self-citation rate: 5.30%
Articles published: 332
Review time: 10 weeks
Journal description: PeerJ Computer Science is an open access journal covering all subject areas in computer science, with the backing of a prestigious advisory board and more than 300 academic editors.