Automatic detection of problem-gambling signs from online texts using large language models.

IF 7.7 PLOS digital health Pub Date : 2024-09-25 eCollection Date: 2024-09-01 DOI:10.1371/journal.pdig.0000605

Elke Smith, Jan Peters, Nils Reiter

PLOS digital health, volume 3, issue 9, e0000605. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423982/pdf/

Citations: 0

Abstract

Problem gambling is a major public health concern and is associated with profound psychological distress and economic problems. There are numerous gambling communities on the internet where users exchange information about games, gambling tactics, as well as gambling-related problems. Individuals exhibiting higher levels of problem gambling engage more in such communities. Online gambling communities may provide insights into problem-gambling behaviour. Using data scraped from a major German gambling discussion board, we fine-tuned a large language model, specifically a Bidirectional Encoder Representations from Transformers (BERT) model, to predict signs of problem-gambling from forum posts. Training data were generated by manual annotation and by taking into account diagnostic criteria and gambling-related cognitive distortions. Using cross-validation, our models achieved a precision of 0.95 and F1 score of 0.71, demonstrating that satisfactory classification performance can be achieved by generating high-quality training material through manual annotation based on diagnostic criteria. The current study confirms that a BERT-based model can be reliably used on small data sets and to detect signatures of problem gambling in online communication data. Such computational approaches may have potential for the detection of changes in problem-gambling prevalence among online users.
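
The abstract reports a precision of 0.95 and an F1 score of 0.71 but does not state recall. As a rough sketch, assuming both figures describe the same positive class and the same set of predictions (averaging over cross-validation folds may make the identity only approximate), the implied recall can be recovered from the definition F1 = 2PR / (P + R). The function names below are hypothetical and are not taken from the paper.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, given P and F1.

    Assumes P and F1 were computed for the same class on the same
    predictions; this is an assumption, not something the abstract states.
    """
    # A recall in (0, 1] exists only if 0 < F1 <= 2P / (1 + P).
    if not 0 < f1 <= 2 * precision / (1 + precision):
        raise ValueError("no valid recall for this precision/F1 pair")
    return f1 * precision / (2 * precision - f1)


if __name__ == "__main__":
    recall = implied_recall(precision=0.95, f1=0.71)
    print(f"implied recall: {recall:.2f}")  # implied recall: 0.57
```

Under those assumptions, the reported scores correspond to a recall of roughly 0.57: most posts the model flags would be true positives, while a sizeable share of posts showing problem-gambling signs would go undetected.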

Source journal

PLOS digital health

Self-citation rate

0.00%

Articles published