CLPsych2019 Shared Task: Predicting Suicide Risk Level from Reddit Posts on Multiple Forums

V. Ruiz, Lingyun Shi, Wei Quan, N. Ryan, C. Biernesser, D. Brent, R. Tsui
Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology. DOI: https://doi.org/10.18653/v1/W19-3020
Citation count: 8

Abstract

We aimed to predict an individual's suicide risk level from longitudinal posts on Reddit discussion forums. By participating in the shared task competition hosted by CLPsych2019, we received two annotated datasets: a training dataset with 496 users (31,553 posts) and a test dataset with 125 users (9,610 posts). We submitted results from our three best-performing machine-learning models: SVM, Naïve Bayes (NB), and an ensemble model. Each model assigned a user to one of four suicide risk levels: no risk, low risk, moderate risk, or severe risk. Among the three models, the ensemble model had the best macro-averaged F1 score (0.379) when tested on the holdout test dataset. The NB model had the best performance in two additional binary-classification tasks: no risk vs. flagged risk (any risk level other than no risk), with an F1 score of 0.836, and no or low risk vs. urgent risk (moderate or severe risk), with an F1 score of 0.736. We conclude that the NB model may serve as a tool for identifying users with flagged or urgent suicide risk based on longitudinal posts on Reddit discussion forums.
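The three evaluation settings named in the abstract (four-level macro-averaged F1, flagged risk, and urgent risk) can be illustrated with a minimal sketch. The label strings, function names, and toy predictions below are hypothetical and are not taken from the shared task data or the authors' code. The sketch also assumes that the binary F1 scores are computed for the positive class (flagged or urgent), which the abstract does not state explicitly.

```python
from typing import List, Sequence

# Hypothetical label names for the four risk levels described in the abstract.
LEVELS = ["no", "low", "moderate", "severe"]


def f1_for_class(gold: Sequence[str], pred: Sequence[str], positive: str) -> float:
    """F1 for one class, treating `positive` as the target label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str], classes: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    return sum(f1_for_class(gold, pred, c) for c in classes) / len(classes)


def to_flagged(labels: Sequence[str]) -> List[str]:
    """Collapse to 'no risk' vs. 'flagged' (any level other than no risk)."""
    return ["flagged" if label != "no" else "no" for label in labels]


def to_urgent(labels: Sequence[str]) -> List[str]:
    """Collapse to 'not urgent' (no or low risk) vs. 'urgent' (moderate or severe)."""
    return ["urgent" if label in ("moderate", "severe") else "not_urgent" for label in labels]


# Toy user-level labels, invented purely for illustration.
gold = ["no", "low", "moderate", "severe", "severe", "no"]
pred = ["no", "moderate", "moderate", "severe", "low", "low"]

four_level = macro_f1(gold, pred, LEVELS)
flagged = f1_for_class(to_flagged(gold), to_flagged(pred), "flagged")
urgent = f1_for_class(to_urgent(gold), to_urgent(pred), "urgent")
print(f"macro F1: {four_level:.3f}, flagged F1: {flagged:.3f}, urgent F1: {urgent:.3f}")
```

Because the macro average weights all four levels equally, a model that misses one level entirely loses a quarter of the attainable score, which is one plausible reason the four-level score is much lower than the two binary scores.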