Comparing Selective Masking Methods for Depression Detection in Social Media

Impact factor: 5.3 | Zone 2 (Computer Science) | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Journal: Computational Linguistics | Pub Date: 2023-04-28 | DOI: 10.1162/coli_a_00479

Chanapa Pananookooln, Jakrapop Akaranee, Chaklam Silpasuwanchai

Citations: 0

Abstract

Identifying those at risk for depression is a crucial issue, and social media provides an excellent platform for examining the linguistic patterns of depressed individuals. A significant challenge in the depression classification problem is ensuring that prediction models are not overly dependent on topic keywords, i.e., depression keywords, such that they fail to predict when such keywords are unavailable. One promising approach is masking: by selectively masking various words and asking the model to predict the masked words, the model is forced to learn the inherent language patterns of depression. This study evaluates seven masking techniques. It also examines whether predicting the masked words is more effective during the pre-training phase or the fine-tuning phase. Last, six class imbalance ratios were compared to determine the robustness of the masked-word selection methods. Key findings demonstrate that selective masking outperforms random masking in terms of F1-score. The most accurate and robust models were identified. Our research also indicates that reconstructing the masked words during the pre-training phase is more advantageous than doing so during the fine-tuning phase. Further discussion and implications are provided. This is the first study to comprehensively compare masked-word selection methods, which has broad implications for the field of depression classification and general NLP. Our code can be found at: https://github.com/chanapapan/Depression-Detection.
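To make the contrast between random and selective masking concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not name the seven selection methods, so the scoring function, the toy lexicon in the usage note, the `[MASK]` placeholder, and the function names `random_mask` and `selective_mask` are all assumptions made for illustration. The only idea taken from the abstract is that some words are hidden and the model is asked to reconstruct them.

```python
import random

MASK = "[MASK]"  # placeholder token; an assumption, not taken from the paper


def _apply_mask(tokens, chosen):
    """Replace chosen positions with MASK and record the original words as labels."""
    masked = [MASK if i in chosen else t for i, t in enumerate(tokens)]
    labels = [t if i in chosen else None for i, t in enumerate(tokens)]
    return masked, labels


def _num_to_mask(tokens, ratio):
    """Mask at least one token when possible, never more than the sequence length."""
    return min(len(tokens), max(1, round(len(tokens) * ratio)))


def random_mask(tokens, ratio, rng):
    """Baseline: hide a uniformly random subset of positions."""
    n = _num_to_mask(tokens, ratio)
    chosen = set(rng.sample(range(len(tokens)), n))
    return _apply_mask(tokens, chosen)


def selective_mask(tokens, ratio, score):
    """Hide the positions whose words rank highest under a caller-supplied score.

    `score` is a hypothetical stand-in for whatever word-selection
    criterion is being compared (ties keep their original order).
    """
    n = _num_to_mask(tokens, ratio)
    ranked = sorted(range(len(tokens)), key=lambda i: score(tokens[i]), reverse=True)
    return _apply_mask(tokens, set(ranked[:n]))
```

As a usage example, calling `selective_mask(tokens, 0.25, score)` on the nine-token sentence "i feel so depressed and hopeless about everything lately", with a `score` that returns 1.0 for words in a toy lexicon `{"depressed", "hopeless"}` and 0.0 otherwise, hides exactly those two words, while `random_mask(tokens, 0.25, random.Random(0))` hides two arbitrary positions. In both cases the `labels` list holds the words the model would be asked to reconstruct.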

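Source journal

Computational Linguistics (Engineering and Technology: Computer Science, Interdisciplinary Applications)

CiteScore

15.80

Self-citation rate

0.00%

Number of articles published

Review time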

>12 weeks

Journal description: Computational Linguistics, the longest-running publication dedicated solely to the computational and mathematical aspects of language and the design of natural language processing systems, provides university and industry linguists, computational linguists, AI and machine learning researchers, cognitive scientists, speech specialists, and philosophers with the latest insights into the computational aspects of language research.