Proceedings of the Sixth Workshop on Computational Linguistics and Clinical Psychology: Latest Papers

Predicting Suicide Risk from Online Postings in Reddit The UGent-IDLab submission to the CLPysch 2019 Shared Task A
Semere Kiros Bitew, Giannis Bekoulis, Johannes Deleu, Lucas Sterckx, Klim Zaporojets, Thomas Demeester, Chris Develder
This paper describes IDLab’s text classification systems submitted to Task A as part of the CLPsych 2019 shared task. The aim of this shared task was to develop automated systems that predict the degree of suicide risk of people based on their posts on Reddit. Bag-of-words features, emotion features and post level predictions are used to derive user-level predictions. Linear models and ensembles of these models are used to predict final scores. We find that predicting fine-grained risk levels is much more difficult than flagging potentially at-risk users. Furthermore, we do not find clear added value from building richer ensembles compared to simple baselines, given the available training data and the nature of the prediction task.
DOI: https://doi.org/10.18653/v1/W19-3019 (published 2019-06-01)
Citations: 6
Towards augmenting crisis counselor training by improving message retrieval
O. Demasi, Marti A. Hearst, B. Recht
A fundamental challenge when training counselors is presenting novices with the opportunity to practice counseling distressed individuals without exacerbating a situation. Rather than replacing human empathy with an automated counselor, we propose simulating an individual in crisis so that human counselors in training can practice crisis counseling in a low-risk environment. Towards this end, we collect a dataset of suicide prevention counselor role-play transcripts and make initial steps towards constructing a CRISISbot for humans to counsel while in training. In this data-constrained setting, we evaluate the potential for message retrieval to construct a coherent chat agent in light of recent advances with text embedding methods. Our results show that embeddings can considerably improve retrieval approaches to make them competitive with generative models. By coherently retrieving messages, we can help counselors practice chatting in a low-risk environment.
DOI: https://doi.org/10.18653/v1/W19-3001 (published 2019-06-01)
Citations: 19
Using natural conversations to classify autism with limited data: Age matters
M. Hauser, E. Sariyanidi, B. Tunç, C. Zampella, E. Brodkin, R. Schultz, J. Parish-Morris
Spoken language ability is highly heterogeneous in Autism Spectrum Disorder (ASD), which complicates efforts to identify linguistic markers for use in diagnostic classification, clinical characterization, and for research and clinical outcome measurement. Machine learning techniques that harness the power of multivariate statistics and non-linear data analysis hold promise for modeling this heterogeneity, but many models require enormous datasets, which are unavailable for most psychiatric conditions (including ASD). In lieu of such datasets, good models can still be built by leveraging domain knowledge. In this study, we compare two machine learning approaches: the first approach incorporates prior knowledge about language variation across middle childhood, adolescence, and adulthood to classify 6-minute naturalistic conversation samples from 140 age- and IQ-matched participants (81 with ASD), while the other approach treats all ages the same. We found that individual age-informed models were significantly more accurate than a single model tasked with building a common algorithm across age groups. Furthermore, predictive linguistic features differed significantly by age group, confirming the importance of considering age-related changes in language use when classifying ASD. Our results suggest that limitations imposed by heterogeneity inherent to ASD and from developmental change with age can be (at least partially) overcome using domain knowledge, such as understanding spoken language development from childhood through adulthood.
DOI: https://doi.org/10.18653/v1/W19-3006 (published 2019-06-01)
Citations: 6
Suicide Risk Assessment on Social Media: USI-UPF at the CLPsych 2019 Shared Task
E. A. Ríssola, Diana Ramírez-Cifuentes, Ana Freire, F. Crestani
This paper describes the participation of the USI-UPF team in the shared task of the 2019 Computational Linguistics and Clinical Psychology Workshop (CLPsych2019). The goal is to assess the degree of suicide risk of social media users given a labelled dataset with their posts. An appropriate suicide risk assessment, with the usage of automated methods, can assist experts in detecting people at risk and eventually contribute to preventing suicide. We propose a set of machine learning models with features based on lexicons, word embeddings, word-level n-grams, and statistics extracted from users’ posts. The results show that the most effective models for the tasks are obtained by integrating lexicon-based features, a selected set of n-grams, and statistical measures.
DOI: https://doi.org/10.18653/v1/W19-3021 (published 2019-06-01)
Citations: 4
Linguistic Analysis of Schizophrenia in Reddit Posts
Jonathan Zomick, Sarah Ita Levitan, M. Serper
We explore linguistic indicators of schizophrenia in Reddit discussion forums. Schizophrenia (SZ) is a chronic mental disorder that affects a person’s thoughts and behaviors. Identifying and detecting signs of SZ is difficult given that SZ is relatively uncommon, affecting approximately 1% of the US population, and people suffering with SZ often believe that they do not have the disorder. Linguistic abnormalities are a hallmark of SZ and many of the illness’s symptoms are manifested through language. In this paper we leverage the vast amount of data available from social media and use statistical and machine learning approaches to study linguistic characteristics of SZ. We collected and analyzed a large corpus of Reddit posts from users claiming to have received a formal diagnosis of SZ and identified several linguistic features that differentiated these users from a control (CTL) group. We compared these results to other findings on social media linguistic analysis and SZ. We also developed a machine learning classifier to automatically identify self-identified users with SZ on Reddit.
DOI: https://doi.org/10.18653/v1/W19-3009 (published 2019-06-01)
Citations: 26
Depressed Individuals Use Negative Self-Focused Language When Recalling Recent Interactions with Close Romantic Partners but Not Family or Friends
Taleen Nalabandian, Molly Ireland
Depression is characterized by a self-focused negative attentional bias, which is often reflected in everyday language use. In a prospective writing study, we explored whether the association between depressive symptoms and negative, self-focused language varies across social contexts. College students (N = 243) wrote about a recent interaction with a person they care deeply about. Depression symptoms positively correlated with negative emotion words and first-person singular pronouns (or negative self-focus) when writing about a recent interaction with romantic partners or, to a lesser extent, friends, but not family members. The pattern of results was more pronounced when participants perceived greater self-other overlap (i.e., interpersonal closeness) with their romantic partner. Findings regarding how the linguistic profile of depression differs by type of relationship may inform more effective methods of clinical diagnosis and treatment.
DOI: https://doi.org/10.18653/v1/W19-3008 (published 2019-06-01)
Citations: 10
CLaC at CLPsych 2019: Fusion of Neural Features and Predicted Class Probabilities for Suicide Risk Assessment Based on Online Posts
Elham Mohammadi, Hessam Amini, Leila Kosseim
This paper summarizes our participation in the CLPsych 2019 shared task, under the name CLaC. The goal of the shared task was to detect and assess suicide risk based on a collection of online posts. For our participation, we used an ensemble method that relies on 8 neural sub-models to extract neural features and predict class probabilities, which are then used by an SVM classifier. Our team ranked first in 2 out of the 3 tasks (tasks A and C).
DOI: https://doi.org/10.18653/v1/W19-3004 (published 2019-06-01)
Citations: 21
ConvSent at CLPsych 2019 Task A: Using Post-level Sentiment Features for Suicide Risk Prediction on Reddit
Kristen Allen, Shrey Bagroy, Alexander L Davis, T. Krishnamurti
This work aims to infer mental health status from public text for early detection of suicide risk. It contributes to Shared Task A in the 2019 CLPsych workshop by predicting users’ suicide risk given posts in the Reddit subforum r/SuicideWatch. We use a convolutional neural network to incorporate LIWC information at the Reddit post level about topics discussed, first-person focus, emotional experience, grammatical choices, and thematic style. In sorting users into one of four risk categories, our best system’s macro-averaged F1 score was 0.50 on the withheld test set. The work demonstrates the predictive power of the Linguistic Inquiry and Word Count dictionary, in conjunction with a convolutional network and holistic consideration of each post and user.
DOI: https://doi.org/10.18653/v1/W19-3024 (published 2019-06-01)
Citations: 10
The importance of sharing patient-generated clinical speech and language data
Kathleen C. Fraser, N. Linz, Hali Lindsay, A. König
Increased access to large datasets has driven progress in NLP. However, most computational studies of clinically-validated, patient-generated speech and language involve very few datapoints, as such data are difficult (and expensive) to collect. In this position paper, we argue that we must find ways to promote data sharing across research groups, in order to build datasets of a more appropriate size for NLP and machine learning analysis. We review the benefits and challenges of sharing clinical language data, and suggest several concrete actions by both clinical and NLP researchers to encourage multi-site and multi-disciplinary data sharing. We also propose the creation of a collaborative data sharing platform, to allow NLP researchers to take a more active responsibility for data transcription, annotation, and curation.
DOI: https://doi.org/10.18653/v1/W19-3007 (published 2019-06-01)
Citations: 5
Suicide Risk Assessment with Multi-level Dual-Context Language and BERT
Matthew Matero, Akash Idnani, Youngseo Son, Salvatore Giorgi, Huy-Hien Vu, Mohammadzaman Zamani, Parth Limbachiya, Sharath Chandra Guntuku, H. A. Schwartz
Mental health predictive systems typically model language as if from a single context (e.g. Twitter posts, status updates, or forum posts) and are often limited to a single level of analysis (e.g. either the message-level or user-level). Here, we bring these pieces together to explore the use of open-vocabulary (BERT embeddings, topics) and theoretical features (emotional expression lexica, personality) for the task of suicide risk assessment on support forums (the CLPsych-2019 Shared Task). We used dual context based approaches (modeling content from suicide forums separately from other content), built over both traditional ML models and a novel dual RNN architecture with user-factor adaptation. We find that while affect from the suicide context distinguishes those with no risk from those with “any-risk”, personality factors from the non-suicide contexts provide distinction of the levels of risk: low, medium, and high risk. Within the shared task, our dual-context approach (listed as SBU-HLAB in the official results) achieved state-of-the-art performance predicting suicide risk using a combination of suicide-context and non-suicide posts (Task B), achieving an F1 score of 0.50 over hidden test set labels.
DOI: https://doi.org/10.18653/v1/W19-3005
Citations: 87