Pub Date : 2024-06-01. DOI: 10.18653/v1/2024.naacl-long.399
Denis Jered McInerney, William Dickinson, Lucy C Flynn, Andrea C Young, Geoffrey S Young, Jan-Willem van de Meent, Byron C Wallace
Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.
{"title":"Towards Reducing Diagnostic Errors with Interpretable Risk Prediction.","authors":"Denis Jered McInerney, William Dickinson, Lucy C Flynn, Andrea C Young, Geoffrey S Young, Jan-Willem van de Meent, Byron C Wallace","doi":"10.18653/v1/2024.naacl-long.399","DOIUrl":"https://doi.org/10.18653/v1/2024.naacl-long.399","url":null,"abstract":"<p><p>Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual \"true\" diagnoses. We do so with LLMs, to ensure that the input text is from <i>before</i> a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2024 ","pages":"7193-7210"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501083/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142514333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sunjae Kwon, Xun Wang, Weisong Liu, Emily Druhl, Minhee L Sung, Joel I Reisman, Wenjun Li, Robert D Kerns, William Becker, Hong Yu
Opioid-related aberrant behaviors (ORABs) present novel risk factors for opioid overdose. This paper introduces a novel biomedical natural language processing benchmark dataset named ODD, for ORAB Detection Dataset. ODD is an expert-annotated dataset designed to identify ORABs from patients' EHR notes and classify them into nine categories: 1) Confirmed Aberrant Behavior, 2) Suggested Aberrant Behavior, 3) Opioids, 4) Indication, 5) Diagnosed opioid dependency, 6) Benzodiazepines, 7) Medication Changes, 8) Central Nervous System-related, and 9) Social Determinants of Health. We explored two state-of-the-art natural language processing approaches (fine-tuning and prompt-tuning) to identify ORABs. Experimental results show that the prompt-tuning models outperformed the fine-tuning models in most categories, and the gains were especially large for uncommon categories (Suggested Aberrant Behavior, Confirmed Aberrant Behavior, Diagnosed Opioid Dependence, and Medication Change). Although the best model achieved a macro-average area under the precision-recall curve of 88.17%, the uncommon classes still leave substantial room for improvement. ODD is publicly available.
{"title":"ODD: A Benchmark Dataset for the Natural Language Processing Based Opioid Related Aberrant Behavior Detection.","authors":"Sunjae Kwon, Xun Wang, Weisong Liu, Emily Druhl, Minhee L Sung, Joel I Reisman, Wenjun Li, Robert D Kerns, William Becker, Hong Yu","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Opioid related aberrant behaviors (ORABs) present novel risk factors for opioid overdose. This paper introduces a novel biomedical natural language processing benchmark dataset named ODD, for ORAB Detection Dataset. ODD is an expert-annotated dataset designed to identify ORABs from patients' EHR notes and classify them into nine categories; 1) Confirmed Aberrant Behavior, 2) Suggested Aberrant Behavior, 3) Opioids, 4) Indication, 5) Diagnosed opioid dependency, 6) Benzodiazepines, 7) Medication Changes, 8) Central Nervous System-related, and 9) Social Determinants of Health. We explored two state-of-the-art natural language processing models (fine-tuning and prompt-tuning approaches) to identify ORAB. Experimental results show that the prompt-tuning models outperformed the fine-tuning models in most categories and the gains were especially higher among uncommon categories (Suggested Aberrant Behavior, Confirmed Aberrant Behaviors, Diagnosed Opioid Dependence, and Medication Change). Although the best model achieved the highest 88.17% on macro average area under precision recall curve, uncommon classes still have a large room for performance improvement. ODD is publicly available.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2024 ","pages":"4338-4359"},"PeriodicalIF":0.0,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368170/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142121292","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-07-01. DOI: 10.18653/v1/2022.naacl-main.75
Bhanu Pratap Singh Rawat, Samuel Kovaly, Wilfred R Pigeon, Hong Yu
Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI is frequently documented in electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and prediction of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built the Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning 12k+ EHR notes with 19k+ annotated SA and SI events. The annotations also contain attributes such as the method of suicide attempt. We also provide a strong baseline model, ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module that extracts all relevant suicidal-behavior evidence from the EHR notes of a hospital stay, and a prediction module that identifies the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal-behavior evidence, and macro F1-scores of 0.78 and 0.60 for hospital-stay-level classification of SA and SI, respectively. ScAN and ScANER are publicly available.
{"title":"ScAN: Suicide Attempt and Ideation Events Dataset.","authors":"Bhanu Pratap Singh Rawat, Samuel Kovaly, Wilfred R Pigeon, Hong Yu","doi":"10.18653/v1/2022.naacl-main.75","DOIUrl":"https://doi.org/10.18653/v1/2022.naacl-main.75","url":null,"abstract":"<p><p>Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI are frequently documented in the electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and predictions of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built <b>S</b>uicide <b>A</b>ttempt and Ideatio<b>n</b> Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning over 12<i>k</i>+ EHR notes with 19<i>k</i>+ annotated SA and SI events information. The annotations also contain attributes such as method of suicide attempt. We also provide a strong baseline model ScANER (<b>S</b>ui<b>c</b>ide <b>A</b>ttempt and Ideatio<b>n</b> <b>E</b>vents <b>R</b>etreiver), a multi-task RoBERTa-based model with a <i>retrieval module</i> to extract all the relevant suicidal behavioral evidences from EHR notes of an hospital-stay and, and a <i>prediction module</i> to identify the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidences and a macro F1-score of 0.78 and 0.60 for classification of SA and SI for the patient's hospital-stay, respectively. ScAN and ScANER are publicly available.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2022 ","pages":"1029-1040"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9958515/pdf/nihms-1875183.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9423903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2022-05-12. DOI: 10.48550/arXiv.2205.07872
Bhanu Pratap Singh Rawat, Samuel Kovaly, Wilfred R Pigeon, Hong Yu
Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI is frequently documented in electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and prediction of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built the Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning 12k+ EHR notes with 19k+ annotated SA and SI events. The annotations also contain attributes such as the method of suicide attempt. We also provide a strong baseline model, ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module that extracts all relevant suicidal-behavior evidence from the EHR notes of a hospital stay, and a prediction module that identifies the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal-behavior evidence, and macro F1-scores of 0.78 and 0.60 for hospital-stay-level classification of SA and SI, respectively. ScAN and ScANER are publicly available.
{"title":"ScAN: Suicide Attempt and Ideation Events Dataset","authors":"Bhanu Pratap Singh Rawat, Samuel Kovaly, W. Pigeon, Hong-ye Yu","doi":"10.48550/arXiv.2205.07872","DOIUrl":"https://doi.org/10.48550/arXiv.2205.07872","url":null,"abstract":"Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients’ previous and current SA and SI are frequently documented in the electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and predictions of patients’ suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning over 12k+ EHR notes with 19k+ annotated SA and SI events information. The annotations also contain attributes such as method of suicide attempt. We also provide a strong baseline model ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module to extract all the relevant suicidal behavioral evidences from EHR notes of an hospital-stay and, and a prediction module to identify the type of suicidal behavior (SA and SI) concluded during the patient’s stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidences and a macro F1-score of 0.78 and 0.60 for classification of SA and SI for the patient’s hospital-stay, respectively. ScAN and ScANER are publicly available.","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"17 1","pages":"1029-1040"},"PeriodicalIF":0.0,"publicationDate":"2022-05-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78256254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-06-01. DOI: 10.18653/v1/2021.clpsych-1.13
Leili Tavabi, Trang Tran, Kalin Stefanov, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani
Analysis of client and therapist behavior in counseling sessions can provide helpful insights for assessing the quality of a session and, consequently, the client's behavioral outcome. In this paper, we study the automatic classification of standardized behavior codes (i.e., annotations) used for assessment of psychotherapy sessions in Motivational Interviewing (MI). We develop models and examine the classification of client behaviors throughout MI sessions, comparing the performance of models trained on large pretrained embeddings (RoBERTa) versus interpretable, expert-selected features (LIWC). Our best-performing model, using the pretrained RoBERTa embeddings, beats the baseline model, achieving an F1 score of 0.66 on subject-independent 3-class classification. Through statistical analysis of the classification results, we identify prominent LIWC features that may not have been captured by the model using pretrained embeddings. Although classification using LIWC features underperforms RoBERTa, our findings motivate the future direction of incorporating auxiliary tasks into the classification of MI codes.
{"title":"Analysis of Behavior Classification in Motivational Interviewing.","authors":"Leili Tavabi, Trang Tran, Kalin Stefanov, Brian Borsari, Joshua D Woolley, Stefan Scherer, Mohammad Soleymani","doi":"10.18653/v1/2021.clpsych-1.13","DOIUrl":"10.18653/v1/2021.clpsych-1.13","url":null,"abstract":"<p><p>Analysis of client and therapist behavior in counseling sessions can provide helpful insights for assessing the quality of the session and consequently, the client's behavioral outcome. In this paper, we study the automatic classification of standardized behavior codes (i.e. annotations) used for assessment of psychotherapy sessions in Motivational Interviewing (MI). We develop models and examine the classification of client behaviors throughout MI sessions, comparing the performance by models trained on large pretrained embeddings (RoBERTa) versus interpretable and expert-selected features (LIWC). Our best performing model using the pretrained RoBERTa embeddings beats the baseline model, achieving an F1 score of 0.66 in the subject-independent 3-class classification. Through statistical analysis on the classification results, we identify prominent LIWC features that may not have been captured by the model using pretrained embeddings. Although classification using LIWC features underperforms RoBERTa, our findings motivate the future direction of incorporating auxiliary tasks in the classification of MI codes.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"110-115"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321779/pdf/nihms-1727153.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39266882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser
Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.
{"title":"TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora.","authors":"Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce TextEssence, an interactive system designed to enable comparative analysis of corpora using embeddings. TextEssence includes visual, neighbor-based, and similarity-based modes of embedding analysis in a lightweight, web-based interface. We further propose a new measure of embedding confidence based on nearest neighborhood overlap, to assist in identifying high-quality embeddings for corpus analysis. A case study on COVID-19 scientific literature illustrates the utility of the system. TextEssence can be found at https://textessence.github.io.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"106-115"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8212692/pdf/nihms-1710045.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39251210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-06-01. DOI: 10.18653/v1/2021.naacl-main.357
Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H Andrew Schwartz
In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden-state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study of the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders), as well as the dimensionality of embedding vectors and sample sizes, as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data poses a significant difficulty, which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance on human-level tasks, with PCA giving a benefit over other reduction methods by better handling users who write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just 1/12 of the embedding dimensions.
{"title":"Empirical Evaluation of Pre-trained Transformers for Human-Level NLP: The Role of Sample Size and Dimensionality.","authors":"Adithya V Ganesan, Matthew Matero, Aravind Reddy Ravula, Huy Vu, H Andrew Schwartz","doi":"10.18653/v1/2021.naacl-main.357","DOIUrl":"https://doi.org/10.18653/v1/2021.naacl-main.357","url":null,"abstract":"<p><p>In human-level NLP tasks, such as predicting mental health, personality, or demographics, the number of observations is often smaller than the standard 768+ hidden state sizes of each layer within modern transformer-based language models, limiting the ability to effectively leverage transformers. Here, we provide a systematic study on the role of dimension reduction methods (principal components analysis, factorization techniques, or multi-layer auto-encoders) as well as the dimensionality of embedding vectors and sample sizes as a function of predictive performance. We first find that fine-tuning large models with a limited amount of data pose a significant difficulty which can be overcome with a pre-trained dimension reduction regime. RoBERTa consistently achieves top performance in human-level tasks, with PCA giving benefit over other reduction methods in better handling users that write longer texts. Finally, we observe that a majority of the tasks achieve results comparable to the best performance with just <math> <mrow><mfrac><mn>1</mn> <mrow><mn>12</mn></mrow> </mfrac> </mrow> </math> of the embedding dimensions.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"4515-4532"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8294338/pdf/nihms-1716243.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39215546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Denis Newman-Griffis, Jill Fain Lehman, Carolyn Rosé, Harry Hochheiser
Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.
{"title":"Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research.","authors":"Denis Newman-Griffis, Jill Fain Lehman, Carolyn Rosé, Harry Hochheiser","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of <i>Translational NLP</i>, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"4125-4138"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8223521/pdf/nihms-1710048.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39115253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-06-01. DOI: 10.18653/v1/2021.naacl-main.382
Griffin Adams, Emily Alsentzer, Mert Ketenci, Jason Zucker, Noémie Elhadad
Summarization of clinical narratives is a long-standing research problem. Here, we introduce the task of hospital-course summarization. Given the documentation authored throughout a patient's hospitalization, generate a paragraph that tells the story of the patient admission. We construct an English, text-to-text dataset of 109,000 hospitalizations (2M source notes) and their corresponding summary proxy: the clinician-authored "Brief Hospital Course" paragraph written as part of a discharge note. Exploratory analyses reveal that the BHC paragraphs are highly abstractive with some long extracted fragments; are concise yet comprehensive; differ in style and content organization from the source notes; exhibit minimal lexical cohesion; and represent silver-standard references. Our analysis identifies multiple implications for modeling this complex, multi-document summarization task.
{"title":"What's in a Summary? Laying the Groundwork for Advances in Hospital-Course Summarization.","authors":"Griffin Adams, Emily Alsentzer, Mert Ketenci, Jason Zucker, Noémie Elhadad","doi":"10.18653/v1/2021.naacl-main.382","DOIUrl":"10.18653/v1/2021.naacl-main.382","url":null,"abstract":"<p><p>Summarization of clinical narratives is a long-standing research problem. Here, we introduce the task of hospital-course summarization. Given the documentation authored throughout a patient's hospitalization, generate a paragraph that tells the story of the patient admission. We construct an English, text-to-text dataset of 109,000 hospitalizations (2M source notes) and their corresponding summary proxy: the clinician-authored \"Brief Hospital Course\" paragraph written as part of a discharge note. Exploratory analyses reveal that the BHC paragraphs are highly abstractive with some long extracted fragments; are concise yet comprehensive; differ in style and content organization from the source notes; exhibit minimal lexical cohesion; and represent silver-standard references. Our analysis identifies multiple implications for modeling this complex, multi-document summarization task.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":"2021 ","pages":"4794-4811"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8225248/pdf/nihms-1705151.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39115254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2021-06-01. DOI: 10.18653/v1/2021.bionlp-1.17
Zelalem Gero, Joyce C Ho
To keep pace with the increased generation and digitization of documents, automated methods that can improve search, discovery, and mining of the vast body of literature are essential. Keyphrases provide a concise representation by identifying salient concepts in a document. Various supervised approaches model keyphrase extraction using local context to predict the label for each token and perform much better than their unsupervised counterparts. Unfortunately, this approach fails for short documents where the context is unclear. Moreover, keyphrases, which are usually the gist of a document, need to reflect its central theme. We propose a new extraction model that introduces a centrality constraint to enrich the word representations of a bidirectional long short-term memory (BiLSTM) network. Performance evaluations on two publicly available datasets demonstrate that our model outperforms existing state-of-the-art approaches. Our model is publicly available at https://github.com/ZHgero/keyphrases_centrality.git.
{"title":"Word centrality constrained representation for keyphrase extraction.","authors":"Zelalem Gero, Joyce C Ho","doi":"10.18653/v1/2021.bionlp-1.17","DOIUrl":"https://doi.org/10.18653/v1/2021.bionlp-1.17","url":null,"abstract":"<p><p>To keep pace with the increased generation and digitization of documents, automated methods that can improve search, discovery and mining of the vast body of literature are essential. Keyphrases provide a concise representation by identifying salient concepts in a document. Various supervised approaches model keyphrase extraction using local context to predict the label for each token and perform much better than the unsupervised counterparts. Unfortunately, this method fails for short documents where the context is unclear. Moreover, keyphrases, which are usually the gist of a document, need to be the central theme. We propose a new extraction model that introduces a centrality constraint to enrich the word representation of a Bidirectional long short-term memory. Performance evaluation on two publicly available datasets demonstrate our model outperforms existing state-of-the art approaches. Our model is publicly available at https://github.com/ZHgero/keyphrases_centrality.git.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting","volume":" ","pages":"155-161"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9208728/pdf/nihms-1815573.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40396966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}