
Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting: Latest Publications

Capturing Human Cognitive Styles with Language: Towards an Experimental Evaluation Paradigm.
Vasudha Varadarajan, Syeda Mahwish, Xiaoran Liu, Julia Buffolino, Christian C Luhmann, Ryan L Boyd, H Andrew Schwartz

While NLP models often seek to capture cognitive states via language, the validity of predicted states is determined by comparing them to annotations created without access to the cognitive states of the authors. In behavioral sciences, cognitive states are instead measured via experiments. Here, we introduce an experiment-based framework for evaluating language-based cognitive style models against human behavior. We explore the phenomenon of decision making and its relationship to the linguistic style of an individual talking about a recent decision they made. Participants then complete a classical decision-making experiment that captures their cognitive style, determined by how preferences change during a decision exercise. We find that language features, intended to capture cognitive style, can predict participants' decision style with moderate-to-high accuracy (AUC ~ 0.8), demonstrating that cognitive style can be partly captured and revealed by discourse patterns.
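The reported AUC ~ 0.8 is the area under the ROC curve over participants' predicted decision styles. As an illustration of the metric only (not the authors' feature pipeline), ROC AUC can be computed directly from ranked scores via the Mann-Whitney formulation:

```python
def roc_auc(labels, scores):
    # Mann-Whitney U formulation: the probability that a randomly
    # chosen positive example is scored above a randomly chosen
    # negative one (ties count as half a win).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

With scores [0.1, 0.4, 0.35, 0.8] for labels [0, 0, 1, 1], three of the four positive-negative pairs are ranked correctly, giving 0.75.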

DOI: 10.18653/v1/2025.naacl-short.81 | Volume 2025, pp. 966-979 | Published: 2025-04-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12483192/pdf/
Cited by: 0
Protein2Text: Resampling Mechanism to Translate Protein Sequences into Human-Interpretable Text.
Ala Jararweh, Oladimeji Macaulay, David Arredondo, Yue Hu, Luis Tafoya, Kushal Virupakshappa, Avinash Sahu

Proteins play critical roles in biological systems, yet 99.7% of over 227 million known protein sequences remain uncharacterized due to the limitations of experimental methods. To assist experimentalists in narrowing down hypotheses and accelerating protein characterization, we present Protein2Text, a multimodal large language model that interprets protein sequences and generates informative text to address open-ended questions about protein functions and attributes. By integrating a resampling mechanism within an adapted LLaVA framework, our model effectively maps protein sequences into a language-compatible space, enhancing its capability to handle diverse and complex queries. Trained on a newly curated dataset derived from PubMed articles and rigorously evaluated using four comprehensive benchmarks-including in-domain and cross-domain evaluations-Protein2Text outperforms several existing models in open-ended question-answering tasks. Our work also highlights the limitations of current evaluation metrics applied to template-based approaches, which may lead to misleading results, emphasizing the need for unbiased assessment methods. Our model weights, evaluation datasets, and evaluation scripts are publicly available at https://github.com/alaaj27/Protein2Text.git.
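The abstract does not detail the resampling mechanism, so the sketch below shows only the generic cross-attention resampler idea it alludes to: a fixed set of learned query vectors attends over a variable-length sequence of protein embeddings, producing a fixed number of language-space tokens. All names and dimensions here are hypothetical:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def resample(queries, seq, dim):
    # Each learned query vector attends over the variable-length
    # sequence of protein embeddings; the output always has
    # len(queries) vectors, whatever the input length.
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(dim)
                  for key in seq]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, seq))
                    for d in range(dim)])
    return out
```

The output length depends only on the number of queries, which is what lets arbitrarily long protein sequences be fed to a language model with a fixed token budget.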

DOI: 10.18653/v1/2025.naacl-industry.68 | Volume 2025, pp. 918-937 | Published: 2025-04-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281053/pdf/
Cited by: 0
Towards Reducing Diagnostic Errors with Interpretable Risk Prediction.
Denis Jered McInerney, William Dickinson, Lucy C Flynn, Andrea C Young, Geoffrey S Young, Jan-Willem van de Meent, Byron C Wallace

Many diagnostic errors occur because clinicians cannot easily access relevant information in patient Electronic Health Records (EHRs). In this work we propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses; our ultimate aim is to increase access to evidence and reduce diagnostic errors. In particular, we propose a Neural Additive Model to make predictions backed by evidence with individualized risk estimates at time-points where clinicians are still uncertain, aiming to specifically mitigate delays in diagnosis and errors stemming from an incomplete differential. To train such a model, it is necessary to infer temporally fine-grained retrospective labels of eventual "true" diagnoses. We do so with LLMs, to ensure that the input text is from before a confident diagnosis can be made. We use an LLM to retrieve an initial pool of evidence, but then refine this set of evidence according to correlations learned by the model. We conduct an in-depth evaluation of the usefulness of our approach by simulating how it might be used by a clinician to decide between a pre-defined list of differential diagnoses.
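A Neural Additive Model keeps per-feature effects separable, which is what makes its risk estimates interpretable. Below is a minimal sketch, with hand-fixed toy shape functions standing in for the small per-evidence networks the paper would learn:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class NeuralAdditiveModel:
    # The logit is a sum of independent per-feature shape functions,
    # so every piece of evidence contributes an inspectable amount
    # of risk on its own.
    def __init__(self, shape_fns, bias=0.0):
        self.shape_fns = shape_fns
        self.bias = bias

    def contributions(self, x):
        # One contribution per feature: f_i(x_i).
        return [f(xi) for f, xi in zip(self.shape_fns, x)]

    def predict_proba(self, x):
        return sigmoid(sum(self.contributions(x)) + self.bias)
```

Because the logit is a plain sum, `contributions` exposes exactly how much each piece of evidence raised or lowered the predicted risk.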

DOI: 10.18653/v1/2024.naacl-long.399 | Volume 2024, pp. 7193-7210 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11501083/pdf/
Cited by: 0
ODD: A Benchmark Dataset for the Natural Language Processing Based Opioid Related Aberrant Behavior Detection.
Sunjae Kwon, Xun Wang, Weisong Liu, Emily Druhl, Minhee L Sung, Joel I Reisman, Wenjun Li, Robert D Kerns, William Becker, Hong Yu

Opioid related aberrant behaviors (ORABs) present novel risk factors for opioid overdose. This paper introduces a novel biomedical natural language processing benchmark dataset named ODD, for ORAB Detection Dataset. ODD is an expert-annotated dataset designed to identify ORABs from patients' EHR notes and classify them into nine categories: 1) Confirmed Aberrant Behavior, 2) Suggested Aberrant Behavior, 3) Opioids, 4) Indication, 5) Diagnosed opioid dependency, 6) Benzodiazepines, 7) Medication Changes, 8) Central Nervous System-related, and 9) Social Determinants of Health. We explored two state-of-the-art natural language processing approaches (fine-tuning and prompt-tuning) to identify ORABs. Experimental results show that the prompt-tuning models outperformed the fine-tuning models in most categories, and the gains were especially high among uncommon categories (Suggested Aberrant Behavior, Confirmed Aberrant Behavior, Diagnosed opioid dependency, and Medication Changes). Although the best model achieved 88.17% macro-average area under the precision-recall curve, uncommon classes still leave considerable room for performance improvement. ODD is publicly available.
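The headline numbers use macro averaging, which weights each of the nine categories equally, so rare ORAB classes count as much as common ones. As an illustration of the averaging scheme only (with hypothetical labels, and macro F1 rather than the paper's AUPRC):

```python
def macro_f1(y_true, y_pred, classes):
    # Macro averaging: compute F1 per class, then take the
    # unweighted mean, so rare classes affect the headline
    # number as much as common ones.
    f1s = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

A single missed instance of a rare class (here "SI") drags the macro score down by a full third of that class's F1, which is why uncommon categories dominate the remaining headroom.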

Volume 2024, pp. 4338-4359 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11368170/pdf/
Cited by: 0
Personalized Jargon Identification for Enhanced Interdisciplinary Communication.
Yue Guo, Joseph Chee Chang, Maria Antoniak, Erin Bransom, Trevor Cohen, Lucy Lu Wang, Tal August

Scientific jargon can confuse researchers when they read materials from other domains. Identifying and translating jargon for individual researchers could speed up research, but current methods of jargon identification mainly use corpus-level familiarity indicators rather than modeling researcher-specific needs, which can vary greatly based on each researcher's background. We collect a dataset of over 10K term familiarity annotations from 11 computer science researchers for terms drawn from 100 paper abstracts. Analysis of this data reveals that jargon familiarity and information needs vary widely across annotators, even within the same sub-domain (e.g., NLP). We investigate features representing domain, subdomain, and individual knowledge to predict individual jargon familiarity. We compare supervised and prompt-based approaches, finding that prompt-based methods using information about the individual researcher (e.g., personal publications, self-defined subfield of research) yield the highest accuracy, though the task remains difficult and supervised approaches have lower false positive rates. This research offers insights into features and methods for the novel task of integrating personal data into scientific jargon identification.

DOI: 10.18653/v1/2024.naacl-long.255 | Volume 2024, pp. 4535-4550 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11801132/pdf/
Cited by: 0
PromptFix: Few-shot Backdoor Removal via Adversarial Prompt Tuning.
Tianrong Zhang, Zhaohan Xi, Ting Wang, Prasenjit Mitra, Jinghui Chen

Pre-trained language models (PLMs) have attracted enormous attention over the past few years with their unparalleled performance. Meanwhile, the soaring cost of training PLMs and their remarkable generalizability have jointly made few-shot fine-tuning and prompting the most popular training paradigms for natural language processing (NLP) models. Nevertheless, existing studies have shown that these NLP models can be backdoored, such that model behavior is manipulated when trigger tokens are presented. In this paper, we propose PromptFix, a novel backdoor mitigation strategy for NLP models via adversarial prompt tuning in few-shot settings. Unlike existing NLP backdoor removal methods, which rely on accurate trigger inversion and subsequent model fine-tuning, PromptFix keeps the model parameters intact and only utilizes two extra sets of soft tokens, which approximate the trigger and counteract it, respectively. The use of soft tokens and adversarial optimization eliminates the need to enumerate possible backdoor configurations and enables an adaptive balance between trigger finding and preservation of performance. Experiments with various backdoor attacks validate the effectiveness of the proposed method, and its performance under domain shift further shows PromptFix's applicability to models pre-trained on unknown data sources, the common case in prompt-tuning scenarios.

DOI: 10.18653/v1/2024.naacl-long.177 | Volume 1, pp. 3212-3225 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12395398/pdf/
Cited by: 0
ALBA: Adaptive Language-Based Assessments for Mental Health.
Vasudha Varadarajan, Sverker Sikström, Oscar N E Kjell, H Andrew Schwartz

Mental health issues differ widely among individuals, with varied signs and symptoms. Recently, language-based assessments have shown promise in capturing this diversity, but they require a substantial sample of words per person for accuracy. This work introduces the task of Adaptive Language-Based Assessment (ALBA), which involves adaptively ordering questions while also scoring an individual's latent psychological trait using limited language responses to previous questions. To this end, we develop adaptive testing methods under two psychometric measurement theories: Classical Test Theory and Item Response Theory. We empirically evaluate ordering and scoring strategies, organizing them into two new methods: a semi-supervised item response theory-based method (ALIRT) and a supervised Actor-Critic model. While both methods improved over non-adaptive baselines, we found ALIRT to be the most accurate and scalable, achieving the highest accuracy with fewer questions (e.g., Pearson r ≈ 0.93 after only 3 questions, compared to typically needing at least 7). In general, adaptive language-based assessments of depression and anxiety were able to utilize a smaller sample of language without compromising validity or incurring large computational costs.
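ALIRT builds on Item Response Theory; the sketch below shows only the standard IRT machinery underlying adaptive ordering (a 2PL response model plus maximum-information item selection), not the paper's semi-supervised language scoring. Item parameters are hypothetical:

```python
import math

def p_correct(theta, a, b):
    # 2PL item response model: probability of endorsing an item
    # given ability theta, discrimination a, and difficulty b.
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    # Fisher information the item carries at ability estimate theta.
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items, asked):
    # Adaptive ordering: ask the unasked item that is most
    # informative about the current ability estimate.
    return max((i for i in range(len(items)) if i not in asked),
               key=lambda i: item_information(theta, *items[i]))
```

With items [(1.0, -1.0), (2.0, 0.0), (1.0, 2.0)] and theta = 0, the highly discriminating middle item carries the most information, so it is asked first; as theta updates after each response, the ordering adapts, which is how few questions can suffice.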

DOI: 10.18653/v1/2024.naacl-long.136 | Volume 2024, pp. 2466-2478 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11907698/pdf/
Cited by: 0
Pedagogically Aligned Objectives Create Reliable Automatic Cloze Tests.
Brian Ondov, Dina Demner-Fushman, Kush Attal

The cloze training objective of Masked Language Models makes them a natural choice for generating plausible distractors for human cloze questions. However, distractors must also be both distinct and incorrect, neither of which is directly addressed by existing neural methods. Evaluation of recent models has also relied largely on automated metrics, which cannot demonstrate the reliability or validity of human comprehension tests. In this work, we first formulate the pedagogically motivated objectives of plausibility, incorrectness, and distinctiveness in terms of conditional distributions from language models. Second, we present an unsupervised, interpretable method that uses these objectives to jointly optimize sets of distractors. Third, we test the reliability and validity of the resulting cloze tests compared to other methods with human participants. We find our method has stronger correlation with teacher-created comprehension tests than the state-of-the-art neural method and is more internally consistent. Our implementation is freely available and can quickly create a multiple choice cloze test from any given passage.
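The three objectives, each framed as a condition on the masked-LM distribution, can be caricatured with a greedy selector. This sketch simplifies distinctiveness to de-duplication and only illustrates how the objectives interact; the paper's method jointly optimizes full sets of distractors, and the probability table here is hypothetical:

```python
def select_distractors(mlm_probs, answer, k=3):
    # Plausibility: rank candidates by masked-LM probability.
    # Incorrectness: never return the true answer.
    # Distinctiveness (simplified to de-duplication here): each
    # chosen distractor must differ from the ones already chosen.
    chosen = []
    for word, _ in sorted(mlm_probs.items(), key=lambda kv: -kv[1]):
        if word != answer and word not in chosen:
            chosen.append(word)
        if len(chosen) == k:
            break
    return chosen
```

For a cloze like "They walked along the ___ of the river" with answer "bank", the selector keeps the most plausible fillers that are not the answer.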

DOI: 10.18653/v1/2024.naacl-long.220 | Volume 2024, pp. 3961-3972 | Published: 2024-06-01 | Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12415984/pdf/
Citations: 0
ScAN: Suicide Attempt and Ideation Events Dataset.
Bhanu Pratap Singh Rawat, Samuel Kovaly, Wilfred R Pigeon, Hong Yu

Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI is frequently documented in electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and prediction of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built the Suicide Attempt and Ideation Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning 12k+ EHR notes with 19k+ annotated SA and SI events. The annotations also contain attributes such as the method of suicide attempt. We also provide a strong baseline model, ScANER (Suicide Attempt and Ideation Events Retriever), a multi-task RoBERTa-based model with a retrieval module that extracts all relevant suicidal behavioral evidence from the EHR notes of a hospital stay, and a prediction module that identifies the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidence, and macro F1-scores of 0.78 and 0.60 for classification of SA and SI for the patient's hospital stay, respectively. ScAN and ScANER are publicly available.
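The macro F1 figures reported above are the unweighted mean of per-class F1 scores, so a rare class like SI counts as much as a frequent one. A minimal sketch of the metric (the labels below are toy values, not data from ScAN):

```python
# Macro F1: compute F1 per class, then average without class weighting.
from collections import defaultdict

def macro_f1(y_true, y_pred):
    tp = defaultdict(int); fp = defaultdict(int); fn = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but true class was t
            fn[t] += 1          # missed an instance of class t
    labels = set(y_true) | set(y_pred)
    f1s = []
    for c in sorted(labels):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["SA", "SA", "SI", "SI", "SA", "SI"]
y_pred = ["SA", "SI", "SI", "SI", "SA", "SA"]
print(round(macro_f1(y_true, y_pred), 3))  # → 0.667
```

This matches scikit-learn's `f1_score(..., average="macro")`; the paper's "macro-weighted" 0.83 evidence-retrieval score is reported separately from the per-behavior classification scores.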

{"title":"ScAN: Suicide Attempt and Ideation Events Dataset.","authors":"Bhanu Pratap Singh Rawat,&nbsp;Samuel Kovaly,&nbsp;Wilfred R Pigeon,&nbsp;Hong Yu","doi":"10.18653/v1/2022.naacl-main.75","DOIUrl":"https://doi.org/10.18653/v1/2022.naacl-main.75","url":null,"abstract":"<p><p>Suicide is an important public health concern and one of the leading causes of death worldwide. Suicidal behaviors, including suicide attempts (SA) and suicide ideations (SI), are leading risk factors for death by suicide. Information related to patients' previous and current SA and SI are frequently documented in the electronic health record (EHR) notes. Accurate detection of such documentation may help improve surveillance and predictions of patients' suicidal behaviors and alert medical professionals for suicide prevention efforts. In this study, we first built <b>S</b>uicide <b>A</b>ttempt and Ideatio<b>n</b> Events (ScAN) dataset, a subset of the publicly available MIMIC III dataset spanning over 12<i>k</i>+ EHR notes with 19<i>k</i>+ annotated SA and SI events information. The annotations also contain attributes such as method of suicide attempt. We also provide a strong baseline model ScANER (<b>S</b>ui<b>c</b>ide <b>A</b>ttempt and Ideatio<b>n</b> <b>E</b>vents <b>R</b>etreiver), a multi-task RoBERTa-based model with a <i>retrieval module</i> to extract all the relevant suicidal behavioral evidences from EHR notes of an hospital-stay and, and a <i>prediction module</i> to identify the type of suicidal behavior (SA and SI) concluded during the patient's stay at the hospital. ScANER achieved a macro-weighted F1-score of 0.83 for identifying suicidal behavioral evidences and a macro F1-score of 0.78 and 0.60 for classification of SA and SI for the patient's hospital-stay, respectively. ScAN and ScANER are publicly available.</p>","PeriodicalId":74542,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. North American Chapter. 
Meeting","volume":"2022 ","pages":"1029-1040"},"PeriodicalIF":0.0,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9958515/pdf/nihms-1875183.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9423903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A preprint of this paper is also indexed separately (arXiv:2205.07872, DOI: 10.48550/arXiv.2205.07872, published 2022-05-12; 4 citations); its title, authors, and abstract duplicate the entry above.