
Proceedings of the conference. Association for Computational Linguistics. Meeting: Latest Publications

Classifying Electronic Consults for Triage Status and Question Type.
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.nlpmc-1.1
Xiyu Ding, Michael L Barnett, Ateev Mehrotra, Timothy A Miller

Electronic consult (eConsult) systems allow specialists more flexibility to respond to referrals more efficiently, thereby increasing access in under-resourced healthcare settings like safety net systems. Understanding the usage patterns of an eConsult system is an important part of improving specialist efficiency. In this work, we develop and apply classifiers to a dataset of eConsult questions from primary care providers to specialists, classifying the messages by how they were triaged by the specialist office and by the underlying type of clinical question posed by the primary care provider. We show that pre-trained transformer models are strong baselines, with performance improving further with domain-specific training and shared representations.
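
As a rough illustration of the kind of baseline the abstract describes, the sketch below fine-tunes a generic pre-trained transformer as a triage classifier. The model name, label set, and example question are placeholders, not the authors' data or configuration.

```python
# Minimal sketch: fine-tune a pre-trained transformer for eConsult triage classification.
# Labels and the training example are hypothetical stand-ins for the real dataset.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

TRIAGE_LABELS = ["scheduled", "resolved_electronically", "returned_for_info"]  # hypothetical
examples = [("Patient with A1c of 9.5 despite metformin; start insulin?", 0)]   # hypothetical

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(TRIAGE_LABELS))
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for text, label in examples:
    batch = tokenizer(text, truncation=True, padding=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor([label]))
    out.loss.backward()          # standard cross-entropy fine-tuning step
    optimizer.step()
    optimizer.zero_grad()
```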

Citations: 0
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.
Pub Date: 2020-07-01 DOI: 10.18653/v1/2020.acl-demos.13
Yonghao Jin, Fei Li, Hong Yu

CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In the clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, such as tokenization and named entity recognition. Since these steps require different tools that are usually scattered across different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present BENTO, a workflow management platform with a graphical user interface (GUI) built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their own custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI enables researchers with limited computing background to compose tools into NLP pipelines and then apply the pipelines to their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible enough to be tailored to other domains.

Citations: 0
Two-stage Federated Phenotyping and Patient Representation Learning.
Dianbo Liu, Dmitriy Dligach, Timothy Miller

A large percentage of medical information is in unstructured text format in electronic medical record systems. Manual extraction of information from clinical notes is extremely time consuming. Natural language processing has been widely used in recent years for automatic information extraction from medical texts. However, algorithms trained on data from a single healthcare provider are not generalizable and are error-prone due to the heterogeneity and uniqueness of medical documents. We develop a two-stage federated natural language processing method that enables utilization of clinical notes from different hospitals or clinics without moving the data, and demonstrate its performance using obesity and comorbidity phenotyping as the medical task. This approach not only improves the quality of a specific clinical task but also facilitates knowledge progression in the whole healthcare system, which is an essential part of a learning health system. To the best of our knowledge, this is the first application of federated machine learning in clinical NLP.
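
The federated step can be pictured with a minimal FedAvg-style sketch: each site trains a copy of the model on its own notes, and only model parameters are shared and averaged, so the clinical text never leaves the hospital. This is an illustration of the general idea under assumed inputs, not the authors' implementation.

```python
# Sketch of one round of federated training: local updates per site, then parameter averaging.
# Assumes a purely floating-point model; the toy linear model and synthetic loaders are placeholders.
import copy
import torch

def local_update(global_model, loader, epochs=1, lr=1e-3):
    """Train a copy of the global model on one site's private data and return its parameters."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict()

def federated_average(global_model, site_loaders):
    """Aggregate all sites' parameters by simple averaging (FedAvg-style)."""
    states = [local_update(global_model, loader) for loader in site_loaders]
    avg = {k: torch.stack([s[k].float() for s in states]).mean(dim=0) for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model

# Toy demo: two "sites" with synthetic batches and a shared linear phenotype classifier.
model = torch.nn.Linear(10, 1)
site_loaders = [[(torch.randn(4, 10), torch.randint(0, 2, (4, 1)).float())] for _ in range(2)]
model = federated_average(model, site_loaders)
```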

Citations: 58
We need to talk about standard splits.
Kyle Gorman, Steven Bedrick

It is standard practice in speech & language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely-used "standard split". We fail to reliably reproduce some rankings using randomly generated splits. We suggest that randomly generated splits should be used in system comparison.
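
The protocol the authors argue for can be sketched roughly as follows: evaluate both systems on many randomly generated splits and test whether the score difference is statistically reliable. The Wilcoxon signed-rank test and the `train_and_score_*` callables here are illustrative choices, not necessarily the exact tests or systems used in the paper.

```python
# Sketch: compare two taggers across repeated random train/test splits.
# `train_and_score_a` / `train_and_score_b` are placeholders for the systems' training routines,
# each returning an accuracy on the held-out portion.
import numpy as np
from scipy import stats
from sklearn.model_selection import train_test_split

def compare_systems(sentences, labels, train_and_score_a, train_and_score_b,
                    n_splits=20, seed=0):
    rng = np.random.RandomState(seed)
    diffs = []
    for _ in range(n_splits):
        tr_x, te_x, tr_y, te_y = train_test_split(
            sentences, labels, test_size=0.1, random_state=rng.randint(10**6))
        diffs.append(train_and_score_a(tr_x, tr_y, te_x, te_y)
                     - train_and_score_b(tr_x, tr_y, te_x, te_y))
    # Signed-rank test over per-split accuracy differences (one reasonable choice).
    stat, p = stats.wilcoxon(diffs)
    return np.mean(diffs), p
```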

Citations: 103
Multimodal Transformer for Unaligned Multimodal Language Sequences.
Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov

Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges exist in modeling such multimodal human language time-series data: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise cross-modal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that correlated crossmodal signals can be captured by the proposed crossmodal attention mechanism in MulT.
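
A minimal sketch of directional pairwise cross-modal attention: the target modality provides the queries and the source modality provides the keys and values, so the two sequences never need explicit alignment. Dimensions, head count, and the residual/normalization details are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch of one directional cross-modal attention block (e.g., text attending to audio).
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=40, heads=5):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, target, source):
        # target: (batch, T_target, dim); source: (batch, T_source, dim), unaligned lengths
        attended, _ = self.attn(query=target, key=source, value=source)
        return self.norm(target + attended)   # residual connection

text = torch.randn(2, 50, 40)    # 50 text tokens
audio = torch.randn(2, 375, 40)  # 375 audio frames, not aligned with the text
text_from_audio = CrossModalAttention()(text, audio)  # -> (2, 50, 40)
```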

Citations: 0
Hybrid Attention based Multimodal Network for Spoken Language Classification.
Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic

We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. The proposed hybrid attention architecture helps the system focus on learning informative representations for both modality-specific feature extraction and model fusion. The experimental results show that our system achieves state-of-the-art or competitive results on three published multimodal datasets. We also demonstrate the effectiveness and generalization of our system on a medical speech dataset from an actual trauma scenario. Furthermore, we provide a detailed comparison and analysis of traditional approaches and deep learning methods for both feature extraction and fusion.
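
A rough sketch of the two attention levels mentioned above: a feature-attention gate re-weights dimensions within each modality, and a modality-attention layer weights the modalities against each other before fusion and classification. All sizes, the gating form, and the four-class output are assumptions for illustration, not the paper's network.

```python
# Sketch of hybrid (feature + modality) attention fusion over utterance-level encodings.
import torch
import torch.nn as nn

class HybridAttentionFusion(nn.Module):
    def __init__(self, dim=128, n_classes=4):
        super().__init__()
        self.feature_gate = nn.Linear(dim, dim)      # feature attention within a modality
        self.modality_score = nn.Linear(dim, 1)      # modality attention across modalities
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, modality_vecs):
        # modality_vecs: list of (batch, dim) utterance encodings, one per modality
        gated = [m * torch.sigmoid(self.feature_gate(m)) for m in modality_vecs]
        stacked = torch.stack(gated, dim=1)                      # (batch, n_mod, dim)
        weights = torch.softmax(self.modality_score(stacked), dim=1)
        fused = (weights * stacked).sum(dim=1)                   # (batch, dim)
        return self.classifier(fused)

# Toy usage: fuse hypothetical text and audio encodings for 8 utterances.
logits = HybridAttentionFusion()([torch.randn(8, 128), torch.randn(8, 128)])
```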

Citations: 0
Multimodal Affective Analysis Using Hierarchical Attention Strategy with Word-Level Alignment.
Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic

Multimodal affective computing, learning to recognize and interpret human affect and subjective information from multiple data sources, is still challenging because: (i) it is hard to extract informative features to represent human affects from heterogeneous inputs; (ii) current fusion strategies only fuse different modalities at abstract levels, ignoring time-dependent interactions between modalities. Addressing such issues, we introduce a hierarchical multimodal architecture with attention and word-level fusion to classify utterance-level sentiment and emotion from text and audio data. Our introduced model outperforms state-of-the-art approaches on published datasets, and we demonstrate that our model's synchronized attention over modalities offers visual interpretability.

Citations: 0
Compositional Language Modeling for Icon-Based Augmentative and Alternative Communication.
Shiran Dudy, Steven Bedrick

Icon-based communication systems are widely used in the field of Augmentative and Alternative Communication. Typically, icon-based systems have lagged behind word- and character-based systems in terms of predictive typing functionality, due to the challenges inherent to training icon-based language models. We propose a method for synthesizing training data for use in icon-based language models, and explore two different modeling strategies.
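
One way to picture the data-synthesis idea is to map an ordinary word-level corpus onto icon IDs through a lexicon and train a simple language model over the resulting icon sequences. The toy lexicon, corpus, and bigram model below are placeholders for illustration, not the resources or models used in the paper.

```python
# Sketch: synthesize icon sequences from a word corpus, then train a bigram predictor over icons.
from collections import defaultdict

word_to_icon = {"i": "ICON_ME", "want": "ICON_WANT", "water": "ICON_WATER"}  # toy lexicon
corpus = [["i", "want", "water"]]                                            # toy corpus

bigram_counts = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    icons = [word_to_icon[w] for w in sentence if w in word_to_icon]
    for prev, nxt in zip(icons, icons[1:]):
        bigram_counts[prev][nxt] += 1

def predict_next(icon):
    """Return candidate next icons, most frequent first."""
    return sorted(bigram_counts[icon], key=bigram_counts[icon].get, reverse=True)

print(predict_next("ICON_WANT"))  # ['ICON_WATER']
```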

Citations: 12
Training Classifiers with Natural Language Explanations.
Braden Hancock, Martin Bringmann, Paroma Varma, Percy Liang, Stephanie Wang, Christopher Ré

Training accurate classifiers requires many labels, but each label provides only limited information (one bit for binary classification). In this work, we propose BabbleLabble, a framework for training classifiers in which an annotator provides a natural language explanation for each labeling decision. A semantic parser converts these explanations into programmatic labeling functions that generate noisy labels for an arbitrary amount of unlabeled data, which is used to train a classifier. On three relation extraction tasks, we find that users are able to train classifiers with comparable F1 scores from 5-100× faster by providing explanations instead of just labels. Furthermore, given the inherent imperfection of labeling functions, we find that a simple rule-based semantic parser suffices.
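
The pipeline can be illustrated with a toy version: a simple rule-based parser turns one explanation pattern into a programmatic labeling function that then labels unlabeled candidate pairs noisily. The regular-expression pattern, field names, and example below are made up for illustration and stand in for the paper's semantic parser and data model.

```python
# Sketch: convert a natural-language explanation into a programmatic labeling function.
import re

def parse_explanation(explanation):
    """'... the word "X" appears between them' -> labeling function keyed on X."""
    m = re.search(r'the word "([^"]+)" appears between', explanation)
    if m is None:
        return None
    keyword = m.group(1)

    def labeling_function(example):
        # example["between"] holds the text between the two candidate entities (assumed schema)
        return 1 if keyword in example["between"] else 0
    return labeling_function

lf = parse_explanation('True, because the word "wife" appears between them')
print(lf({"between": "and his wife ,"}))  # 1 -> noisy positive label
```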

Citations: 0
A Corpus with Multi-Level Annotations of Patients, Interventions and Outcomes to Support Language Processing for Medical Literature.
Benjamin Nye, Junyi Jessy Li, Roma Patel, Yinfei Yang, Iain J Marshall, Ani Nenkova, Byron C Wallace

We present a corpus of 5,000 richly annotated abstracts of medical articles describing clinical randomized controlled trials. Annotations include demarcations of text spans that describe the Patient population enrolled, the Interventions studied and to what they were Compared, and the Outcomes measured (the 'PICO' elements). These spans are further annotated at a more granular level, e.g., individual interventions within them are marked and mapped onto a structured medical vocabulary. We acquired annotations from a diverse set of workers with varying levels of expertise and cost. We describe our data collection process and the corpus itself in detail. We then outline a set of challenging NLP tasks that would aid searching of the medical literature and the practice of evidence-based medicine.

Citations: 0