Calibrating Structured Output Predictors for Natural Language Processing
Abhyuday Jagannatha, Hong Yu
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.acl-main.188 | Pages: 2078-2092
We address the problem of calibrating prediction confidence for output entities of interest in natural language processing (NLP) applications. It is important that NLP applications such as named entity recognition and question answering produce calibrated confidence scores for their predictions, especially if they are to be deployed in a safety-critical domain such as healthcare. However, the output space of such structured prediction models is often too large to adapt binary or multi-class calibration methods directly. In this study, we propose a general calibration scheme for output entities of interest in neural-network-based structured prediction models. Our proposed method can be used with any binary-class calibration scheme and any neural network model. Additionally, we show that our calibration method can also be used as an uncertainty-aware, entity-specific decoding step to improve the performance of the underlying model at no additional training cost or data requirement. We show that our method outperforms current calibration techniques for named entity recognition, part-of-speech tagging, and question answering. Our decoding step also improves model performance across several tasks and benchmark datasets, and our method improves calibration and model performance in out-of-domain test scenarios as well.
{"title":"Calibrating Structured Output Predictors for Natural Language Processing.","authors":"Abhyuday Jagannatha, Hong Yu","doi":"10.18653/v1/2020.acl-main.188","DOIUrl":"https://doi.org/10.18653/v1/2020.acl-main.188","url":null,"abstract":"<p><p>We address the problem of calibrating prediction confidence for output entities of interest in natural language processing (NLP) applications. It is important that NLP applications such as named entity recognition and question answering produce calibrated confidence scores for their predictions, especially if the applications are to be deployed in a safety-critical domain such as healthcare. However, the output space of such structured prediction models is often too large to adapt binary or multi-class calibration methods directly. In this study, we propose a general calibration scheme for output entities of interest in neural network based structured prediction models. Our proposed method can be used with any binary class calibration scheme and a neural network model. Additionally, we show that our calibration method can also be used as an uncertainty-aware, entity-specific decoding step to improve the performance of the underlying model at no additional training cost or data requirements. We show that our method outperforms current calibration techniques for named-entity-recognition, part-of-speech and question answering. We also improve our model's performance from our decoding step across several tasks and benchmark datasets. Our method improves the calibration and model performance on out-ofdomain test scenarios as well.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"2078-2092"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7890517/pdf/nihms-1661932.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25390283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Scoring of Clinical Expressive Language Evaluation Tasks
Yiyi Wang, Emily Prud'hommeaux, Meysam Asgari, Jill Dolata
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.bea-1.18 | Pages: 177-185
Many clinical assessment instruments used to diagnose language impairments in children include a task in which the subject must formulate a sentence to describe an image using a specific target word. Because producing sentences in this way requires the speaker to integrate syntactic and semantic knowledge in a complex manner, responses are typically evaluated on several different dimensions of appropriateness, yielding a single composite score for each response. In this paper, we present a dataset consisting of non-clinically elicited responses for three related sentence formulation tasks, and we propose an approach for automatically evaluating their appropriateness. Using neural machine translation, we generate correct-incorrect sentence pairs to serve as synthetic data, increasing the amount and diversity of training data for our scoring model. Our scoring model uses transfer learning to facilitate automatic sentence appropriateness evaluation. We further compare custom word embeddings with pre-trained contextualized embeddings serving as features for our scoring model. We find that transfer learning improves scoring accuracy, particularly when using pre-trained contextualized embeddings.
{"title":"Automated Scoring of Clinical Expressive Language Evaluation Tasks.","authors":"Yiyi Wang, Emily Prud'hommeaux, Meysam Asgari, Jill Dolata","doi":"10.18653/v1/2020.bea-1.18","DOIUrl":"10.18653/v1/2020.bea-1.18","url":null,"abstract":"<p><p>Many clinical assessment instruments used to diagnose language impairments in children include a task in which the subject must formulate a sentence to describe an image using a specific target word. Because producing sentences in this way requires the speaker to integrate syntactic and semantic knowledge in a complex manner, responses are typically evaluated on several different dimensions of appropriateness yielding a single composite score for each response. In this paper, we present a dataset consisting of non-clinically elicited responses for three related sentence formulation tasks, and we propose an approach for automatically evaluating their appropriateness. Using neural machine translation, we generate correct-incorrect sentence pairs to serve as synthetic data in order to increase the amount and diversity of training data for our scoring model. Our scoring model uses transfer learning to facilitate automatic sentence appropriateness evaluation. We further compare custom word embeddings with pre-trained contextualized embeddings serving as features for our scoring model. We find that transfer learning improves scoring accuracy, particularly when using pre-trained contextualized embeddings.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"177-185"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7556318/pdf/nihms-1636235.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38497581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Integrating Multimodal Information in Large Pretrained Transformers
Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.acl-main.214 | Pages: 2359-2369
Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning the trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straightforward for lexical applications (applications involving only the language modality), it is not trivial for multimodal language (a growing area in NLP focused on modeling face-to-face communication). Pre-trained models lack the components needed to accept the two extra modalities of vision and acoustics. In this paper, we propose an attachment to BERT and XLNet called the Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to the internal representations of BERT and XLNet, conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts sentiment analysis performance over previous baselines as well as over language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.
{"title":"Integrating Multimodal Information in Large Pretrained Transformers.","authors":"Wasifur Rahman, Md Kamrul Hasan, Sangwu Lee, Amir Zadeh, Chengfeng Mao, Louis-Philippe Morency, Ehsan Hoque","doi":"10.18653/v1/2020.acl-main.214","DOIUrl":"https://doi.org/10.18653/v1/2020.acl-main.214","url":null,"abstract":"<p><p>Recent Transformer-based contextual word representations, including BERT and XLNet, have shown state-of-the-art performance in multiple disciplines within NLP. Fine-tuning the trained contextual models on task-specific datasets has been the key to achieving superior performance downstream. While fine-tuning these pre-trained models is straight-forward for lexical applications (applications with only language modality), it is not trivial for multimodal language (a growing area in NLP focused on modeling face-to-face communication). Pre-trained models don't have the necessary components to accept two extra modalities of vision and acoustic. In this paper, we proposed an attachment to BERT and XLNet called Multimodal Adaptation Gate (MAG). MAG allows BERT and XLNet to accept multimodal nonverbal data during fine-tuning. It does so by generating a shift to internal representation of BERT and XLNet; a shift that is conditioned on the visual and acoustic modalities. In our experiments, we study the commonly used CMU-MOSI and CMU-MOSEI datasets for multimodal sentiment analysis. Fine-tuning MAG-BERT and MAG-XLNet significantly boosts the sentiment analysis performance over previous baselines as well as language-only fine-tuning of BERT and XLNet. On the CMU-MOSI dataset, MAG-XLNet achieves human-level multimodal sentiment analysis performance for the first time in the NLP community.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"2359-2369"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8005298/pdf/nihms-1680563.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25543966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Topic-Based Measures of Conversation for Detecting Mild Cognitive Impairment
Meysam Asgari, Liu Chen, H. Dodge
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.nlpmc-1.9 | Pages: 63-67
Conversation is a complex cognitive task that engages multiple cognitive functions to remember the discussed topics, monitor the semantic and linguistic elements, and recognize others' emotions. In this paper, we propose a computational method based on the lexical coherence of consecutive utterances to quantify topical variations in semi-structured conversations of older adults with cognitive impairments. Extracting the lexical knowledge of conversational utterances, our method generates a set of novel conversational measures that indicate underlying cognitive deficits among subjects with mild cognitive impairment (MCI). Our preliminary results verify the utility of the proposed conversation-based measures in distinguishing MCI from healthy controls.
{"title":"Topic-Based Measures of Conversation for Detecting Mild CognitiveImpairment","authors":"Meysam Asgari, Liu Chen, H. Dodge","doi":"10.18653/v1/2020.nlpmc-1.9","DOIUrl":"https://doi.org/10.18653/v1/2020.nlpmc-1.9","url":null,"abstract":"Conversation is a complex cognitive task that engages multiple aspects of cognitive functions to remember the discussed topics, monitor the semantic and linguistic elements, and recognize others’ emotions. In this paper, we propose a computational method based on the lexical coherence of consecutive utterances to quantify topical variations in semi-structured conversations of older adults with cognitive impairments. Extracting the lexical knowledge of conversational utterances, our method generate a set of novel conversational measures that indicate underlying cognitive deficits among subjects with mild cognitive impairment (MCI). Our preliminary results verifies the utility of the proposed conversation-based measures in distinguishing MCI from healthy controls.","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"54 1","pages":"63-67"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75754791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classifying Electronic Consults for Triage Status and Question Type
Xiyu Ding, Michael L Barnett, Ateev Mehrotra, Timothy A Miller
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.nlpmc-1.1 | Pages: 1-6
Electronic consult (eConsult) systems give specialists more flexibility to respond to referrals efficiently, thereby increasing access in under-resourced healthcare settings such as safety-net systems. Understanding the usage patterns of an eConsult system is an important part of improving specialist efficiency. In this work, we develop and apply classifiers to a dataset of eConsult questions from primary care providers to specialists, classifying each message by how it was triaged by the specialist office and by the underlying type of clinical question posed by the primary care provider. We show that pre-trained transformer models are strong baselines, and that domain-specific training and shared representations further improve performance.
{"title":"Classifying Electronic Consults for Triage Status and Question Type.","authors":"Xiyu Ding, Michael L Barnett, Ateev Mehrotra, Timothy A Miller","doi":"10.18653/v1/2020.nlpmc-1.1","DOIUrl":"https://doi.org/10.18653/v1/2020.nlpmc-1.1","url":null,"abstract":"<p><p>Electronic consult (eConsult) systems allow specialists more flexibility to respond to referrals more efficiently, thereby increasing access in under-resourced healthcare settings like safety net systems. Understanding the usage patterns of eConsult system is an important part of improving specialist efficiency. In this work, we develop and apply classifiers to a dataset of eConsult questions from primary care providers to specialists, classifying the messages for how they were triaged by the specialist office, and the underlying type of clinical question posed by the primary care provider. We show that pre-trained transformer models are strong baselines, with improving performance from domain-specific training and shared representations.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"1-6"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7603636/pdf/nihms-1640423.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38566053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab
Yonghao Jin, Fei Li, Hong Yu
Pub Date: 2020-07-01 | DOI: 10.18653/v1/2020.acl-demos.13 | Pages: 95-100
CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In the clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, such as tokenization and named entity recognition. Because these steps require different tools that are usually scattered across publications, it is not easy for researchers to apply them to their own datasets. In this paper, we present BENTO, a workflow management platform with a graphical user interface (GUI), built on top of CodaLab, that facilitates the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI enables researchers with limited computing backgrounds to compose tools into NLP pipelines and then apply those pipelines to their own datasets in a "what you see is what you get" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible enough to be tailored to other domains.
{"title":"BENTO: A Visual Platform for Building Clinical NLP Pipelines Based on CodaLab.","authors":"Yonghao Jin, Fei Li, Hong Yu","doi":"10.18653/v1/2020.acl-demos.13","DOIUrl":"10.18653/v1/2020.acl-demos.13","url":null,"abstract":"<p><p>CodaLab is an open-source web-based platform for collaborative computational research. Although CodaLab has gained popularity in the research community, its interface has limited support for creating reusable tools that can be easily applied to new datasets and composed into pipelines. In clinical domain, natural language processing (NLP) on medical notes generally involves multiple steps, like tokenization, named entity recognition, etc. Since these steps require different tools which are usually scattered in different publications, it is not easy for researchers to use them to process their own datasets. In this paper, we present <i>BENTO</i>, a workflow management platform with a graphic user interface (GUI) that is built on top of CodaLab, to facilitate the process of building clinical NLP pipelines. BENTO comes with a number of clinical NLP tools that have been pre-trained using medical notes and expert annotations and can be readily used for various clinical NLP tasks. It also allows researchers and developers to create their custom tools (e.g., pre-trained NLP models) and use them in a controlled and reproducible way. In addition, the GUI interface enables researchers with limited computer background to compose tools into NLP pipelines and then apply the pipelines on their own datasets in a \"what you see is what you get\" (WYSIWYG) way. Although BENTO is designed for clinical NLP applications, the underlying architecture is flexible to be tailored to any other domains.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"95-100"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7679080/pdf/nihms-1644629.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38630240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Two-stage Federated Phenotyping and Patient Representation Learning
Dianbo Liu, Dmitriy Dligach, Timothy Miller
Pub Date: 2019-08-01 | DOI: 10.18653/v1/W19-5030 | Pages: 283-291
A large percentage of medical information is stored as unstructured text in electronic medical record systems, and manual extraction of information from clinical notes is extremely time-consuming. Natural language processing has been widely used in recent years for automatic information extraction from medical texts. However, algorithms trained on data from a single healthcare provider are not generalizable and are error-prone due to the heterogeneity and uniqueness of medical documents. We develop a two-stage federated natural language processing method that enables the utilization of clinical notes from different hospitals or clinics without moving the data, and we demonstrate its performance using obesity and comorbidity phenotyping as the medical task. This approach not only improves the quality of a specific clinical task but also facilitates knowledge progression across the whole healthcare system, an essential part of a learning health system. To the best of our knowledge, this is the first application of federated machine learning in clinical NLP.
{"title":"Two-stage Federated Phenotyping and Patient Representation Learning.","authors":"Dianbo Liu, Dmitriy Dligach, Timothy Miller","doi":"10.18653/v1/W19-5030","DOIUrl":"10.18653/v1/W19-5030","url":null,"abstract":"<p><p>A large percentage of medical information is in unstructured text format in electronic medical record systems. Manual extraction of information from clinical notes is extremely time consuming. Natural language processing has been widely used in recent years for automatic information extraction from medical texts. However, algorithms trained on data from a single healthcare provider are not generalizable and error-prone due to the heterogeneity and uniqueness of medical documents. We develop a two-stage federated natural language processing method that enables utilization of clinical notes from different hospitals or clinics without moving the data, and demonstrate its performance using obesity and comorbities phenotyping as medical task. This approach not only improves the quality of a specific clinical task but also facilitates knowledge progression in the whole healthcare system, which is an essential part of learning health system. To the best of our knowledge, this is the first application of federated machine learning in clinical NLP.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"283-291"},"PeriodicalIF":0.0,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8072229/pdf/nihms-1063931.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38915276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We need to talk about standard splits
Kyle Gorman, Steven Bedrick
Pub Date: 2019-07-01 | DOI: 10.18653/v1/p19-1267 | Pages: 2786-2791
It is standard practice in speech & language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely used "standard split". We fail to reliably reproduce some rankings using randomly generated splits. We suggest that randomly generated splits should be used in system comparison.
{"title":"We need to talk about standard splits.","authors":"Kyle Gorman, Steven Bedrick","doi":"10.18653/v1/p19-1267","DOIUrl":"https://doi.org/10.18653/v1/p19-1267","url":null,"abstract":"<p><p>It is standard practice in speech & language technology to rank systems according to performance on a test set held out for evaluation. However, few researchers apply statistical tests to determine whether differences in performance are likely to arise by chance, and few examine the stability of system ranking across multiple training-testing splits. We conduct replication and reproduction experiments with nine part-of-speech taggers published between 2000 and 2018, each of which reports state-of-the-art performance on a widely-used \"standard split\". We fail to reliably reproduce some rankings using <i>randomly generated</i> splits. We suggest that randomly generated splits should be used in system comparison.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2019 ","pages":"2786-2791"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10287171/pdf/nihms-1908534.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9715654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal Transformer for Unaligned Multimodal Language Sequences
Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov
Pub Date: 2019-07-01 | DOI: 10.18653/v1/p19-1656 | Pages: 6558-6569
Human language is often multimodal, comprising a mixture of natural language, facial gestures, and acoustic behaviors. Modeling such multimodal human language time-series data poses two major challenges: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address both issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is directional pairwise cross-modal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapts streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that the proposed crossmodal attention mechanism in MulT captures correlated crossmodal signals.
{"title":"Multimodal Transformer for Unaligned Multimodal Language Sequences.","authors":"Yao-Hung Hubert Tsai, Shaojie Bai, Paul Pu Liang, J Zico Kolter, Louis-Philippe Morency, Ruslan Salakhutdinov","doi":"10.18653/v1/p19-1656","DOIUrl":"10.18653/v1/p19-1656","url":null,"abstract":"<p><p>Human language is often multimodal, which comprehends a mixture of natural language, facial gestures, and acoustic behaviors. However, two major challenges in modeling such multimodal human language time-series data exist: 1) inherent data non-alignment due to variable sampling rates for the sequences from each modality; and 2) long-range dependencies between elements across modalities. In this paper, we introduce the Multimodal Transformer (MulT) to generically address the above issues in an end-to-end manner without explicitly aligning the data. At the heart of our model is the directional pairwise cross-modal attention, which attends to interactions between multimodal sequences across distinct time steps and latently adapt streams from one modality to another. Comprehensive experiments on both aligned and non-aligned multimodal time-series show that our model outperforms state-of-the-art methods by a large margin. In addition, empirical analysis suggests that correlated crossmodal signals are able to be captured by the proposed crossmodal attention mechanism in MulT.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":" ","pages":"6558-6569"},"PeriodicalIF":0.0,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7195022/pdf/nihms-1570579.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"37896067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hybrid Attention based Multimodal Network for Spoken Language Classification
Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic
Pub Date: 2018-08-01 | Pages: 2379-2390
We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. The proposed hybrid attention architecture helps the system focus on learning informative representations for both modality-specific feature extraction and model fusion. Experimental results show that our system achieves state-of-the-art or competitive results on three published multimodal datasets. We also demonstrate the effectiveness and generalization of our system on a medical speech dataset from an actual trauma scenario. Furthermore, we provide a detailed comparison and analysis of traditional approaches and deep learning methods for both feature extraction and fusion.
{"title":"Hybrid Attention based Multimodal Network for Spoken Language Classification.","authors":"Yue Gu, Kangning Yang, Shiyu Fu, Shuhong Chen, Xinyu Li, Ivan Marsic","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We examine the utility of linguistic content and vocal characteristics for multimodal deep learning in human spoken language understanding. We present a deep multimodal network with both feature attention and modality attention to classify utterance-level speech data. The proposed hybrid attention architecture helps the system focus on learning informative representations for both modality-specific feature extraction and model fusion. The experimental results show that our system achieves state-of-the-art or competitive results on three published multimodal datasets. We also demonstrated the effectiveness and generalization of our system on a medical speech dataset from an actual trauma scenario. Furthermore, we provided a detailed comparison and analysis of traditional approaches and deep learning methods on both feature extraction and fusion.</p>","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2018 ","pages":"2379-2390"},"PeriodicalIF":0.0,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6217979/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41156853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}