
Proceedings of the conference. Association for Computational Linguistics. Meeting: Latest Publications

Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges.
Sanjana Ramprasad, Iain J Marshall, Denis Jered McInerney, Byron C Wallace

We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work (Marshall et al., 2020), the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-k such studies are passed through a neural multi-document summarization system, yielding a synopsis of these trials. We consider two architectures: a standard sequence-to-sequence model based on BART (Lewis et al., 2019), and a multi-headed architecture intended to provide greater transparency to end-users. Both models produce fluent and relevant summaries of evidence retrieved for queries, but their tendency to introduce unsupported statements renders them inappropriate for use in this domain at present. The proposed architecture may help users verify outputs by allowing them to trace generated tokens back to inputs. The demonstration video is available at https://vimeo.com/735605060, and the prototype, source code, and model weights are available at https://sanjanaramprasad.github.io/trials-summarizer/.
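A minimal sketch of the retrieve, rank, and summarize pipeline the abstract describes. The ranking heuristic, the record fields, and the use of facebook/bart-large-cnn as a stand-in for the authors' fine-tuned weights (linked from the project page) are all assumptions for illustration, not the system's actual implementation.

```python
# Sketch: rank retrieved trials, then summarize the top-k with a BART model.
from transformers import pipeline

def summarize_top_k(trials, k=5, max_input_chars=4000):
    """Rank trials by sample size and estimated quality, summarize the top-k."""
    # Proxy ranking heuristic (assumption): larger, higher-quality trials first.
    ranked = sorted(trials, key=lambda t: (t["n"], t["quality"]), reverse=True)
    # Concatenate the top-k abstracts into one multi-document input.
    joined = "\n\n".join(t["abstract"] for t in ranked[:k])[:max_input_chars]
    # Stand-in checkpoint; the authors release their own fine-tuned weights.
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    return summarizer(joined, max_length=150, min_length=40,
                      truncation=True)[0]["summary_text"]

trials = [
    {"n": 420, "quality": 0.9, "abstract": "RCT of drug X vs placebo for migraine..."},
    {"n": 80,  "quality": 0.6, "abstract": "Small pilot trial of drug X..."},
]
print(summarize_top_k(trials, k=2))
```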

Citations: 0
IRMA: the 335-million-word Italian coRpus for studying MisinformAtion.
Fabio Carrella, Alessandro Miani, Stephan Lewandowsky

The dissemination of false information on the internet has received considerable attention over the last decade. Misinformation often spreads faster than mainstream news, thus making manual fact checking inefficient or, at best, labor-intensive. Therefore, there is an increasing need to develop methods for automatic detection of misinformation. Although resources for creating such methods are available in English, other languages are often underrepresented in this effort. With this contribution, we present IRMA, a corpus containing over 600,000 Italian news articles (335+ million tokens) collected from 56 websites classified as 'untrustworthy' by professional fact-checkers. The corpus is freely available and comprises a rich set of text- and website-level data, representing a turnkey resource to test hypotheses and develop automatic detection algorithms. It contains texts, titles, and dates (from 2004 to 2022), along with three types of semantic measures (i.e., keywords, topics at three different resolutions, and LIWC lexical features). IRMA also includes domain-specific information such as source type (e.g., political, health, conspiracy, etc.), quality, and higher-level metadata, including several metrics of website incoming traffic that allow investigation of users' online behavior. IRMA constitutes the largest corpus of misinformation available today in Italian, making it a valid tool for advancing quantitative research on untrustworthy news detection and ultimately helping limit the spread of misinformation.
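A hypothetical sketch of how corpus-level metadata like IRMA's might be queried once downloaded. The file name and column names (source_type, tokens, date) are assumptions for illustration; the corpus documentation defines the actual schema.

```python
# Sketch: filter articles by source type and date, and tally token counts.
import pandas as pd

df = pd.read_csv("irma_articles.csv", parse_dates=["date"])  # hypothetical file

# Restrict to health-related untrustworthy sources from 2020 onward.
health = df[(df["source_type"] == "health") & (df["date"] >= "2020-01-01")]
print(len(health), "health-related articles since 2020")

# Token counts per source type, e.g., to balance detector training splits.
print(df.groupby("source_type")["tokens"].sum().sort_values(ascending=False))
```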

Citations: 0
Automatically Summarizing Evidence from Clinical Trials: A Prototype Highlighting Current Challenges
Pub Date: 2023-03-07 DOI: 10.48550/arXiv.2303.05392
S. Ramprasad, Denis Jered McInerney, Iain J. Marshall, Byron Wallace
In this work we present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work, the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-k such studies are passed through a neural multi-document summarization system, yielding a synopsis of these trials. We consider two architectures: a standard sequence-to-sequence model based on BART, and a multi-headed architecture intended to provide greater transparency and controllability to end-users. Both models produce fluent and relevant summaries of evidence retrieved for queries, but their tendency to introduce unsupported statements renders them inappropriate for use in this domain at present. The proposed architecture may help users verify outputs by allowing them to trace generated tokens back to inputs. The demonstration video can be found at https://vimeo.com/735605060, and the prototype, source code, and model weights are available at: https://sanjanaramprasad.github.io/trials-summarizer/
Citations: 3
Self-Repetition in Abstractive Neural Summarizers.
Nikita Salkar, Thomas Trikalinos, Byron C Wallace, Ani Nenkova

We provide a quantitative and qualitative analysis of self-repetition in the output of neural summarizers. We measure self-repetition as the number of n-grams of length four or longer that appear in multiple outputs of the same system. We analyze the behavior of three popular architectures (BART, T5, and Pegasus), fine-tuned on five datasets. In a regression analysis, we find that the three architectures have different propensities for repeating content across output summaries for inputs, with BART being particularly prone to self-repetition. Fine-tuning on more abstractive data, and on data featuring formulaic language, is associated with a higher rate of self-repetition. In qualitative analysis we find systems produce artefacts such as ads and disclaimers unrelated to the content being summarized, as well as formulaic phrases common in the fine-tuning domain. Our approach to corpus-level analysis of self-repetition may help practitioners clean up training data for summarizers and ultimately support methods for minimizing the amount of self-repetition.
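The self-repetition measure lends itself to a compact implementation. Below is a minimal sketch under the definition quoted above: count n-grams of length four or longer that appear in more than one output of the same system. The exact scoring in the paper may differ; this is illustrative.

```python
# Sketch: count 4-grams repeated across outputs of one summarization system.
from collections import Counter

def ngrams(tokens, n=4):
    """Set of unique n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def self_repetition(summaries, n=4):
    """Number of n-grams appearing in more than one output of the system."""
    counts = Counter()
    for s in summaries:
        # Per-summary set, so we count summaries containing an n-gram,
        # not raw occurrences within a single summary.
        counts.update(ngrams(s.split(), n))
    return sum(1 for c in counts.values() if c > 1)

outputs = [
    "the study was funded by the national institutes of health",
    "results were mixed ; the study was funded by the national institutes of health",
]
print(self_repetition(outputs))  # 4-grams shared by both outputs
```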

Citations: 0
Self-Repetition in Abstractive Neural Summarizers
Pub Date: 2022-10-14 DOI: 10.48550/arXiv.2210.08145
Nikita Salkar, T. Trikalinos, Byron C. Wallace, A. Nenkova
We provide a quantitative and qualitative analysis of self-repetition in the output of neural summarizers. We measure self-repetition as the number of n-grams of length four or longer that appear in multiple outputs of the same system. We analyze the behavior of three popular architectures (BART, T5, and Pegasus), fine-tuned on five datasets. In a regression analysis, we find that the three architectures have different propensities for repeating content across output summaries for inputs, with BART being particularly prone to self-repetition. Fine-tuning on more abstractive data, and on data featuring formulaic language, is associated with a higher rate of self-repetition. In qualitative analysis, we find systems produce artefacts such as ads and disclaimers unrelated to the content being summarized, as well as formulaic phrases common in the fine-tuning domain. Our approach to corpus-level analysis of self-repetition may help practitioners clean up training data for summarizers and ultimately support methods for minimizing the amount of self-repetition.
Citations: 3
Evaluating Factuality in Text Simplification.
Pub Date: 2022-05-01 DOI: 10.18653/v1/2022.acl-long.506
Ashwin Devaraj, William Sheffield, Byron C Wallace, Junyi Jessy Li

Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupported by the corresponding original text, or by omitting key information. Providing more readable but inaccurate versions of texts may in many cases be worse than providing no such access at all. The problem of factual accuracy (and the lack thereof) has received heightened attention in the context of summarization models, but the factuality of automatically simplified texts has not been investigated. We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs. We find that both often contain errors not captured by existing evaluation metrics, motivating a need for research into ensuring the factual accuracy of automated simplification models.
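To make the idea of an error taxonomy concrete, here is a hypothetical sketch of how such annotations might be represented. The category names are generic assumptions echoing the insertion and omission errors mentioned in the abstract, not the paper's actual taxonomy.

```python
# Sketch: a generic representation for simplification-error annotations.
from dataclasses import dataclass
from enum import Enum

class ErrorType(Enum):
    INSERTION = "statement unsupported by the original text"
    DELETION = "key information omitted"
    SUBSTITUTION = "meaning altered from the original"

@dataclass
class ErrorAnnotation:
    original_span: str
    simplified_span: str
    error_type: ErrorType

ann = ErrorAnnotation(
    original_span="The drug reduced mortality in a subgroup of patients.",
    simplified_span="The drug reduces mortality.",
    error_type=ErrorType.SUBSTITUTION,
)
print(ann.error_type.value)
```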

Citations: 0
EchoGen: A New Benchmark Study on Generating Conclusions from Echocardiogram Notes.
Pub Date: 2022-05-01 DOI: 10.18653/v1/2022.bionlp-1.35
Liyan Tang, Shravan Kooragayalu, Yanshan Wang, Ying Ding, Greg Durrett, Justin F Rousseau, Yifan Peng

Generating a summary from findings has recently been explored (Zhang et al., 2018, 2020) for note types, such as radiology reports, that are typically short. In this work, we focus on echocardiogram notes, which are longer and more complex than previously studied note types. We formally define the task of echocardiography conclusion generation (EchoGen) as generating a conclusion given the findings section, with emphasis on key cardiac findings. To promote the development of EchoGen methods, we present a new benchmark, which consists of two datasets collected from two hospitals. We further compare both standard and state-of-the-art methods on this new benchmark, with an emphasis on factual consistency. To accomplish this, we develop a tool to automatically extract concept-attribute tuples from the text. We then propose an evaluation metric, FactComp, to compare concept-attribute tuples between the human reference and generated conclusions. Both automatic and human evaluations show that there is still a significant gap between human-written and machine-generated conclusions on echo reports in terms of factuality and overall quality.
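A minimal sketch in the spirit of the FactComp metric described above: score a generated conclusion by its overlap in (concept, attribute) tuples with the human reference. The F1 formulation and the hand-written tuples are assumptions; the paper provides the actual extraction tool and metric definition.

```python
# Sketch: F1 over (concept, attribute) tuples from reference vs. generated text.

def tuple_f1(reference, generated):
    """F1 over concept-attribute tuples shared by reference and output."""
    ref, gen = set(reference), set(generated)
    if not ref or not gen:
        return 0.0
    overlap = len(ref & gen)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

ref_tuples = [("left ventricle", "normal size"), ("ejection fraction", "55%")]
gen_tuples = [("left ventricle", "normal size"), ("ejection fraction", "45%")]
print(tuple_f1(ref_tuples, gen_tuples))  # 0.5: the ejection-fraction attribute disagrees
```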

Citations: 3
Evaluating Biomedical Word Embeddings for Vocabulary Alignment at Scale in the UMLS Metathesaurus Using Siamese Networks.
Pub Date: 2022-05-01 DOI: 10.18653/v1/2022.insights-1.11
Goonmeet Bajaj, Vinh Nguyen, Thilini Wijesiriwardene, Hong Yung Yip, Vishesh Javangula, Srinivasan Parthasarathy, Amit Sheth, Olivier Bodenreider

Recent work uses a Siamese Network, initialized with BioWordVec embeddings (distributed word embeddings), for predicting synonymy among biomedical terms to automate a part of the UMLS (Unified Medical Language System) Metathesaurus construction process. We evaluate the use of contextualized word embeddings extracted from nine different biomedical BERT-based models for synonymy prediction in the UMLS by replacing BioWordVec embeddings with embeddings extracted from each biomedical BERT model using different feature extraction methods. Surprisingly, we find that Siamese Networks initialized with BioWordVec embeddings still outperform Siamese Networks initialized with embeddings extracted from the biomedical BERT models.
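A minimal sketch of a Siamese scorer over pre-computed term embeddings, assuming 200-dimensional BioWordVec vectors as input. The layer sizes, cosine-similarity scoring, and training details here are illustrative assumptions rather than the architecture used in the work above.

```python
# Sketch: a shared encoder scores synonymy between two term embeddings.
import torch
import torch.nn as nn

class SiameseSynonymy(nn.Module):
    def __init__(self, emb_dim=200, hidden=128):  # 200 = BioWordVec dimension
        super().__init__()
        # One encoder applied to both inputs: the "Siamese" weight sharing.
        self.encoder = nn.Sequential(nn.Linear(emb_dim, hidden), nn.ReLU())

    def forward(self, emb_a, emb_b):
        # Cosine similarity of encoded terms as a synonymy score.
        return nn.functional.cosine_similarity(self.encoder(emb_a),
                                               self.encoder(emb_b))

model = SiameseSynonymy()
a = torch.randn(4, 200)  # stand-ins for BioWordVec term vectors
b = torch.randn(4, 200)
print(model(a, b))  # one score per term pair
```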

Citations: 0
GPT-D: Inducing Dementia-related Linguistic Anomalies by Deliberate Degradation of Artificial Neural Language Models.
Pub Date: 2022-05-01 DOI: 10.18653/v1/2022.acl-long.131
Changye Li, David Knopman, Weizhe Xu, Trevor Cohen, Serguei Pakhomov

Deep learning (DL) techniques involving fine-tuning large numbers of model parameters have delivered impressive performance on the task of discriminating between language produced by cognitively healthy individuals, and those with Alzheimer's disease (AD). However, questions remain about their ability to generalize beyond the small reference sets that are publicly available for research. As an alternative to fitting model parameters directly, we propose a novel method by which a Transformer DL model (GPT-2) pre-trained on general English text is paired with an artificially degraded version of itself (GPT-D), to compute the ratio between these two models' perplexities on language from cognitively healthy and impaired individuals. This technique approaches state-of-the-art performance on text data from a widely used "Cookie Theft" picture description task, and unlike established alternatives also generalizes well to spontaneous conversations. Furthermore, GPT-D generates text with characteristics known to be associated with AD, demonstrating the induction of dementia-related linguistic anomalies. Our study is a step toward better understanding of the relationships between the inner workings of generative neural language models, the language that they produce, and the deleterious effects of dementia on human speech and language characteristics.
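A minimal sketch of the paired-perplexity idea described above: score a transcript by the ratio of its perplexity under stock GPT-2 to its perplexity under a deliberately degraded copy (GPT-D). How the degraded copy is produced is the paper's contribution and is elided here; degraded_model is a placeholder for it.

```python
# Sketch: perplexity of a text under GPT-2, toward a GPT-2/GPT-D ratio score.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def perplexity(lm, text):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss  # mean token-level cross-entropy
    return torch.exp(loss).item()

transcript = "the boy is on the stool and the the cookie jar"
print(perplexity(model, transcript))

# degraded_model = ...  # GPT-D: an impaired copy of the model above (see paper)
# score = perplexity(model, transcript) / perplexity(degraded_model, transcript)
```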

Citations: 0
Evaluating Factuality in Text Simplification
Pub Date: 2022-04-15 DOI: 10.48550/arXiv.2204.07562
Ashwin Devaraj, William Sheffield, Byron C. Wallace, Junyi Jessy Li
Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupported by the corresponding original text, or by omitting key information. Providing more readable but inaccurate versions of texts may in many cases be worse than providing no such access at all. The problem of factual accuracy (and the lack thereof) has received heightened attention in the context of summarization models, but the factuality of automatically simplified texts has not been investigated. We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs. We find that both often contain errors not captured by existing evaluation metrics, motivating a need for research into ensuring the factual accuracy of automated simplification models.
Citations: 28