
Latest Publications from the International Workshop on Health Text Mining and Information Analysis

Curriculum-guided Abstractive Summarization for Mental Health Online Posts
Pub Date: 2023-02-02 DOI: 10.48550/arXiv.2302.00954
Sajad Sotudeh, Nazli Goharian, Hanieh Deilamsalehy, Franck Dernoncourt
Automatically generating short summaries from users’ online mental health posts could save counselors’ reading time and reduce their fatigue so that they can provide timely responses to those seeking help for improving their mental state. Recent Transformer-based summarization models have presented a promising approach to abstractive summarization. They go beyond sentence selection and extractive strategies to deal with more complicated tasks such as novel word generation and sentence paraphrasing. Nonetheless, these models have a prominent shortcoming; their training strategy is not quite efficient, which restricts the model’s performance. In this paper, we include a curriculum learning approach to reweight the training samples, bringing about an efficient learning procedure. We apply our model to the MentSum extreme summarization dataset, a dataset of mental health-related posts from the Reddit social media platform. Compared to the state-of-the-art model, our proposed method makes substantial gains in terms of Rouge and Bertscore evaluation metrics, yielding 3.5% Rouge-1, 10.4% Rouge-2, 4.7% Rouge-L, and 1.5% Bertscore relative improvements.
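The abstract does not give the paper's exact reweighting scheme, so the sketch below uses a common competence-based curriculum formulation as an assumption: a sample contributes to training only once the model's "competence" has caught up with its difficulty.

```python
# A minimal sketch of curriculum-style sample reweighting (illustrative
# assumption, not the paper's published scheme).

def curriculum_weights(difficulties, progress):
    """Return a 0/1 weight per training sample.

    difficulties: per-sample difficulty scores in [0, 1] (e.g. a normalised
                  loss under a pretrained model).
    progress:     fraction of training completed, in [0, 1].
    """
    # The competence threshold starts low and grows to 1.0 by the end of
    # training, so hard samples are phased in gradually.
    competence = min(1.0, 0.1 + 0.9 * progress)
    return [1.0 if d <= competence else 0.0 for d in difficulties]

# Early on, only the easiest sample is weighted; by the end, all are.
weights_early = curriculum_weights([0.05, 0.5, 0.9], progress=0.0)
weights_late = curriculum_weights([0.05, 0.5, 0.9], progress=1.0)
```

In a training loop these weights would multiply each sample's loss term before averaging.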
Citations: 0
Proxy-based Zero-Shot Entity Linking by Effective Candidate Retrieval
Pub Date: 2023-01-30 DOI: 10.48550/arXiv.2301.13318
Maciej Wiatrak, Eirini Arvaniti, Angus Brayne, Jonas Vetterle, Aaron Sim
A recent advancement in the domain of biomedical Entity Linking is the development of powerful two-stage algorithms: an initial candidate retrieval stage that generates a shortlist of entities for each mention, followed by a candidate ranking stage. However, the effectiveness of both stages is inextricably dependent on computationally expensive components. Specifically, in candidate retrieval via dense representation retrieval it is important to have hard negative samples, which require repeated forward passes and nearest neighbour searches across the entire entity label set throughout training. In this work, we show that pairing a proxy-based metric learning loss with an adversarial regularizer provides an efficient alternative to hard negative sampling in the candidate retrieval stage. In particular, we show competitive performance on the recall@1 metric, thereby providing the option to leave out the expensive candidate ranking step. Finally, we demonstrate how the model can be used in a zero-shot setting to discover out-of-knowledge-base biomedical entities.
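A toy sketch of a proxy-based metric-learning loss in the spirit of Proxy-NCA (the paper's exact loss and its adversarial regularizer are not specified in the abstract): each entity class owns a learned "proxy" vector, and every other proxy acts as a negative, so no hard-negative mining over the full entity set is required.

```python
import math

def proxy_nca_loss(embedding, proxies, label):
    """Negative log-ratio of similarity to the true proxy vs all others."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    pos = math.exp(-sqdist(embedding, proxies[label]))
    # Every other class proxy serves as a negative - no mining needed.
    neg = sum(math.exp(-sqdist(embedding, p))
              for lbl, p in proxies.items() if lbl != label)
    return -math.log(pos / neg)

# Hypothetical 2-d proxies for two entity classes.
proxies = {"aspirin": [1.0, 0.0], "ibuprofen": [0.0, 1.0]}
loss_near = proxy_nca_loss([0.9, 0.1], proxies, "aspirin")  # near its proxy
loss_far = proxy_nca_loss([0.1, 0.9], proxies, "aspirin")   # near a wrong proxy
```

The loss is lower when the mention embedding sits close to its class proxy, which is the gradient signal used during retrieval training.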
Citations: 1
The Impact of De-identification on Downstream Named Entity Recognition in Clinical Text
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.1
Hanna Berg, Aron Henriksson, H. Dalianis
The impact of de-identification on data quality and, in particular, utility for developing models for downstream tasks has been more thoroughly studied for structured data than for unstructured text. While previous studies indicate that text de-identification has a limited impact on models for downstream tasks, it remains unclear what the impact is with various levels and forms of de-identification, in particular concerning the trade-off between precision and recall. In this paper, we study the impact of de-identification on downstream named entity recognition in Swedish clinical text. The results indicate that de-identification models with moderate to high precision lead to similar downstream performance, while low precision has a substantial negative impact. Furthermore, different strategies for concealing sensitive information affect performance to different degrees, ranging from pseudonymisation having a low impact to the removal of entire sentences with sensitive information having a high impact. This study indicates that it is possible to increase the recall of models for identifying sensitive information without negatively affecting the use of de-identified text data for training models for clinical named entity recognition; however, there is ultimately a trade-off between the level of de-identification and the subsequent utility of the data.
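A toy contrast of the two concealment strategies compared above: pseudonymisation (low downstream impact) versus dropping whole sentences (high downstream impact). The sensitive spans are hard-coded for the example; in practice they would come from a trained de-identification model.

```python
import re

# Hypothetical sensitive spans and their surrogates for one toy note.
SENSITIVE = {"Anna Larsson": "PATIENT-1", "Karolinska": "HOSPITAL-1"}

def pseudonymise(text):
    """Replace each sensitive mention with a surrogate, keeping context."""
    for mention, surrogate in SENSITIVE.items():
        text = text.replace(mention, surrogate)
    return text

def drop_sentences(text):
    """Remove every sentence that contains sensitive information."""
    sentences = re.split(r"(?<=\.)\s+", text)
    kept = [s for s in sentences if not any(m in s for m in SENSITIVE)]
    return " ".join(kept)

note = "Anna Larsson was admitted to Karolinska. Fever resolved on day two."
```

Pseudonymisation preserves the sentence structure a downstream NER model trains on, while sentence removal also deletes the non-sensitive clinical content around the span, which is one intuition for the impact gap the paper measures.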
Citations: 13
Detection of Mental Health from Reddit via Deep Contextualized Representations
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.16
Zhengping Jiang, Sarah Ita Levitan, Jonathan Zomick, Julia Hirschberg
We address the problem of automatic detection of psychiatric disorders from the linguistic content of social media posts. We build a large-scale dataset of Reddit posts from users with eight disorders and a control user group. We extract and analyze linguistic characteristics of posts and identify differences between diagnostic groups. We build strong classification models based on deep contextualized word representations and show that they outperform previously applied statistical models with simple linguistic features by large margins. We compare user-level and post-level classification performance, as well as an ensembled multiclass model.
Citations: 44
Simple Hierarchical Multi-Task Neural End-To-End Entity Linking for Biomedical Text
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.2
Maciej Wiatrak, Juha Iso-Sipilä
Recognising and linking entities is a crucial first step in many tasks in biomedical text analysis, such as relation extraction and target identification. Traditionally, biomedical entity linking methods rely heavily on heuristic rules and predefined, often domain-specific features. These features try to capture the properties of entities, while complex multi-step architectures detect and subsequently link entity mentions. We propose a significant simplification of the biomedical entity linking setup that does not rely on any heuristic methods. The system performs all the steps of the entity linking task jointly in either one or two stages. We explore the use of hierarchical multi-task learning, using mention recognition and entity typing as auxiliary tasks. We show that hierarchical multi-task models consistently outperform single-task models when the trained tasks are homogeneous. We evaluate the performance of our models on biomedical entity linking benchmarks using the MedMentions and BC5CDR datasets. We achieve state-of-the-art results on the challenging MedMentions dataset, and comparable results on BC5CDR.
Citations: 14
Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.13
Tarek Sakakini, Jong Yoon Lee, Aditya Duri, R. F. Azevedo, V. Sadauskas, Kuangxiao Gu, S. Bhat, D. Morrow, J. Graumlich, Saqib Walayat, M. Hasegawa-Johnson, Thomas S. Huang, Ann M. Willemsen-Dunlap, Donald J. Halpin
Healthcare systems have increased patients’ exposure to their own health materials to enhance patients’ health levels, but this has been impeded by patients’ lack of understanding of their health materials. We address potential barriers to comprehension by developing a context-aware text simplification system for health materials. Given the scarcity of annotated parallel corpora in healthcare domains, we design our system to be independent of a parallel corpus, complementing the availability of data-driven neural methods when such corpora are available. Our system compensates for the lack of direct supervision using a biomedical lexical database, the Unified Medical Language System (UMLS). Compared to a competitive prior approach that uses a tool for identifying biomedical concepts and a consumer-directed vocabulary list, we empirically show the enhanced accuracy of our system due to improved handling of ambiguous terms. We also show the enhanced accuracy of our system over directly-supervised neural methods in this low-resource setting. Finally, we show the direct impact of our system on laypeople’s comprehension of health material via a human subjects study (n=160).
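An illustrative sketch (not the paper's system) of context-aware lexical simplification: jargon is swapped for lay terms, and an ambiguous term is resolved by a crude context check. The hand-made mapping below stands in for UMLS-derived consumer-health synonym lookups.

```python
# Hypothetical jargon-to-lay mapping (stand-in for UMLS lookups).
LAY_TERMS = {
    "hypertension": "high blood pressure",
    "myocardial infarction": "heart attack",
}

def simplify(sentence):
    out = sentence
    for jargon, lay in LAY_TERMS.items():
        out = out.replace(jargon, lay)
    # "discharge" is ambiguous: leaving the hospital vs fluid from a wound.
    # A crude context check chooses the reading.
    if "discharge" in out.lower():
        if "hospital" in out.lower():
            out = out.replace("discharge", "release from hospital")
        else:
            out = out.replace("discharge", "fluid leakage")
    return out
```

Handling the ambiguous case per-context, rather than with a single fixed substitution, is the kind of disambiguation the abstract credits for the accuracy gain.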
Citations: 5
Multitask Learning of Negation and Speculation using Transformers
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.9
Aditya P. Khandelwal, Benita Kathleen Britto
Detecting negation and speculation in language has been a task of considerable interest to the biomedical community, as it is a key component of Information Extraction systems for Biomedical documents. Prior work has addressed Negation Detection and Speculation Detection individually, and both have been addressed in the same way, using a 2-stage pipelined approach: Cue Detection followed by Scope Resolution. In this paper, we propose Multitask learning approaches over 2 sets of tasks: Negation Cue Detection & Speculation Cue Detection, and Negation Scope Resolution & Speculation Scope Resolution. We utilise transformer-based architectures like BERT, XLNet and RoBERTa as our core model architecture, and finetune these using the Multitask learning approaches. We show that this Multitask Learning approach outperforms the single-task learning approach, and report new state-of-the-art results on Negation and Speculation Scope Resolution on the BioScope Corpus and the SFU Review Corpus.
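A toy rule-based illustration of the 2-stage pipeline named above: stage 1 detects negation cues, stage 2 resolves a scope for each cue (naively, every token after the cue). The actual systems replace both stages with finetuned transformer classifiers.

```python
# Hand-made cue lexicon for the toy example; real systems learn cues.
NEGATION_CUES = {"no", "not", "without", "denies"}

def detect_cues(tokens):
    """Stage 1: indices of tokens that act as negation cues."""
    return [i for i, t in enumerate(tokens) if t.lower() in NEGATION_CUES]

def resolve_scope(tokens, cue_index):
    """Stage 2: naive scope = all tokens following the cue."""
    return tokens[cue_index + 1:]

tokens = "Patient denies chest pain".split()
cues = detect_cues(tokens)
scopes = [resolve_scope(tokens, i) for i in cues]
```

The paper's multitask setup shares one encoder across the negation and speculation variants of each stage instead of training four independent models.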
Citations: 13
Information retrieval for animal disease surveillance: a pattern-based approach.
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.8
S. Valentin, R. Lancelot, M. Roche
Animal disease-related news articles are rich in information useful for risk assessment. In this paper, we explore a method to automatically retrieve sentence-level epidemiological information. Our method is an incremental approach to create and expand patterns at both the lexical and syntactic levels. Expert knowledge inputs are used at different steps of the approach. Distributed vector representations (word embeddings) were used to expand the patterns at the lexical level, thus alleviating manual curation. We showed that expert validation was crucial to improve the precision of automatically generated patterns.
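A sketch of the lexical pattern-expansion step: trigger words in a seed pattern are expanded with their nearest neighbours in embedding space, after which (as in the paper) an expert would validate the candidates. The 2-d "embeddings" below are hand-made stand-ins for real word vectors.

```python
import math

# Toy embedding table; real patterns would use trained word embeddings.
EMBEDDINGS = {
    "outbreak": [0.9, 0.1],
    "epidemic": [0.85, 0.2],
    "harvest": [0.1, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def expand(seed, threshold=0.95):
    """Candidate expansions: terms whose vectors lie close to the seed's."""
    seed_vec = EMBEDDINGS[seed]
    return [w for w, v in EMBEDDINGS.items()
            if w != seed and cosine(seed_vec, v) >= threshold]
```

With these vectors, expanding "outbreak" proposes "epidemic" but not "harvest", mirroring how embedding similarity cuts down manual curation while still leaving a short list for expert validation.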
Citations: 0
Normalization of Long-tail Adverse Drug Reactions in Social Media
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.6
Emmanouil Manousogiannis, S. Mesbah, A. Bozzon, Robert-Jan Sips, Zoltan Szlanik, Selene Baez
The automatic mapping of Adverse Drug Reaction (ADR) reports from user-generated content to concepts in a controlled medical vocabulary provides valuable insights for monitoring public health. While state-of-the-art deep learning-based sequence classification techniques achieve impressive performance for medical concepts with large amounts of training data, they show their limits with long-tail concepts that have a low number of training samples. This hinders their adaptability to the changes of layman’s terminology and the constant emergence of new informal medical terms. Our objective in this paper is to tackle the problem of normalizing long-tail ADR mentions in user-generated content. We exploit the implicit semantics of rare ADRs for which we have few training samples, in order to detect the most similar class for a given ADR. The evaluation results demonstrate that our proposed approach addresses the limitations of the existing techniques when the amount of training data is limited.
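A minimal sketch of normalising a long-tail ADR mention without a trained classifier: pick the controlled-vocabulary term whose (hand-made) lay description overlaps the mention most. Token Jaccard overlap here is a crude stand-in for the semantic similarity the paper computes.

```python
# Hypothetical vocabulary terms with lay-language descriptions.
TERM_DESCRIPTIONS = {
    "Somnolence": "sleepy drowsy tired all the time",
    "Alopecia": "hair loss falling out",
}

def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def normalise(mention):
    """Map a free-text ADR mention to its most similar vocabulary term."""
    return max(TERM_DESCRIPTIONS,
               key=lambda term: jaccard(mention, TERM_DESCRIPTIONS[term]))
```

Because the decision leans on the class's own description rather than scarce labelled mentions, it degrades more gracefully for rare classes, which is the intuition behind the paper's approach.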
Citations: 1
Biomedical Event Extraction as Multi-turn Question Answering
Pub Date: 2020-11-01 DOI: 10.18653/v1/2020.louhi-1.10
Xinglong Wang, Leon Weber, U. Leser
Biomedical event extraction from natural text is a challenging task as it searches for complex and often nested structures describing specific relationships between multiple molecular entities, such as genes, proteins, or cellular components. It is usually implemented as a complex pipeline of individual tools that solve the different relation extraction subtasks. We present an alternative approach where the detection of relationships between entities is described uniformly as questions, which are iteratively answered by a question answering (QA) system based on the domain-specific language model SciBERT. This model outperforms two strong baselines in two biomedical event extraction corpora in a Knowledge Base Population setting, and also achieves competitive performance in BioNLP challenge evaluation settings.
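A hand-written sketch of casting event extraction as multi-turn QA: each turn's question template is filled with answers from earlier turns. The canned answer table below is a stand-in for the SciBERT-based QA model, and the templates are illustrative, not the paper's.

```python
# Turn templates: turn 2's question depends on turn 1's answer.
TEMPLATES = [
    "Which trigger words indicate an event?",
    "What is the theme of the event triggered by '{trigger}'?",
]

CANNED_ANSWERS = {  # toy "QA model" for one example sentence
    "Which trigger words indicate an event?": "phosphorylation",
    "What is the theme of the event triggered by 'phosphorylation'?": "STAT3",
}

def answer(question):
    return CANNED_ANSWERS[question]

def extract_event():
    # Turn 1: find the trigger; turn 2: ask about that trigger's argument.
    trigger = answer(TEMPLATES[0])
    theme = answer(TEMPLATES[1].format(trigger=trigger))
    return {"trigger": trigger, "theme": theme}

event = extract_event()
```

Chaining questions this way lets one QA model replace the separate trigger-detection and argument-extraction tools of a classic pipeline.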
{"title":"Biomedical Event Extraction as Multi-turn Question Answering","authors":"Xinglong Wang, Leon Weber, U. Leser","doi":"10.18653/v1/2020.louhi-1.10","DOIUrl":"https://doi.org/10.18653/v1/2020.louhi-1.10","url":null,"abstract":"Biomedical event extraction from natural text is a challenging task as it searches for complex and often nested structures describing specific relationships between multiple molecular entities, such as genes, proteins, or cellular components. It usually is implemented by a complex pipeline of individual tools to solve the different relation extraction subtasks. We present an alternative approach where the detection of relationships between entities is described uniformly as questions, which are iteratively answered by a question answering (QA) system based on the domain-specific language model SciBERT. This model outperforms two strong baselines in two biomedical event extraction corpora in a Knowledge Base Population setting, and also achieves competitive performance in BioNLP challenge evaluation settings.","PeriodicalId":448872,"journal":{"name":"International Workshop on Health Text Mining and Information Analysis","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125130730","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 22
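The multi-turn QA framing described in the abstract above can be sketched as follows: each extraction subtask becomes a question, and later questions condition on earlier answers. The question templates and the toy `answer` function are illustrative stand-ins for a SciBERT-based span-prediction QA model, not the authors' implementation.

```python
# Minimal sketch of biomedical event extraction as multi-turn QA.
# A real system would call a fine-tuned QA model; here `answer` is a stub.

TEXT = "IL-2 gene expression requires NF-kappa B binding."

def answer(question: str, context: str) -> str:
    # Stub standing in for a span-prediction QA model
    # (e.g. SciBERT fine-tuned on SQuAD-style data).
    lookup = {
        "What is the trigger of a Gene_expression event?": "expression",
        "Which gene does 'expression' act on?": "IL-2",
    }
    return lookup.get(question, "")

# Turn 1: ask for the event trigger.
trigger = answer("What is the trigger of a Gene_expression event?", TEXT)
# Turn 2: the follow-up question is built from the previous answer,
# which is what makes the process multi-turn.
theme = answer(f"Which gene does '{trigger}' act on?", TEXT)
print({"type": "Gene_expression", "trigger": trigger, "theme": theme})
```

Nesting falls out naturally from this design: an extracted event can itself be slotted into the question of a later turn, which is how the uniform QA formulation handles the nested structures the abstract mentions.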
Journal
International Workshop on Health Text Mining and Information Analysis