Workshop on Biomedical Natural Language Processing: Latest Articles

Scalable Few-Shot Learning of Robust Biomedical Name Representations

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.bionlp-1.3

Pieter Fivez, Simon Suster, Walter Daelemans

Recent research on robust representations of biomedical names has focused on modeling large amounts of fine-grained conceptual distinctions using complex neural encoders. In this paper, we explore the opposite paradigm: training a simple encoder architecture using only small sets of names sampled from high-level biomedical concepts. Our encoder post-processes pretrained representations of biomedical names, and is effective for various types of input representations, whether domain-specific or unsupervised. We validate our proposed few-shot learning approach on multiple biomedical relatedness benchmarks, and show that it allows for continual learning, where we accumulate information from various conceptual hierarchies to consistently improve encoder performance. Given these findings, we propose our approach as a low-cost alternative for exploring the impact of conceptual distinctions on robust biomedical name representations.

Citations: 3

Prediction Models for Risk of Type-2 Diabetes Using Health Claims

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/W18-2322

M. Nagata, Kohichi Takai, K. Yasuda, P. Heracleous, Akio Yoneyama

This study focuses on highly accurate prediction of the onset of type-2 diabetes. We investigated whether prediction accuracy can be improved by utilizing lab test data obtained from health checkups and incorporating health claim text data, such as medically diagnosed diseases with ICD10 codes and pharmacy information. In a previous study, prediction accuracy increased slightly when diagnosed disease names and prescription medicines were added as independent variables. Therefore, in the current study we explored more suitable prediction models using state-of-the-art techniques such as XGBoost and long short-term memory (LSTM) based on recurrent neural networks. Text data was vectorized using word2vec, and the prediction models were compared with logistic regression. The results confirmed that the onset of type-2 diabetes can be predicted with a high degree of accuracy when the XGBoost model is used.

Citations: 13

Identifying Key Sentences for Precision Oncology Using Semi-Supervised Learning

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/W18-2305

J. Seva, Martin Wackerbauer, U. Leser

We present a machine learning pipeline that identifies key sentences in abstracts of oncological articles to aid evidence-based medicine. This problem is characterized by the lack of gold standard datasets, data imbalance and thematic differences between available silver standard corpora. Additionally, available training and target data differ with regard to their domain (professional summaries vs. sentences in abstracts). This makes supervised machine learning inapplicable. We propose the use of two semi-supervised machine learning approaches. To mitigate difficulties arising from heterogeneous data sources, overcome data imbalance and create reliable training data, we propose using transductive learning from positive and unlabelled data (PU Learning). To obtain a realistic classification model, we propose the use of abstracts summarised in relevant sentences as unlabelled examples through Self-Training. The best model achieves 84% accuracy and 0.84 F1 score on our dataset.

Citations: 4

ChicHealth @ MEDIQA 2021: Exploring the limits of pre-trained seq2seq models for medical summarization

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/2021.bionlp-1.29

Liwen Xu, Yan Zhang, Lei Hong, Yi Cai, Szui Sung

In this article, we describe our system for the MEDIQA 2021 shared tasks. First, we describe the method for the second task, multiple answer summary (MAS). For extracting abstracts, we follow the rules of (CITATION). First, the candidate sentences are roughly estimated using the RoBERTa model. Then a Markov chain model is used to evaluate the sentences in a fine-grained manner. Our team won first place in overall performance, with fourth place in the MAS task, seventh place in the RRS task and eleventh place in the QS task. For the QS and RRS tasks, we investigate the performance of the end-to-end pre-trained seq2seq model. Experiments show that adversarial training and reverse translation help improve fine-tuning performance.

Citations: 9

Gaussian Distributed Prototypical Network for Few-shot Genomic Variant Detection

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.bionlp-1.2

Jiarun Cao, Niels Peek, A. Renehan, S. Ananiadou

Automatically identifying genetic mutations in the cancer literature using text mining technology has been an important way to study the vast amount of cancer medical literature. However, novel knowledge regarding genetic variants proliferates rapidly, and current supervised learning models struggle to discover these unknown entity types. Few-shot learning allows a model to generalize effectively to new entity types, but it has not yet been explored for cancer mutation detection. This paper addresses cancer mutation detection tasks with few-shot learning paradigms. We propose the GDPN framework, which models the label dependency from the training examples in the support set and approximates the transition scores via a Gaussian distribution. Experiments on three benchmark cancer mutation datasets show the effectiveness of our proposed model.

Citations: 0

Clinical Event Detection with Hybrid Neural Architecture

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/W17-2345

A. Maharana, Meliha Yetisgen-Yildiz

Event detection from clinical notes has traditionally been solved with rule-based and statistical natural language processing (NLP) approaches that require extensive domain knowledge and feature engineering. In this paper, we explore the feasibility of approaching this task with recurrent neural networks and clinical word embeddings, and introduce a hybrid architecture to improve detection for entities with smaller representation in the dataset. A comparative analysis reveals the complementary behavior of neural networks and conditional random fields in clinical entity detection.

Citations: 4

Biomedical Event Extraction using Abstract Meaning Representation

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/W17-2315

Sudha Rao, D. Marcu, Kevin Knight, Hal Daumé

We propose a novel, Abstract Meaning Representation (AMR) based approach to identifying molecular events/interactions in biomedical text. Our key contributions are: (1) an empirical validation of our hypothesis that an event is a subgraph of the AMR graph, (2) a neural network-based model that identifies such an event subgraph given an AMR, and (3) a distant supervision based approach to gather additional training data. We evaluate our approach on the 2013 Genia Event Extraction dataset and show promising results.

Citations: 76

Quantifying Clinical Outcome Measures in Patients with Epilepsy Using the Electronic Health Record

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/2022.bionlp-1.36

Kevin Xie, B. Litt, D. Roth, C. Ellis

A wealth of important clinical information lies untouched in the Electronic Health Record, often in the form of unstructured textual documents. For patients with Epilepsy, such information includes outcome measures like Seizure Frequency and Dates of Last Seizure, key parameters that guide all therapy for these patients. Transformer models have been able to extract such outcome measures from unstructured clinical note text as sentences with human-like accuracy; however, these sentences are not yet usable in a quantitative analysis for large-scale studies. In this study, we developed a pipeline to quantify these outcome measures. We used text summarization models to convert unstructured sentences into specific formats, and then employed rules-based quantifiers to calculate seizure frequencies and dates of last seizure. We demonstrated that our pipeline of models does not excessively propagate errors and we analyzed its mistakes. We anticipate that our methods can be generalized outside of epilepsy to other disorders to drive large-scale clinical research.
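
The rule-based quantification step described in this abstract can be illustrated with a minimal sketch. It assumes that an upstream summarization model has already rewritten each sentence into a hypothetical normalized form, `<count> per <period> <unit>` (for example `2 per 1 week`). The abstract does not specify the actual formats, so the pattern, the function name `seizures_per_month` and the day-length constants below are all illustrative assumptions, not the authors' implementation.

```python
import re
from typing import Optional

# Approximate days per unit; month and year lengths are averages (assumption).
DAYS_PER_UNIT = {"day": 1.0, "week": 7.0, "month": 30.4375, "year": 365.25}

# Hypothetical normalized summary format: "<count> per <period> <unit>".
PATTERN = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s+per\s+(\d+(?:\.\d+)?)\s+(day|week|month|year)s?\s*$",
    re.IGNORECASE,
)


def seizures_per_month(summary: str) -> Optional[float]:
    """Convert a normalized summary into seizures per month.

    Returns None when the text does not follow the assumed format,
    so unparsed cases can be counted instead of silently guessed.
    """
    match = PATTERN.match(summary)
    if match is None:
        return None
    count = float(match.group(1))
    period = float(match.group(2))
    unit = match.group(3).lower()
    if period == 0:
        return None  # a zero-length period has no defined rate
    days = period * DAYS_PER_UNIT[unit]
    return count / days * DAYS_PER_UNIT["month"]


if __name__ == "__main__":
    for text in ["2 per 1 week", "3 per 6 months", "seizure free"]:
        print(text, "->", seizures_per_month(text))
```

Returning `None` for unmatched input keeps parsing failures visible, which is one simple way to inspect how errors from the summarization stage reach the quantifier.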

Citations: 7

Extraction of Regulatory Events using Kernel-based Classifiers and Distant Supervision

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/W16-3011

Andre Lamurias, M. J. Rodrigues, L. Clarke, Francisco M. Couto

This paper describes our system for extracting binary regulatory relations from text, which we used to participate in the SeeDev task of BioNLP-ST 2016. Our system was based on machine learning, using support vector machines with a shallow linguistic kernel to identify each type of relation. Additionally, we employed a distantly supervised approach to increase the size of the training data. Our submission obtained the third-best precision in the SeeDev-binary task. Although the distantly supervised approach did not significantly improve the results, we expect that exploring other techniques for using unlabeled data will lead to better results.

Citations: 3

Biomedical Document Classification with Literature Graph Representations of Bibliographies and Entities

Workshop on Biomedical Natural Language Processing

Pub Date : 1900-01-01 DOI: 10.18653/v1/2023.bionlp-1.36

Ryuki Ida, Makoto Miwa, Yutaka Sasaki

This paper proposes a new document classification method that incorporates the representations of a literature graph created from bibliographic and entity information. Recently, document classification performance has been significantly improved with large pre-trained language models; however, there still remain documents that are difficult to classify. External information, such as bibliographic information, citation links, descriptions of entities, and medical taxonomies, has been considered one of the keys to dealing with such documents in document classification. Although several document classification methods using external information have been proposed, they only consider limited relationships, e.g., word co-occurrence and citation relationships. However, there are multiple types of external information. To overcome the limitation of the conventional use of external information, we propose a document classification model that simultaneously considers bibliographic and entity information to deeply model the relationships among documents using the representations of the literature graph. The experimental results show that our proposed method outperforms existing methods on two document classification datasets in the biomedical domain with the help of the literature graph.

Citations: 0
