Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating

International Workshop on Health Text Mining and Information Analysis. Pub Date: 2020-11-01. DOI: 10.18653/v1/2020.louhi-1.14

Minghao Zhu, Yoonkyoung Song, Ge Jin, Keyuan Jiang


Citations: 6

Abstract

Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs, is an important part of pharmacovigilance. Collecting personal experiences related to pharmaceutical product use could provide insight into how the human body reacts to different medications. Twitter, a popular social media service, is being considered as an important alternative data source for collecting personal experiences with medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we used three methods based on Facebook's Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first combines the pre-trained RoBERTa model with a classifier; the second combines a classifier with the pre-trained RoBERTa model after updating it on a corpus of unlabeled tweets; and the third combines a classifier with a RoBERTa model trained from scratch on our unlabeled tweets. Our results show that all of these approaches outperform the published methods (Word Embedding + LSTM) in classification performance (p < 0.05), and that updating the pre-trained language model with tweets related to medications could improve performance even further.
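The abstract does not say how the pre-trained model was updated on unlabeled tweets. RoBERTa itself is pre-trained with a masked language modeling objective, so one plausible reading is that the update continues that objective on the tweet corpus. The sketch below illustrates, under that assumption, how a single unlabeled tweet could be turned into a masked training example using the usual 15% selection and 80/10/10 replacement scheme. The whitespace tokenization, the toy vocabulary, the example tweet, and the function name `mask_tokens` are all hypothetical simplifications for illustration; they are not taken from the paper.

```python
import random

MASK = "<mask>"  # RoBERTa's mask token string


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Build one masked-language-modeling example from a token list.

    Returns (inputs, labels). labels[i] holds the original token at each
    position selected for prediction and None everywhere else.
    """
    rng = rng or random.Random()
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            # Position is selected: the model must predict the original token.
            labels.append(tok)
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with mask token
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: replace with random token
            else:
                inputs.append(tok)                # 10%: keep the token unchanged
        else:
            # Position is not selected: input unchanged, no prediction target.
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Hypothetical tweet and toy vocabulary, for illustration only.
tweet = "this medication gave me a headache all day".split()
toy_vocab = ["drug", "sleep", "pain", "today", "feel"]

# Calling the function again with a fresh random state yields a different
# mask for the same tweet, which is the idea behind dynamic masking.
inputs, labels = mask_tokens(tweet, toy_vocab, rng=random.Random(0))
print(inputs)
print(labels)
```

In a real setup the masking would operate on subword IDs from RoBERTa's byte-level BPE tokenizer rather than whitespace-split words, and the resulting encoder would then be combined with a classifier and trained on the labeled tweets.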



