Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task: Latest Papers

Approaching SMM4H with Merged Models and Multi-task Learning
Tilia Ellendorff, Lenz Furrer, N. Colic, Noëmi Aepli, Fabio Rinaldi
We describe our submissions to the 4th edition of the Social Media Mining for Health Applications (SMM4H) shared task. Our team (UZH) participated in two sub-tasks: Automatic classifications of adverse effects mentions in tweets (Task 1) and Generalizable identification of personal health experience mentions (Task 4). For our submissions, we exploited ensembles based on a pre-trained language representation with a neural transformer architecture (BERT) (Tasks 1 and 4) and a CNN-BiLSTM(-CRF) network within a multi-task learning scenario (Task 1). These systems are placed on top of a carefully crafted pipeline of domain-specific preprocessing steps.
DOI: https://doi.org/10.18653/v1/W19-3208
Published: 2019-08-02
Citations: 6
Neural Network to Identify Personal Health Experience Mention in Tweets Using BioBERT Embeddings
Shubham Gondane
This paper describes the system developed by team ASU-NLP for the Social Media Mining for Health Applications (SMM4H) shared task 4. We extract feature embeddings from the BioBERT (Lee et al., 2019) model, which has been fine-tuned on the training dataset, and use them as inputs to a dense fully connected neural network. We achieve above-average scores among the participating systems, with overall F1-score, accuracy, precision, and recall of 0.8036, 0.8456, 0.9783, and 0.6818, respectively.
DOI: https://doi.org/10.18653/v1/W19-3218
Published: 2019-08-01
Citations: 5
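As a quick consistency check on the ASU-NLP scores quoted above, the F1-score can be recomputed from the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract above.
reported_precision = 0.9783
reported_recall = 0.6818

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 4))  # agrees with the reported F1-score of 0.8036
```

The large gap between precision and recall is what pulls the F1-score well below the precision figure.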
Deep Learning for Identification of Adverse Effect Mentions In Twitter Data
P. Barry, Ozlem Uzuner
The Social Media Mining for Health Applications (SMM4H) Adverse Effect Mentions Shared Task challenges participants to accurately identify spans of text within a tweet that correspond to Adverse Effects (AEs) resulting from medication usage (Weissenbacher et al., 2019). This task features a training data set of 2,367 tweets and an evaluation data set of 1,000 tweets. The solution presented here features a bidirectional Long Short-term Memory Network (bi-LSTM) that generates character-level embeddings. A second bi-LSTM, trained on both character-level and token-level embeddings, feeds a Conditional Random Field (CRF), which provides the final classification. This paper further discusses the deep learning algorithms used in our solution.
DOI: https://doi.org/10.18653/v1/W19-3215
Published: 2019-08-01
Citations: 1
BIGODM System in the Social Media Mining for Health Applications Shared Task 2019
Chen-Kai Wang, Hong-Jie Dai, Bo-Hung Wang
In this study, we describe our methods to automatically classify Twitter posts conveying events of adverse drug reaction (ADR). Based on our previous experience in tackling the ADR classification task, we empirically applied the vote-based under-sampling ensemble (VUE) approach along with a linear support vector machine (SVM) to develop our classifiers as part of our participation in the ACL 2019 Social Media Mining for Health Applications (SMM4H) shared task 1. The model that performed best on the test sets was trained on a merged corpus consisting of the datasets released by SMM4H 2017 and 2019. Using VUE, the corpus was randomly under-sampled with a 2:1 ratio between the negative and positive classes to create an ensemble of linear-kernel classifiers trained with features including bag-of-words, domain knowledge, negation, and word embeddings. The best-performing model achieved an F-measure of 0.551, which is about 5% higher than the average F-score of the 16 teams.
DOI: https://doi.org/10.18653/v1/W19-3220
Published: 2019-08-01
Citations: 2
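The under-sampling and voting steps described in the BIGODM abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the base learner here is a nearest-centroid stand-in rather than a linear SVM, the feature vectors are toy two-dimensional points, and all function names (`undersample`, `train_centroid`, `train_ensemble`, `predict_vote`) are hypothetical. Only the 2:1 negative-to-positive ratio and the idea of voting across models come from the abstract.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Callable[[Vector], int]


def undersample(pos: List[Vector], neg: List[Vector], ratio: int,
                rng: random.Random) -> Tuple[List[Vector], List[Vector]]:
    """Keep every positive and draw `ratio` random negatives per positive."""
    k = min(len(neg), ratio * len(pos))
    return list(pos), rng.sample(list(neg), k)


def train_centroid(pos: List[Vector], neg: List[Vector]) -> Model:
    """Stand-in base learner (NOT an SVM): predict the class of the nearer centroid."""
    def centroid(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def dist2(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: 1 if dist2(x, c_pos) < dist2(x, c_neg) else 0


def train_ensemble(pos: List[Vector], neg: List[Vector], n_models: int = 5,
                   ratio: int = 2, seed: int = 0) -> List[Model]:
    """Train one model per random under-sample of the negative class."""
    rng = random.Random(seed)
    return [train_centroid(*undersample(pos, neg, ratio, rng))
            for _ in range(n_models)]


def predict_vote(models: List[Model], x: Vector) -> int:
    """Majority vote over the ensemble (use an odd n_models to avoid ties)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy, imbalanced data: 3 positives near (1, 1), 12 negatives near the origin.
positives = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
negatives = [[0.05 * i, -0.05 * i] for i in range(12)]

ensemble = train_ensemble(positives, negatives)
print(predict_vote(ensemble, [0.95, 1.0]), predict_vote(ensemble, [0.0, 0.0]))
```

Because each model sees a different random subset of the majority class, the ensemble uses more of the negative examples than any single under-sampled model would, while each individual model still trains on a 2:1 class balance.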
Lexical Normalization of User-Generated Medical Text
A. Dirkson, S. Verberne, G. van Oortmerssen, Wessel Kraaij
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.
DOI: https://doi.org/10.18653/v1/W19-3202
Published: 2019-08-01
Citations: 2
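The F0.5 figure quoted in the lexical normalization abstract above is an instance of the general F-beta measure, which weights precision more heavily than recall when beta is below 1. The sketch below uses the standard F-beta formula; the precision and recall values in the example are hypothetical and are not results from the paper.

```python
def fbeta_score(precision: float, recall: float, beta: float) -> float:
    """F-beta measure: beta < 1 favours precision, beta > 1 favours recall."""
    b2 = beta * beta
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + b2) * precision * recall / denominator


# Hypothetical detector with high precision and lower recall.
example_precision, example_recall = 0.9, 0.6

f_half = fbeta_score(example_precision, example_recall, beta=0.5)
f_one = fbeta_score(example_precision, example_recall, beta=1.0)
print(f_half > f_one)  # F0.5 rewards the high precision more than F1 does
```

Choosing F0.5 for mistake detection reflects a preference for avoiding false corrections over catching every misspelling.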
Detection of Adverse Drug Reaction in Tweets Using a Combination of Heterogeneous Word Embeddings
S. Aroyehun, Alexander Gelbukh
This paper details our approach to the task of detecting reportage of adverse drug reaction in tweets as part of the 2019 social media mining for healthcare applications shared task. We employed a combination of three types of word representations as input to an LSTM model. With this approach, we achieved an F1 score of 0.5209.
DOI: https://doi.org/10.18653/v1/W19-3224
Published: 2019-08-01
Citations: 7
NLP@UNED at SMM4H 2019: Neural Networks Applied to Automatic Classifications of Adverse Effects Mentions in Tweets
Javier Cortes-Tejada, Juan Martínez-Romo, Lourdes Araujo
This paper describes a system for automatically classifying adverse effect mentions in tweets, developed for Task 1 of the Social Media Mining for Health Applications (SMM4H) Shared Task 2019. We developed a system based on LSTM neural networks, inspired by the excellent results obtained by deep learning classifiers in the previous edition of this task. The network is trained with pre-trained Twitter GloVe word embeddings.
DOI: https://doi.org/10.18653/v1/W19-3213
Published: 2019-08-01
Citations: 3
HITSZ-ICRC: A Report for SMM4H Shared Task 2019-Automatic Classification and Extraction of Adverse Effect Mentions in Tweets
Shuai Chen, Yuanhang Huang, Xiao-Ping Huang, Haoming Qin, Jun Yan, Buzhou Tang
This is the system description of the Harbin Institute of Technology Shenzhen (HITSZ) team for the first and second subtasks of the fourth Social Media Mining for Health Applications (SMM4H) shared task in 2019. The two subtasks are automatic classification and extraction of adverse effect mentions in tweets. The systems for the two subtasks are based on bidirectional encoder representations from transformers (BERT) and achieve promising results. Among the systems we developed for subtask 1, the best F1-score was 0.6457; for subtask 2, the best relaxed F1-score and the best strict F1-score were 0.614 and 0.407, respectively. Our system ranks first among all systems on subtask 1.
DOI: https://doi.org/10.18653/v1/W19-3206
Published: 2019-08-01
Citations: 15
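The HITSZ-ICRC abstract above reports both a relaxed and a strict F1-score for span extraction. The sketch below illustrates one common way to interpret the two: a strict match requires identical span boundaries, while a relaxed match only requires some overlap. This interpretation is an assumption; the shared task's official scoring script may differ in detail. The span format, the function names, and the example spans are all hypothetical.

```python
from typing import List, Tuple

# (tweet id, start offset, end offset), with the end offset exclusive.
Span = Tuple[str, int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if both spans are in the same tweet and share at least one character."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool) -> float:
    """F1 over spans; relaxed=True counts overlaps, relaxed=False exact matches."""
    def match(a: Span, b: Span) -> bool:
        return overlaps(a, b) if relaxed else a == b

    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(g, p) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: one exact hit, one partial hit, one spurious span.
gold_spans = [("t1", 10, 18), ("t2", 0, 8)]
pred_spans = [("t1", 10, 18), ("t2", 3, 8), ("t3", 5, 9)]

strict_f1 = span_f1(gold_spans, pred_spans, relaxed=False)
relaxed_f1 = span_f1(gold_spans, pred_spans, relaxed=True)
print(strict_f1, relaxed_f1)
```

The partially overlapping prediction counts only under the relaxed criterion, which is why relaxed scores are never lower than strict ones for the same predictions.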
Towards Text Processing Pipelines to Identify Adverse Drug Events-related Tweets: University of Michigan @ SMM4H 2019 Task 1
V.G. Vinod Vydiswaran, Grace Ganzel, Bryan Romas, D. Yu, Amy M. Austin, N. Bhomia, S. Chan, S. Hall, Van Le, Aaron Miller, Olawunmi Oduyebo, Aulia Song, Radhika Sondhi, D. Teng, H. Tseng, Kim Vuong, Stephanie Zimmerman
We participated in Task 1 of the Social Media Mining for Health Applications (SMM4H) 2019 Shared Tasks on detecting mentions of adverse drug events (ADEs) in tweets. Our approach relied on a text processing pipeline for tweets and on training traditional machine learning and deep learning models. Our submitted runs performed above average for the task.
DOI: https://doi.org/10.18653/v1/W19-3217
Published: 2019-08-01
Citations: 5
KFU NLP Team at SMM4H 2019 Tasks: Want to Extract Adverse Drugs Reactions from Tweets? BERT to The Rescue
Z. Miftahutdinov, I. Alimova, E. Tutubalina
This paper describes a system developed for the Social Media Mining for Health (SMM4H) 2019 shared tasks. Specifically, we participated in three tasks. The goals of the first two tasks are to classify whether a tweet contains mentions of adverse drug reactions (ADR) and extract these mentions, respectively. The objective of the third task is to build an end-to-end solution: first, detect ADR mentions and then map these entities to concepts in a controlled vocabulary. We investigate the use of a language representation model BERT trained to obtain semantic representations of social media texts. Our experiments on a dataset of user reviews showed that BERT is superior to state-of-the-art models based on recurrent neural networks. The BERT-based system for Task 1 obtained an F1 of 57.38%, with improvements up to +7.19% F1 over a score averaged across all 43 submissions. The ensemble of neural networks with a voting scheme for named entity recognition ranked first among 9 teams at the SMM4H 2019 Task 2 and obtained a relaxed F1 of 65.8%. The end-to-end model based on BERT for ADR normalization ranked first at the SMM4H 2019 Task 3 and obtained a relaxed F1 of 43.2%.
DOI: https://doi.org/10.18653/v1/W19-3207
Citations: 21