Bidirectional Long Short-Term Memory-Based Detection of Adverse Drug Reaction Posts Using Korean Social Networking Services Data: Deep Learning Approaches.

JMIR Medical Informatics (IF 3.8; Q2, Medical Informatics; Zone 3, Medicine) Pub Date: 2024-11-20 DOI: 10.2196/45289

Chung-Chun Lee, Seunghee Lee, Mi-Hwa Song, Jong-Yeup Kim, Suehyun Lee

JMIR Medical Informatics, volume 12, e45289. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11601139/pdf/

Citation count: 0

Abstract

Background: Social networking services (SNS) closely reflect the lives of individuals in modern society and generate large amounts of data. Previous studies have extracted drug information using relevant SNS data. In particular, it is important to detect adverse drug reactions (ADRs) early using drug surveillance systems. To this end, various deep learning methods have been used to analyze data in multiple languages in addition to English.

Objective: A cautionary drug that can cause ADRs in older patients was selected, and Korean SNS data containing this drug information were collected. Based on this information, we aimed to develop a deep learning model that classifies drug ADR posts based on a recurrent neural network.

Methods: Based on previous studies, ketoprofen was selected as the target drug because its high prescription frequency made it the drug most frequently mentioned in the posts secured from SNS data. Blog posts, café posts, and NAVER Q&A posts from 2005 to 2020 were collected from NAVER, a portal site containing drug-related information, and natural language processing techniques were applied to analyze the Korean-language text. Posts containing highly relevant drug name and ADR word pairs were filtered through association analysis, and training data were generated by manual labeling. Using the training data, a word2vec embedding layer was built, and a Bidirectional Long Short-Term Memory (Bi-LSTM) classification model was trained. We then compared its area under the curve with that of other machine learning models. In addition, the entire process was further verified using another nonsteroidal anti-inflammatory drug, aceclofenac.
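The association-based filtering step described in the Methods can be illustrated with a minimal sketch. The abstract does not state which association measure was used, so lift is assumed here. The toy lexicons, the English tokens, the `min_lift` threshold, and the function names `pair_lift` and `filter_posts` are all hypothetical and stand in for the study's Korean lexicons and tokenized posts.

```python
from collections import Counter
from itertools import product

# Hypothetical toy lexicons; the study's actual Korean lexicons are not reproduced here.
DRUG_LEXICON = {"ketoprofen"}
ADR_LEXICON = {"rash", "itching", "nausea"}


def pair_lift(posts, drugs=DRUG_LEXICON, adrs=ADR_LEXICON):
    """Return {(drug, adr): lift} computed over a list of tokenized posts."""
    n = len(posts)
    drug_count, adr_count, pair_count = Counter(), Counter(), Counter()
    for tokens in posts:
        seen = set(tokens)
        found_drugs = seen & drugs
        found_adrs = seen & adrs
        drug_count.update(found_drugs)
        adr_count.update(found_adrs)
        # Count each drug-ADR pair once per post in which both words occur.
        pair_count.update(product(found_drugs, found_adrs))
    # lift = P(drug, adr) / (P(drug) * P(adr)); values above 1 suggest association.
    return {
        pair: (count / n) / ((drug_count[pair[0]] / n) * (adr_count[pair[1]] / n))
        for pair, count in pair_count.items()
    }


def filter_posts(posts, min_lift=1.0):
    """Keep posts that contain at least one drug-ADR pair with lift above min_lift."""
    strong = {pair for pair, lift in pair_lift(posts).items() if lift > min_lift}
    return [
        tokens for tokens in posts
        if any(d in tokens and a in tokens for d, a in strong)
    ]


posts = [
    ["ketoprofen", "patch", "rash"],
    ["ketoprofen", "patch", "rash", "itching"],
    ["ketoprofen", "price"],
    ["nausea", "after", "lunch"],
]
kept = filter_posts(posts)  # candidate posts to pass on to manual labeling
```

In this sketch, only posts that mention both a drug name and an associated ADR word survive the filter; in a pipeline like the one described, such posts would then be labeled manually before training.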

Results: Korean SNS posts containing information on two nonsteroidal anti-inflammatory drugs, ketoprofen and aceclofenac, were secured, and a generic name lexicon, an ADR lexicon, and a Korean stop word lexicon were generated. In addition, to improve the accuracy of the classification model, an embedding layer was created that takes into account the association between drug names and ADR words. In the ADR post classification test, the models for ketoprofen and aceclofenac achieved 85% and 80% accuracy, respectively.
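For readers unfamiliar with the two metrics mentioned in this abstract, the sketch below shows how accuracy and the area under the ROC curve could be computed from a classifier's output scores. The labels and scores are invented for illustration and are not the study's data; the 0.5 decision threshold and the function names are assumptions.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of posts whose thresholded score matches the manual label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random ADR post outscores a random non-ADR post.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented example: 1 = ADR post, 0 = non-ADR post.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
acc = accuracy(labels, scores)  # 0.75
auc = roc_auc(labels, scores)   # 0.9375
```

Accuracy depends on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is one reason a study might report both.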

Conclusions: We propose a process for developing a model that classifies ADR posts using SNS data. After analyzing drug name-ADR patterns, we filtered for high-quality data by extracting posts that included known ADR words identified in the analysis. Based on these data, we developed a model that classifies ADR posts. The results confirm that a model that leverages social data to monitor ADRs automatically is feasible.


Source journal

JMIR Medical Informatics (Medicine-Health Informatics)

CiteScore: 7.90

Self-citation rate: 3.10%

Articles published: 173

Review time: 12 weeks

Journal description: JMIR Medical Informatics (JMI, ISSN 2291-9694) is a top-rated, tier A journal that focuses on clinical informatics, big data in health and health care, decision support for health professionals, electronic health records, and eHealth infrastructures and implementation. It emphasizes applied, translational research and has a broad readership that includes clinicians, CIOs, engineers, industry, and health informatics professionals. It is published by JMIR Publications, publisher of the Journal of Medical Internet Research (JMIR), the leading eHealth/mHealth journal (Impact Factor 2016: 5.175). JMIR Med Inform has a slightly different scope, placing more emphasis on applications for clinicians and health professionals than on consumers and citizens, who are the focus of JMIR. It also publishes even faster and accepts papers that are more technical or more formative than those published in the Journal of Medical Internet Research.