From Similarities to Probabilities: Feature Engineering for Predicting Drugs’ Adverse Reactions
Nahla H. Barakat, Ahmed H. ElSabbagh
Intelligent Automation and Soft Computing, 2022. https://doi.org/10.32604/iasc.2022.022104
Social media have recently become a convenient platform for groups with common concerns to share their experiences, including Adverse Drug Reactions (ADRs). In this paper, we propose a two-stage intelligent algorithm, which we call “Simi_to_Prob”. The first stage uses social media forums to rank ADRs and to evaluate ADR prevalence across different age and gender groups. The second stage predicts ADRs using a different data set from the Food and Drug Administration (FDA). In particular, Natural Language Processing (NLP) is applied to social media posts to extract ranked lists of ADRs, which are then validated using novel intrinsic evaluation methods. In the second stage, feature engineering extends the input feature space, and a two-stage supervised machine learning method then predicts future ADR incidence. Our results show correctly ranked lists of ADRs for three antihypertensive drugs, with high Spearman’s rank correlation coefficients (rs) of 0.7458, 0.6678 and 0.5929 between the SIDER database of drug ADRs and the lists we obtained from social media. Furthermore, the relatedness between ADRs and age and gender groups achieved a high area under the ROC curve (AUC), reaching 0.959. The second-stage results showed high AUCs of 0.96 and 0.99 for the prediction of future ADR probabilities. The proposed algorithm shows that mining social media can provide a reliable source of information, as well as additional features that can boost the performance of supervised machine learning methods in different domains, including pharmacovigilance research.
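The abstract reports two standard evaluation measures: Spearman’s rank correlation between a reference ranking and a mined ranking, and the area under the ROC curve. The following is a minimal sketch of how such measures can be computed in plain Python. It is not the authors’ implementation, which the abstract does not describe; the function names, the ADR names, and all numbers in the usage example are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r_s, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def roc_auc(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) identity; labels are 0 or 1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranks = average_ranks(scores)
    pos_rank_sum = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Hypothetical data: position of five ADRs in a reference list
# and in a list mined from forum posts (not taken from the paper).
adrs = ["dizziness", "headache", "cough", "fatigue", "nausea"]
reference_position = [1, 2, 3, 4, 5]
mined_position = [2, 1, 3, 5, 4]
print(round(spearman(reference_position, mined_position), 3))  # 0.8

# Hypothetical classifier scores for six cases (1 = ADR occurred).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

Both functions reuse the same tie-aware ranking helper, so tied positions or tied scores are handled consistently. In practice, library routines such as `scipy.stats.spearmanr` and `sklearn.metrics.roc_auc_score` would normally be preferred over hand-written versions.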
About the journal:
Intelligent Automation and Soft Computing: An International Journal seeks to provide a common forum for the dissemination of accurate results about the world of intelligent automation, artificial intelligence, computer science, control, intelligent data science, modeling and systems engineering. The articles published in the journal are intended to encompass both the short-term and the long-term effects of soft computing and other related fields such as robotics, control, computer, vision, speech recognition, pattern recognition, data mining, big data, data analytics, machine intelligence, cyber security and deep learning. The journal further aims to address the existing and emerging relationships between automation, systems engineering, system of systems engineering and soft computing. It publishes original and survey papers on artificial intelligence, intelligent automation and computer engineering, with an emphasis on current and potential applications of soft computing. It has a broad interest in all engineering disciplines, computer science, and related technological fields such as medicine, biology, operations research, technology management, agriculture and information technology.