Pub Date: 2024-08-13. DOI: 10.1109/TNB.2024.3442912
Ghazaleh Babanejaddehaki; Aijun An; Heidar Davoudi
Rapidly spreading diseases remain a persistent global challenge, as shown by the impact of the COVID-19 pandemic on human health and economies worldwide, which makes effective infectious disease prediction models essential. In this context, online social media platforms have gained prominence as healthcare tools, offering direct channels for disseminating critical health information to the public in a timely and accessible manner. Because the internet is ubiquitously accessible through computers and mobile devices, these platforms promise to transform traditional detection methods by providing more immediate and reliable epidemiological insights. Our proposed framework collects Twitter data associated with infectious disease symptoms, using an ontology to identify and curate relevant tweets. Central to our methodology is a hybrid model that integrates XGBoost and Bidirectional Long Short-Term Memory (BiLSTM) architectures. XGBoost addresses the small dataset sizes that are inherent during outbreaks, when only limited time series data are available; it minimizes the loss function and identifies optimal features from our multivariate time series data. The combined dataset, comprising the original features and the values predicted by XGBoost, is then fed into the BiLSTM for further processing. In extensive experiments on a dataset spanning multiple infectious disease outbreaks, our hybrid model demonstrates superior predictive performance compared to state-of-the-art and baseline models. By improving forecasting accuracy and outbreak tracking, our model could help health authorities reduce fatalities and prepare proactively for potential outbreaks.
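The hand-off between the two stages described above can be sketched in a few lines. This is only an illustration of the data flow, not the authors' implementation: the first-stage predictions are passed in as plain numbers rather than produced by XGBoost, the BiLSTM itself is omitted, and the sliding-window framing, the window length, the toy values, and the helper names `augment_with_predictions` and `sliding_windows` are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def augment_with_predictions(
    features: Sequence[Sequence[float]],
    stage_one_preds: Sequence[float],
) -> List[List[float]]:
    """Append each first-stage prediction to its time step's feature row."""
    if len(features) != len(stage_one_preds):
        raise ValueError("need exactly one prediction per time step")
    return [list(row) + [float(p)] for row, p in zip(features, stage_one_preds)]


def sliding_windows(
    rows: Sequence[Sequence[float]],
    targets: Sequence[float],
    window: int,
) -> List[Tuple[List[Sequence[float]], float]]:
    """Pair every run of `window` consecutive rows with the next step's target."""
    samples = []
    for start in range(len(rows) - window):
        x = list(rows[start:start + window])
        y = targets[start + window]
        samples.append((x, y))
    return samples


# Hypothetical toy data: two features per time step (for example, counts of
# symptom-related tweets), one first-stage prediction per step, and the
# observed value to forecast.
features = [[10, 2], [12, 3], [15, 5], [20, 8], [26, 9]]
stage_one_preds = [1.0, 1.5, 2.0, 2.5, 3.0]
targets = [3, 4, 6, 9, 12]

rows = augment_with_predictions(features, stage_one_preds)
samples = sliding_windows(rows, targets, window=3)
print(len(samples), "windowed samples, each row has", len(rows[0]), "values")
```

Each sample is a short sequence of augmented rows that a recurrent model could consume; in a real pipeline the predictions would come from a fitted first-stage regressor.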
Title: Ontology-Based Data Collection for a Hybrid Outbreak Detection Method Using Social Media. Journal: IEEE Transactions on NanoBioscience, vol. 23, no. 4, pp. 591-602.
Pub Date: 2024-08-12. DOI: 10.1109/TNB.2024.3441689
Yan Wang; Jie Hong; Yuting Lu; Nan Sheng; Yuan Fu; Lili Yang; Lingyu Meng; Lan Huang; Hao Wang
Pancreatic cancer is one of the most malignant cancers, with rapid progression and poor prognosis. Transcriptional data can be effective in finding new biomarkers for pancreatic cancer. Many network-based methods have been proposed to identify cancer biomarkers, some of which incorporate network controllability. However, most existing methods do not study RNA, rely on a priori and mutation information, or can only perform classification tasks. In this study, we propose RDDriver, a method that combines a Relational Graph Convolutional Network and a Deep Q-Network to identify pancreatic cancer biomarkers based on a multi-layer heterogeneous transcriptional regulation network. First, we construct a regulation network containing long non-coding RNA, microRNA, and messenger RNA. Second, the Relational Graph Convolutional Network is used to learn the node representations. Finally, we use the idea of the Deep Q-Network to build a model that scores and prioritizes each RNA using the Popov-Belevitch-Hautus criterion. We train RDDriver on three small simulated networks and calculate the average score after applying the model parameters to the regulation networks separately. To demonstrate the effectiveness of the method, we compare RDDriver with eight other methods on an approximate benchmark of three types of cancer driver RNAs.
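For readers unfamiliar with the Popov-Belevitch-Hautus (PBH) criterion mentioned above: a linear system x' = Ax + Bu is controllable exactly when the matrix [λI − A, B] has full row rank for every eigenvalue λ of A. The sketch below applies that rank test to a toy three-node chain using NumPy. It is not RDDriver and involves no learning; the matrices, the tolerance, and the function name `pbh_controllable` are illustrative assumptions.

```python
import numpy as np


def pbh_controllable(A, B, tol=1e-9):
    """Return True if (A, B) passes the PBH rank test for controllability."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        # Stack [lam*I - A | B]; a rank below n means the mode lam cannot be steered.
        M = np.hstack([lam * np.eye(n) - A, B])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            return False
    return True


# Hypothetical chain: node 0 regulates node 1, node 1 regulates node 2.
# The distinct negative diagonal entries stand in for self-decay terms.
A = [[-1.0, 0.0, 0.0],
     [1.0, -2.0, 0.0],
     [0.0, 1.0, -3.0]]

drive_first = [[1.0], [0.0], [0.0]]  # external input applied to node 0
drive_last = [[0.0], [0.0], [1.0]]   # external input applied to node 2

print("input at head of chain:", pbh_controllable(A, drive_first))
print("input at tail of chain:", pbh_controllable(A, drive_last))
```

Driving the head of the chain reaches every node downstream, so the test passes; driving only the tail leaves the upstream nodes unaffected, so it fails. Rankings of candidate driver nodes in controllability-based methods rest on this kind of distinction.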
Title: A Controllability Reinforcement Learning Method for Pancreatic Cancer Biomarker Identification. Journal: IEEE Transactions on NanoBioscience, vol. 23, no. 4, pp. 556-563. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10633729
Pub Date: 2024-08-12. DOI: 10.1109/TNB.2024.3441590
Xiwei Tang; Yiqiang Zhou; Mengyun Yang; Wenjun Li
Bioinformatics is a rapidly evolving field that applies computational methods to analyze and interpret biological data. A key task in bioinformatics is identifying novel drug-target interactions (DTIs), which plays a crucial role in drug discovery. Most computational approaches treat DTI prediction as a binary classification problem, determining whether drug-target pairs interact. However, with the growing availability of drug-target binding affinity data, this binary task can be reframed as a regression problem focused on drug-target affinity (DTA). DTA quantifies the strength of drug-target binding, offering more detailed insights than DTI and serving as a valuable tool for virtual screening in drug discovery. Accurately predicting compound interactions with targets can accelerate the drug development process. In this study, we introduce a deep learning model named TC-DTA for DTA prediction, leveraging convolutional neural networks (CNN) and the encoder module of the transformer architecture. We begin by extracting raw drug SMILES strings and protein amino acid sequences from the dataset, which are then represented using various encoding methods. Subsequently, we employ CNN and the transformer’s encoder module to extract features from the drug SMILES strings and protein sequences, respectively. Finally, the feature information is concatenated and input into a multi-layer perceptron to predict binding affinity scores. We evaluated our model on two benchmark DTA datasets, Davis and KIBA, comparing it with methods such as KronRLS, SimBoost, and DeepDTA. Our model, TC-DTA, outperformed these baseline methods based on evaluation metrics like Mean Squared Error (MSE), Concordance Index (CI), and Regression towards the Mean Index ( ${r}_{m}^{{2}}$