Pub Date: 2024-08-16 | DOI: 10.1109/TNB.2024.3444922
Xiangjin Hu;Haoran Yi;Hao Cheng;Yijing Zhao;Dongqi Zhang;Jinxin Li;Jingjing Ruan;Jin Zhang;Xinguo Lu
Computational synthetic lethality (SL) prediction has become a promising strategy for identifying SL gene pairs for targeted cancer therapy and cancer drug development. Feature representation that integrates various biological networks is crucial to improving identification performance. However, previous feature representation methods, such as matrix factorization and graph neural networks, project gene features onto latent variables while preserving a specific geometric metric. Models of the gene latent space that consider correlations across multiple dimensionalities and preserve latent geometric structures in both the sample and feature spaces are lacking. Therefore, we propose a novel method that models the gene Latent Space using matrix Tri-Factorization (LSTF) to obtain gene representations whose embedding variables arise from the potential interpretation of synthetic lethality. Meanwhile, manifold subspace regularization is applied to the tri-factorization to capture the geometric manifold structure in the latent space with gene PPI functional and GO semantic embeddings. SL gene pairs are then identified by reconstructing the associations from the gene representations in the latent space. The experimental results show that LSTF is superior to other state-of-the-art methods. A case study demonstrates the effectiveness of the predicted SL associations.
Title: Multiple Heterogeneous Networks Representation With Latent Space for Synthetic Lethality Prediction. IEEE Transactions on NanoBioscience, vol. 23, no. 4, pp. 564-571.
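The tri-factorization idea in the abstract above can be illustrated with a small sketch. It assumes one common form of graph-regularized symmetric tri-factorization, minimizing ||A - U S U^T||_F^2 + lam * tr(U^T L U), which is not necessarily the exact LSTF objective. The function names, the toy matrices `A` and `W`, and the hyperparameters `k` and `lam` are all hypothetical; `W` stands in for a similarity graph derived from PPI or GO embeddings.

```python
import numpy as np

def loss(A, U, S, L, lam):
    """Reconstruction error plus graph (manifold) regularization."""
    R = A - U @ S @ U.T
    return float(np.sum(R * R) + lam * np.trace(U.T @ L @ U))

def tri_factorize(A, W, k=2, lam=0.1, iters=200, seed=0):
    """Fit A ~ U S U^T by gradient descent with a backtracking step size.

    A : symmetric 0/1 association matrix (genes x genes), assumed toy input
    W : symmetric similarity matrix that defines the regularizer
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian of W
    U = 0.1 * rng.standard_normal((n, k))     # gene embeddings
    S = np.eye(k)                             # latent interaction matrix
    history = [loss(A, U, S, L, lam)]
    step = 1.0
    for _ in range(iters):
        R = U @ S @ U.T - A                   # residual (symmetric)
        grad_U = 4.0 * R @ U @ S + 2.0 * lam * L @ U
        grad_S = 2.0 * U.T @ R @ U
        # Backtracking: shrink the step until the loss does not increase.
        while step > 1e-12:
            U_new, S_new = U - step * grad_U, S - step * grad_S
            new_loss = loss(A, U_new, S_new, L, lam)
            if new_loss <= history[-1]:
                U, S = U_new, S_new
                history.append(new_loss)
                step *= 2.0
                break
            step *= 0.5
    return U, S, history

# Toy data: 6 genes with a few known SL pairs and a separate similarity graph.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0

U, S, history = tri_factorize(A, W)
scores = U @ S @ U.T    # reconstructed association scores for every gene pair
```

Entries of `scores` for pairs that are zero in `A` can be ranked as candidate SL pairs. The Laplacian term pulls the embeddings of similar genes toward each other, which is the role the abstract assigns to manifold regularization.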
Pub Date: 2024-08-14 | DOI: 10.1109/TNB.2024.3443244
Wenjie Yao;Ankang Wei;Zhen Xiao;Weizhong Zhao;Xianjun Shen;Xingpeng Jiang;Tingting He
Detecting drug side effects is a fundamental task in drug development. With the expansion of publicly available biomedical data, researchers have proposed many computational methods for predicting drug-side effect associations (DSAs), among which network-based methods have attracted wide attention in the biomedical field. However, data scarcity poses a great challenge for existing DSA prediction models. Although several data augmentation methods have been proposed to address this issue, most existing methods manipulate the original networks at random, which ignores the causality underlying the existence of DSAs and leads to poor performance on DSA prediction. In this paper, we propose a counterfactual inference-based data augmentation method to improve performance on this task. First, we construct a heterogeneous information network (HIN) by integrating multiple biomedical data sources. Based on community detection on the HIN, a counterfactual inference-based method is designed to derive augmented links, yielding an augmented HIN. Then, a meta-path-based graph neural network is applied to learn high-quality representations of drugs and side effects, from which the predicted DSAs are obtained. Finally, comprehensive experiments demonstrate the effectiveness of the proposed counterfactual inference-based data augmentation for DSA prediction.
Title: An Improved Framework for Drug-Side Effect Associations Prediction via Counterfactual Inference-Based Data Augmentation. IEEE Transactions on NanoBioscience, vol. 23, no. 4, pp. 540-547.
Pub Date: 2024-08-13 | DOI: 10.1109/TNB.2024.3442912
Ghazaleh Babanejaddehaki;Aijun An;Heidar Davoudi
Given the persistent global challenge presented by rapidly spreading diseases, as evidenced notably by the widespread impact of the COVID-19 pandemic on both human health and economies worldwide, the necessity of developing effective infectious disease prediction models has become of utmost importance. In this context, the utilization of online social media platforms as valuable tools in healthcare settings has gained prominence, offering direct avenues for disseminating critical health information to the public in a timely and accessible manner. Propelled by the ubiquitous accessibility of the internet through computers and mobile devices, these platforms promise to revolutionize traditional detection methods, providing more immediate and reliable epidemiological insights. Leveraging this paradigm shift, our proposed framework harnesses Twitter data associated with infectious disease symptoms, employing ontology to identify and curate relevant tweets. Central to our methodology is a hybrid model that integrates XGBoost and Bidirectional Long Short-Term Memory (BiLSTM) architectures. The integration of XGBoost addresses the challenge of handling small dataset sizes, inherent during outbreaks due to limited time series data. XGBoost serves as a cornerstone for minimizing the loss function and identifying optimal features from our multivariate time series data. Subsequently, the combined dataset, comprising original features and predicted values by XGBoost, is channeled into the BiLSTM for further processing. Through extensive experimentation with a dataset spanning multiple infectious disease outbreaks, our hybrid model demonstrates superior predictive performance compared to state-of-the-art and baseline models. By enhancing forecasting accuracy and outbreak tracking capabilities, our model offers promising prospects for assisting health authorities in mitigating fatalities and proactively preparing for potential outbreaks.
Title: Ontology-Based Data Collection for a Hybrid Outbreak Detection Method Using Social Media. IEEE Transactions on NanoBioscience, vol. 23, no. 4, pp. 591-602.
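The data flow described in the abstract above (stage-one predictions appended to the original features, then passed to a sequence model) can be sketched as follows. This shows only the feature stacking and windowing. Ordinary least squares is a stand-in for XGBoost, no BiLSTM is trained, and the feature meanings, array sizes, and `lookback` value are hypothetical.

```python
import numpy as np

def stack_stage_one(X, y):
    """Fit a stand-in stage-one model and append its predictions as a feature.

    Ordinary least squares is a placeholder for the gradient-boosted
    model named in the abstract.
    """
    X1 = np.column_stack([X, np.ones(len(X))])       # add an intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    y_hat = X1 @ coef
    return np.column_stack([X, y_hat])               # original features + prediction

def make_windows(features, target, lookback):
    """Slice a multivariate series into (samples, lookback, n_features) windows."""
    xs, ys = [], []
    for t in range(lookback, len(features)):
        xs.append(features[t - lookback:t])          # the previous `lookback` steps
        ys.append(target[t])                         # value to forecast at step t
    return np.stack(xs), np.array(ys)

rng = np.random.default_rng(0)
X = rng.random((30, 3))      # 30 time steps, 3 hypothetical symptom-count features
y = rng.random(30)           # hypothetical case counts
stacked = stack_stage_one(X, y)
windows, targets = make_windows(stacked, y, lookback=5)
```

The `windows` array has the (samples, timesteps, features) layout that recurrent layers typically expect. In a real pipeline the stage-one model would be fitted on the training split only, so that its predictions do not leak target information into the evaluation windows.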
Pub Date: 2024-08-12 | DOI: 10.1109/TNB.2024.3441689
Yan Wang;Jie Hong;Yuting Lu;Nan Sheng;Yuan Fu;Lili Yang;Lingyu Meng;Lan Huang;Hao Wang
Pancreatic cancer is one of the most malignant cancers, with rapid progression and poor prognosis. Transcriptional data can be effective in finding new biomarkers for pancreatic cancer. Many network-based methods have been proposed to identify cancer biomarkers, some of which incorporate network controllability. However, most existing methods do not study RNA, rely on prior and mutation information, or can only perform classification tasks. In this study, we propose RDDriver, a method combining a Relational Graph Convolutional Network and a Deep Q-Network, to identify pancreatic cancer biomarkers based on a multi-layer heterogeneous transcriptional regulation network. First, we construct a regulation network containing long non-coding RNA, microRNA, and messenger RNA. Second, the Relational Graph Convolutional Network is used to learn node representations. Finally, we use the idea of the Deep Q-Network to build a model that scores and prioritizes each RNA with the Popov-Belevitch-Hautus criterion. We train RDDriver on three small simulated networks and calculate the average score after applying the model parameters to the regulation networks separately. To demonstrate the effectiveness of the method, we compare RDDriver with eight other methods on an approximate benchmark of three types of cancer driver RNAs.
Title: A Controllability Reinforcement Learning Method for Pancreatic Cancer Biomarker Identification. IEEE Transactions on NanoBioscience, vol. 23, no. 4, pp. 556-563. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10633729
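For readers unfamiliar with the Popov-Belevitch-Hautus (PBH) criterion mentioned above, here is a minimal sketch of the test itself. A linear system x' = Ax + Bu is controllable exactly when rank([lambda*I - A, B]) = n for every eigenvalue lambda of A. This is not RDDriver's scoring procedure. The three-node network, the helper `driver_matrix`, and the tolerance are hypothetical.

```python
import numpy as np

def pbh_controllable(A, B, tol=1e-9):
    """Return True if (A, B) passes the PBH rank test at every eigenvalue of A."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        M = np.hstack([lam * np.eye(n) - A, B])
        if np.linalg.matrix_rank(M, tol=tol) < n:
            return False
    return True

def driver_matrix(n, nodes):
    """Input matrix B with one control signal attached to each listed node."""
    B = np.zeros((n, len(nodes)))
    for col, node in enumerate(nodes):
        B[node, col] = 1.0
    return B

# Toy undirected path network 0 - 1 - 2 (a stand-in for a regulation network).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# Which single nodes, used alone as a driver, make the whole network controllable?
single_drivers = [i for i in range(3)
                  if pbh_controllable(A, driver_matrix(3, [i]))]
```

In this toy network either end node suffices as a single driver, while the middle node does not: driving the middle node cannot excite the antisymmetric mode (1, 0, -1), so the rank test fails at eigenvalue 0.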
Pub Date: 2024-08-12 | DOI: 10.1109/TNB.2024.3441590
Xiwei Tang;Yiqiang Zhou;Mengyun Yang;Wenjun Li
Bioinformatics is a rapidly evolving field that applies computational methods to analyze and interpret biological data. A key task in bioinformatics is identifying novel drug-target interactions (DTIs), which plays a crucial role in drug discovery. Most computational approaches treat DTI prediction as a binary classification problem, determining whether drug-target pairs interact. However, with the growing availability of drug-target binding affinity data, this binary task can be reframed as a regression problem focused on drug-target affinity (DTA). DTA quantifies the strength of drug-target binding, offering more detailed insights than DTI and serving as a valuable tool for virtual screening in drug discovery. Accurately predicting compound interactions with targets can accelerate the drug development process. In this study, we introduce a deep learning model named TC-DTA for DTA prediction, leveraging convolutional neural networks (CNN) and the encoder module of the transformer architecture. We begin by extracting raw drug SMILES strings and protein amino acid sequences from the dataset, which are then represented using various encoding methods. Subsequently, we employ CNN and the transformer’s encoder module to extract features from the drug SMILES strings and protein sequences, respectively. Finally, the feature information is concatenated and input into a multi-layer perceptron to predict binding affinity scores. We evaluated our model on two benchmark DTA datasets, Davis and KIBA, comparing it with methods such as KronRLS, SimBoost, and DeepDTA. Our model, TC-DTA, outperformed these baseline methods based on evaluation metrics like Mean Squared Error (MSE), Concordance Index (CI), and Regression towards the Mean Index ( ${r}_{m}^{{2}}$