Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467135
Ben Xu, Jianping Chen, Yunzhe Wang, Qiming Fu, You Lu
Graph neural networks offer an effective avenue for predicting drug-target interactions. In this domain, researchers have found that constructing metapath-based heterogeneous information networks from diverse biological datasets enhances prediction performance. However, the performance of such methods depends closely on which metapaths are selected and on how well the metapath subgraphs match the graph neural networks that process them. Most existing approaches still rely on fixed strategies for selecting metapaths and often fail to fully exploit node information along the metapaths, which limits the improvement in model performance. This paper introduces a novel method for predicting drug-target interactions by optimizing metapaths in heterogeneous information networks. On the one hand, the method formulates metapath optimization as a Markov decision process, using the improvement in downstream network performance as the reward signal. Through iterative training of a reinforcement learning agent, a high-quality set of metapaths is learned. On the other hand, to fully leverage node information along the metapaths, the paper constructs subgraphs from the nodes along the metapaths. Subgraphs of different depths are processed by different graph convolutional neural networks. Experimental results on standard heterogeneous biological benchmark datasets show significant advantages over traditional methods.
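The Markov decision process framing in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's state, action, or reward definitions, so everything here is an assumption made for illustration: the state is the set of metapaths chosen so far, an action adds one candidate metapath, and the reward is the change in a downstream score. The class `MetapathEnv`, the function `toy_evaluate`, the node types, and the score values are all hypothetical; `toy_evaluate` stands in for training and validating a graph neural network.

```python
from typing import Callable, FrozenSet, List, Tuple

Metapath = Tuple[str, ...]  # e.g. ("drug", "target", "drug"); node types are illustrative


class MetapathEnv:
    """Hypothetical environment: the state is the set of metapaths selected so far."""

    def __init__(self, candidates: List[Metapath],
                 evaluate: Callable[[FrozenSet[Metapath]], float]) -> None:
        self.candidates = list(candidates)
        self.evaluate = evaluate
        self.state: FrozenSet[Metapath] = frozenset()
        self.score = evaluate(self.state)

    def step(self, action: int) -> Tuple[FrozenSet[Metapath], float]:
        """Add candidate number `action`; the reward is the change in downstream score."""
        new_state = self.state | {self.candidates[action]}
        new_score = self.evaluate(new_state)
        reward = new_score - self.score
        self.state, self.score = new_state, new_score
        return new_state, reward


def toy_evaluate(metapaths: FrozenSet[Metapath]) -> float:
    """Stand-in for a downstream model's validation score (made-up numbers)."""
    gains = {("drug", "target", "drug"): 0.10, ("drug", "disease", "drug"): 0.05}
    return 0.70 + sum(gains.get(m, 0.0) for m in metapaths)


candidates = [("drug", "target", "drug"),
              ("drug", "disease", "drug"),
              ("drug", "drug")]
env = MetapathEnv(candidates, toy_evaluate)
_, first_reward = env.step(0)    # adding a useful metapath gives a positive reward
_, repeat_reward = env.step(0)   # adding it again changes nothing, so the reward is zero
```

A reinforcement learning agent would interact with such an environment over many episodes and learn which additions tend to earn positive reward; the agent itself is omitted here.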
Reinforced Metapath Optimization in Heterogeneous Information Networks for Drug-Target Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Cancer is a complex disease of genomic mutation, and identifying cancer driver genes promotes the development of targeted drugs and personalized therapies. Current computational methods give little consideration to the relationships among features and to the effect of noise in protein-protein interaction (PPI) data, resulting in a low recognition rate. In this paper, we propose DIM, a cancer driver gene identification method based on a dynamic incentive model. The method first constructs a hypergraph to reduce the impact of false-positive data in the PPI network. Then, the importance of the genes in each hyperedge of the hypergraph is assessed from three perspectives, and a network and functional score (NFS) is proposed. By analyzing the relationships among features, the dynamic incentive model is proposed to fuse the NFS, the differential expression score of mRNA, and the differential expression score of miRNA. DIM is compared with several classical methods on breast cancer, lung cancer, prostate cancer, and pan-cancer datasets. The results show that DIM performs best on statistical evaluation indicators, functional consistency, and the partial area under the ROC curve, and that it has good cross-cancer capability.
Zhipeng Hu, Gaoshi Li, Xinlong Luo, Wei Peng, Jiafei Liu, Xiaoshu Zhu, Jingli Wu. Identification of cancer driver genes based on dynamic incentive model. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467119
Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467033
Gabriel Siqueira, Alexsandro Oliveira Alexandrino, Andre Rodrigues Oliveira, Geraldine Jean, Guillaume Fertin, Zanoni Dias
Genome rearrangement distance problems are used in computational biology to estimate the evolutionary distance between genomes. These problems consist of minimizing the number of rearrangement events necessary to transform one genome into another. Two commonly used rearrangement events are reversal and transposition. The first problems studied ignored nucleotides outside genes (called intergenic regions) or assumed that genomes have a single copy of each gene. Recent works have advanced more general problems that consider the number of nucleotides in intergenic regions as well as replicated genes. Nevertheless, genomes tend to have wildly different quantities of nucleotides in their intergenic regions, which poses a problem when comparing these regions exactly. To overcome this limitation, our work allows some flexibility when matching intergenic regions that do not have the same number of nucleotides. We propose new problems that seek the minimum number of reversals, or of reversals and transpositions, necessary to transform one genome into another while considering flexible intergenic region information. We show approximations for these problems by exploring their relationship with the Signed Minimum Common Flexible Intergenic String Partition problem. We also present different heuristics for the partition problem and conduct experimental tests on simulated genomes to assess the performance of our algorithms.
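To make the two rearrangement events concrete, here is a minimal sketch, assuming a genome is modelled as a list of signed integers in which the sign encodes gene orientation. This is a simplification for illustration only: it ignores intergenic regions entirely, and the function names `reversal` and `transposition` are hypothetical rather than taken from the paper.

```python
from typing import List


def reversal(genome: List[int], i: int, j: int) -> List[int]:
    """Reverse the segment genome[i..j] (inclusive) and flip each gene's sign."""
    if not 0 <= i <= j < len(genome):
        raise ValueError("need 0 <= i <= j < len(genome)")
    flipped = [-gene for gene in reversed(genome[i:j + 1])]
    return genome[:i] + flipped + genome[j + 1:]


def transposition(genome: List[int], i: int, j: int, k: int) -> List[int]:
    """Exchange the adjacent segments genome[i:j] and genome[j:k]."""
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


identity = [1, 2, 3, 4, 5]
after_reversal = reversal(identity, 1, 3)              # [1, -4, -3, -2, 5]
after_transposition = transposition(identity, 1, 3, 5)  # [1, 4, 5, 2, 3]
```

A rearrangement distance is then the length of a shortest sequence of such events that turns one genome into the other; computing it is the hard part and is not attempted here.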
Partition Based Algorithms for Rearrangement Distances with Flexible Intergenic Regions. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Pub Date : 2024-09-24 DOI: 10.1109/TCBB.2024.3467261
Jiashun Wu, Yan Liu, Yiheng Zhu, Dong-Jun Yu
Accurate identification of antifreeze proteins (AFPs) is crucial for developing biomimetic synthetic anti-icing materials and low-temperature organ preservation materials. Although numerous machine learning-based methods have been proposed for AFP prediction, the complex and diverse nature of AFPs limits the performance of existing methods. In this study, we propose AFP-Deep, a new deep learning method that predicts antifreeze proteins by integrating protein sequence embeddings from pre-trained protein language models and evolutionary contexts using hybrid feature extraction networks. The experimental results demonstrate that the main advantage of AFP-Deep is its use of pre-trained protein language models, which can extract discriminative global contextual features from protein sequences. Additionally, the hybrid deep neural networks designed for feature extraction from protein language models and evolutionary contexts enhance the correlation between embeddings and antifreeze patterns. The performance evaluation shows that AFP-Deep outperforms state-of-the-art models on two benchmark datasets, achieving AUPRCs of 0.724 and 0.924, respectively.
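For readers unfamiliar with the AUPRC figure quoted above, the following sketch computes average precision, one common estimator of the area under the precision-recall curve. It is not taken from the paper: the function name `average_precision` is hypothetical, the example labels and scores are made up, and the code assumes binary labels (1 for positive) and distinct scores.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values measured at the rank of each positive example."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(label for _, label in ranked)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` predictions
    return total / n_pos


# Positives are ranked 1st and 3rd, so the precisions are 1/1 and 2/3.
ap = average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.6])
```

Unlike accuracy, this measure stays informative when positives are rare, because it focuses on how highly the positive examples are ranked.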
Improving Antifreeze Proteins Prediction with Protein Language Models and Hybrid Feature Extraction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
N7-methylguanosine (m7G), one of the major post-transcriptional RNA modifications, is highly significant for medical treatment. However, classic approaches for identifying m7G sites are costly in both time and equipment. Meanwhile, existing machine learning methods extract only limited hidden information from RNA sequences, which makes it difficult to improve accuracy. Therefore, we put forward a deep learning network, called "GenoM7GNet," for m7G site identification. The model uses Bidirectional Encoder Representations from Transformers (BERT) pretrained on nucleotide sequence data to capture hidden patterns in RNA sequences for m7G site prediction. Moreover, through detailed comparative experiments with various deep learning models, we found that the one-dimensional convolutional neural network (CNN) exhibits outstanding performance in sequence feature learning and classification. In performance evaluation, the proposed GenoM7GNet model achieved 0.953 in accuracy, 0.932 in sensitivity, 0.976 in specificity, 0.907 in Matthews correlation coefficient, and 0.984 in area under the receiver operating characteristic curve. Extensive experimental results further show that GenoM7GNet markedly surpasses other state-of-the-art models in predicting m7G sites while exhibiting high computing performance.
Chuang Li, Heshi Wang, Yanhua Wen, Rui Yin, Xiangxiang Zeng, Keqin Li. GenoM7GNet: An Efficient N7-Methylguanosine Site Prediction Approach Based on a Nucleotide Language Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics. Pub Date : 2024-09-20 DOI: 10.1109/TCBB.2024.3459870
Pub Date : 2024-09-17 DOI: 10.1109/tcbb.2024.3462730
Mengzhen Li, Mustafa Coşkun, Mehmet Koyutürk
Topological-Similarity Based Canonical Representations for Biological Link Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics.
Pub Date : 2024-09-13 DOI: 10.1109/tcbb.2024.3433523
Fernando H. C. Dias, Alexandru I. Tomescu
Accurate Flow Decomposition via Robust Integer Linear Programming. IEEE/ACM Transactions on Computational Biology and Bioinformatics.