Pub Date: 2024-11-14 DOI: 10.1007/s12539-024-00644-9
Gaili Li, Yongna Yuan, Ruisheng Zhang
The investigation of molecular interactions between ligands and their target molecules is becoming more significant as protein structure data continues to grow. In this study, we introduce PLA-STGCNnet, a deep fusion spatial-temporal graph neural network designed to study protein-ligand interactions based on the 3D structural data of protein-ligand complexes. Unlike 1D protein sequences or 2D ligand graphs, the 3D graph representation offers a more precise portrayal of the complex interactions between proteins and ligands. Our results show that the fusion model, PLA-STGCNnet, outperforms individual algorithms in predicting binding affinity. The advantage of a fusion model is that it combines the strengths of multiple different models and improves overall performance by combining their features and outputs. Our fusion model performs satisfactorily on different data sets, which supports its generalization ability and stability. The fusion-based model showed good performance in protein-ligand affinity prediction, and we successfully applied the model to drug screening. Our research underscores the promise of fusion spatial-temporal graph neural networks in addressing complex challenges in protein-ligand affinity prediction. The Python scripts for implementing various model components are accessible at https://github.com/ligaili01/PLA-STGCN.
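The idea of a 3D structure-based complex graph can be illustrated with a minimal sketch: atoms become nodes, and any two atoms closer than a distance cutoff are joined by an edge. The function name `build_complex_graph`, the atom tuple layout, and the 4.5 Å cutoff are assumptions made for this illustration; they are not taken from the PLA-STGCN repository.

```python
import math


def build_complex_graph(protein_atoms, ligand_atoms, cutoff=4.5):
    """Build a distance-cutoff graph over a protein-ligand complex.

    Each atom is (element, (x, y, z)). The cutoff (in angstroms) is a
    hypothetical value chosen for illustration only.
    """
    # Tag every node with the molecule it came from.
    nodes = [("protein", el, xyz) for el, xyz in protein_atoms]
    nodes += [("ligand", el, xyz) for el, xyz in ligand_atoms]

    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            distance = math.dist(nodes[i][2], nodes[j][2])
            if distance <= cutoff:
                # Flag edges that cross the protein-ligand interface.
                intermolecular = nodes[i][0] != nodes[j][0]
                edges.append((i, j, distance, intermolecular))
    return nodes, edges


protein = [("N", (0.0, 0.0, 0.0)), ("C", (10.0, 0.0, 0.0))]
ligand = [("O", (3.0, 0.0, 0.0))]
nodes, edges = build_complex_graph(protein, ligand)
```

The pairwise loop is quadratic in the number of atoms, which is acceptable for a sketch; a real pipeline would more likely use a spatial index to find neighbours.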
{"title":"Predicting Protein-Ligand Binding Affinity Using Fusion Model of Spatial-Temporal Graph Neural Network and 3D Structure-Based Complex Graph.","authors":"Gaili Li, Yongna Yuan, Ruisheng Zhang","doi":"10.1007/s12539-024-00644-9","DOIUrl":"https://doi.org/10.1007/s12539-024-00644-9","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-11-14","publicationTypes":"Journal Article"}
Accurate prediction of anticancer drug responses is essential for developing personalized treatment plans that improve cancer patient survival rates and reduce healthcare costs. To this end, we propose a drug sensitivity prediction model based on multi-stage multi-modal drug representations (ModDRDSP) to reflect the properties of drugs more comprehensively and to better model the complex interactions between cells and drugs. Specifically, for the multi-modal information of drugs, we adopt a SMILES representation learning method based on a deep hierarchical bi-directional GRU network (DSBiGRU) and a molecular graph representation learning method based on a deep message-crossing network (DMCN). Additionally, we integrate the multi-omics information of cell lines using a convolutional neural network (CNN). Finally, we use an ensemble deep forest algorithm to predict drug sensitivity. In validation, ModDRDSP outperforms four current leading models. More importantly, ablation experiments demonstrate the validity of each module of the proposed model, and case studies show that ModDRDSP predicts drug sensitivity well, further supporting its performance advantage.
{"title":"Drug Sensitivity Prediction Based on Multi-stage Multi-modal Drug Representation Learning.","authors":"Jinmiao Song, Mingjie Wei, Shuang Zhao, Hui Zhai, Qiguo Dai, Xiaodong Duan","doi":"10.1007/s12539-024-00668-1","DOIUrl":"https://doi.org/10.1007/s12539-024-00668-1","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-11-12","publicationTypes":"Journal Article"}
Pub Date: 2024-11-06 DOI: 10.1007/s12539-024-00660-9
Yajing Guo, Xiujuan Lei, Shuyu Li
Circular RNA (circRNA) can bind to RNA binding proteins (RBPs), thereby exerting a substantial impact on diseases. Predicting binding sites aids in understanding the interaction mechanism and offers insights for disease treatment strategies. Here, we propose a novel approach, circTCA, based on a temporal convolutional network (TCN) and a cross multi-head attention mechanism to predict circRNA-RBP binding sites. First, we employ two distinct encoding methods to obtain two raw matrices of circRNA sequences. Then, two parallel TCN blocks extract shallow and abstract features of the two matrices separately. The two feature sets are fused through the cross multi-head attention mechanism, after which global expectation pooling assigns weights to the concatenated feature. Finally, a fully connected (FC) layer classifies the input sequence. We compare circTCA with five other methods and conduct ablation experiments to demonstrate its effectiveness. We also conduct feature visualization and compare the motifs extracted by circTCA with existing motifs. Overall, circTCA is effective for predicting circRNA-RBP binding sites.
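For readers unfamiliar with TCNs, the building block usually associated with them is a dilated causal 1D convolution, sketched below in plain Python for a single channel. This is a generic illustration, not circTCA's code: the abstract does not say whether circTCA uses causal padding, how many channels it has, or which dilation rates it uses, and the function name is hypothetical.

```python
def causal_dilated_conv1d(signal, kernel, dilation=1):
    """Single-channel dilated causal convolution.

    Output position t only sees signal[t], signal[t - dilation],
    signal[t - 2 * dilation], ...; positions before the start of the
    signal are treated as zero padding.
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for i, weight in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:  # implicit zero padding on the left
                acc += weight * signal[idx]
        output.append(acc)
    return output


# With dilation 2, each output mixes the current value with the one two steps back.
smoothed = causal_dilated_conv1d([1, 2, 3, 4], [1, 1], dilation=2)
```

Stacking such layers with increasing dilation widens the receptive field without adding parameters per layer, which is the usual motivation for the design.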
{"title":"An Integrated TCN-CrossMHA Model for Predicting circRNA-RBP Binding Sites.","authors":"Yajing Guo, Xiujuan Lei, Shuyu Li","doi":"10.1007/s12539-024-00660-9","DOIUrl":"https://doi.org/10.1007/s12539-024-00660-9","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-11-06","publicationTypes":"Journal Article"}
Pub Date: 2024-10-28 DOI: 10.1007/s12539-024-00664-5
Yuhong Su, Xincheng Zeng, Lingfeng Zhang, Yanlin Bian, Yangjing Wang, Buyong Ma
Antibodies against Aβ peptide have recently been approved to treat Alzheimer's disease, underscoring the importance of understanding their interactions for developing more potent treatments. Here we investigated the interaction between anti-Aβ antibodies and various peptides using a deep learning model. Our model, ABTrans, was trained on dodecapeptide sequences from phage display experiments and known anti-Aβ antibody sequences from public sources. It classified the binding ability between anti-Aβ antibodies and dodecapeptides into four levels: not binding, weak binding, medium binding, and strong binding, achieving an accuracy of 0.83. Using ABTrans, we examined the cross-reaction of anti-Aβ antibodies with other human amyloidogenic proteins, revealing that Aducanumab and Donanemab exhibited the least cross-reactivity. Additionally, we systematically screened interactions between eleven selected anti-Aβ antibodies and all human proteins to identify potential off-target candidates.
{"title":"ABTrans: A Transformer-based Model for Predicting Interaction between Anti-Aβ Antibodies and Peptides.","authors":"Yuhong Su, Xincheng Zeng, Lingfeng Zhang, Yanlin Bian, Yangjing Wang, Buyong Ma","doi":"10.1007/s12539-024-00664-5","DOIUrl":"https://doi.org/10.1007/s12539-024-00664-5","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-10-28","publicationTypes":"Journal Article"}
CRISPR/Cas base editors offer precise conversion of single nucleotides without inducing double-strand breaks. This technology finds extensive applications in gene therapy, gene function analysis, and other domains. However, a crucial challenge lies in selecting the appropriate guide RNAs (gRNAs) for base editing. Although various gRNA design tools exist, creating a simplified base-editing library with diverse protospacer adjacent motif (PAM) sequences for gRNA screening remains a challenge. We present a user-friendly web tool, BES-Designer (https://bes-designer.aielab.net), for gRNA design based on base editors, aimed at streamlining the creation of a base-editing library. BES-Designer incorporates our proposed rules for target sequence simplification, helping researchers narrow down the scope of biological experiments in the lab. It allows users to design target sequences with various PAMs and editing types simultaneously, and to prioritize them in the simplified base-editing library. In experiments, the tool achieved a 30% simplification efficiency on the base-editing library.
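The basic search that any PAM-aware gRNA designer has to perform can be sketched as follows: slide along the sequence, keep protospacers that sit directly upstream of a matching PAM, and report those with an editable base inside an editing window. Everything specific here is an assumption for illustration (a 20-nt spacer, an `NGG` PAM, a C-to-T editor with a window at spacer positions 4 to 8, forward strand only); it does not reproduce BES-Designer's simplification rules, which the abstract does not spell out.

```python
def pam_matches(pam, site):
    """Compare a PAM pattern to a site; 'N' in the pattern matches any base."""
    return len(pam) == len(site) and all(p == "N" or p == b for p, b in zip(pam, site))


def find_base_edit_targets(seq, pam="NGG", spacer_len=20, window=(4, 8), target_base="C"):
    """Scan the forward strand for spacers followed by a PAM.

    Positions in `window` are 1-based, counted from the PAM-distal end of
    the spacer. All default values are hypothetical illustration choices.
    """
    seq = seq.upper()
    first, last = window
    hits = []
    for start in range(len(seq) - spacer_len - len(pam) + 1):
        spacer = seq[start:start + spacer_len]
        site = seq[start + spacer_len:start + spacer_len + len(pam)]
        if not pam_matches(pam, site):
            continue
        # Collect window positions that carry the editable base.
        editable = [pos for pos in range(first, last + 1) if spacer[pos - 1] == target_base]
        if editable:
            hits.append({"start": start, "spacer": spacer, "pam": site,
                         "editable_positions": editable})
    return hits


targets = find_base_edit_targets("AAAC" + "A" * 16 + "TGG")
```

Passing a different `pam` or `window` covers other editor variants in the same loop, which mirrors the abstract's point about designing targets for several PAMs and editing types at once.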
{"title":"BES-Designer: A Web Tool to Design Guide RNAs for Base Editing to Simplify Library.","authors":"Qian Zhou, Qian Gao, Yujia Gao, Youhua Zhang, Yanjun Chen, Min Li, Pengcheng Wei, Zhenyu Yue","doi":"10.1007/s12539-024-00663-6","DOIUrl":"https://doi.org/10.1007/s12539-024-00663-6","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-10-28","publicationTypes":"Journal Article"}
Pub Date: 2024-10-23 DOI: 10.1007/s12539-024-00653-8
Qianqian Song, Taobo Hu, Baosheng Liang, Shihai Li, Yang Li, Jinbo Wu, Shu Wang, Xiaohua Zhou
The development of third-generation sequencing has accelerated the boom of single nucleotide polymorphism (SNP) calling methods, but evaluating their accuracy remains challenging owing to the absence of an SNP gold standard. Definitions of performance metrics in the absence of a gold standard, together with methods for estimating them, are urgently needed. Additionally, the possible correlations between different SNP loci should be further explored. To address these challenges, we first introduced the concepts of a gold standard and an imperfect gold standard under the consistency framework and gave the corresponding definitions of sensitivity and specificity. A latent class model (LCM) was established to estimate the sensitivity and specificity of callers. Furthermore, we incorporated different dependency structures into the LCM to investigate their impact on sensitivity and specificity. The performance of the LCM was illustrated by comparing the accuracy of BCFtools, DeepVariant, FreeBayes, and GATK on various datasets. Estimates across multiple datasets indicate that the LCM is well suited for evaluating callers without an SNP gold standard, and that accurately including the dependency between variations is crucial for better performance ranking. DeepVariant has a higher sum of sensitivity and specificity than other callers, followed by GATK and BCFtools. FreeBayes has low sensitivity but high specificity. Notably, appropriate sequencing coverage is another important factor for precise evaluation of callers. Most importantly, a web interface for assessing and comparing different callers was developed to simplify the evaluation process.
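To make the latent class idea concrete, the sketch below fits the simplest variant, a two-class model in which callers are assumed conditionally independent given the unobserved true status, using expectation-maximization. It is an illustration only: the dependency structures the abstract mentions are not modeled, and the function name, starting values, and toy call patterns are all hypothetical rather than taken from the paper.

```python
def fit_latent_class(calls, n_iter=200, eps=1e-6):
    """EM for a two-class latent class model with independent callers.

    `calls` is a list of tuples of 0/1 calls, one tuple per locus and one
    entry per caller. Returns (prevalence, sensitivities, specificities).
    """
    def clip(p):
        return min(max(p, eps), 1.0 - eps)

    n_callers = len(calls[0])
    prev = 0.5
    sens = [0.8] * n_callers
    spec = [0.8] * n_callers
    for _ in range(n_iter):
        # E-step: posterior probability that each locus is a true variant.
        weights = []
        for row in calls:
            p_pos, p_neg = prev, 1.0 - prev
            for j, y in enumerate(row):
                p_pos *= sens[j] if y else 1.0 - sens[j]
                p_neg *= 1.0 - spec[j] if y else spec[j]
            weights.append(p_pos / (p_pos + p_neg))
        # M-step: re-estimate parameters from the weighted counts.
        total_pos = sum(weights)
        total_neg = len(calls) - total_pos
        prev = clip(total_pos / len(calls))
        for j in range(n_callers):
            hit = sum(w * row[j] for w, row in zip(weights, calls))
            reject = sum((1.0 - w) * (1 - row[j]) for w, row in zip(weights, calls))
            sens[j] = clip(hit / total_pos)
            spec[j] = clip(reject / total_neg)
    return prev, sens, spec


# Toy data: three callers that mostly agree (made-up counts).
toy_calls = ([(1, 1, 1)] * 40 + [(0, 0, 0)] * 50 + [(1, 1, 0)] * 4
             + [(0, 0, 1)] * 3 + [(1, 0, 1)] * 3)
prev, sens, spec = fit_latent_class(toy_calls)
```

With three callers and two classes the model has as many parameters as degrees of freedom, so estimates from so few callers should be read with caution; adding callers or constraints is what makes the estimates more trustworthy.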
{"title":"cascAGS: Comparative Analysis of SNP Calling Methods for Human Genome Data in the Absence of Gold Standard.","authors":"Qianqian Song, Taobo Hu, Baosheng Liang, Shihai Li, Yang Li, Jinbo Wu, Shu Wang, Xiaohua Zhou","doi":"10.1007/s12539-024-00653-8","DOIUrl":"https://doi.org/10.1007/s12539-024-00653-8","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-10-23","publicationTypes":"Journal Article"}
Pub Date: 2024-10-21 DOI: 10.1007/s12539-024-00659-2
Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz
k-mer frequencies are crucial for understanding DNA sequence patterns and structure, with applications in motif discovery, genome classification, and short read assembly. However, the exponential increase in the dimension of frequency tables with increasing k-mer length poses storage challenges. In this study, we present a novel method for compressing k-mer data without information loss, aiming to optimize storage and analysis processes. We employed Chaos Game Representation (CGR) to map k-mers to coordinates and used these components to generate raster images of k-mers. The CGR maps were partitioned and labeled based on substrings, with each substring mapped to a subframe, creating a fractal-like structure. The entire k-mer frequency set of each genomic sequence was represented as a single image, with each pixel corresponding to a specific k-mer and its occurrence. This approach reduced file size by up to 16-fold compared to plain text and 3-fold compared to binary format. Furthermore, we demonstrated the feasibility of performing alignment-free similarity analyses on images derived from k-mer frequencies of whole genome sequences from 14 plant species. Our results highlight the potential of this method as a fast and efficient tool for accessing, processing, and analyzing large biological sequence datasets, enabling the retrieval of k-mer frequencies and image reconstruction.
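The mapping from k-mers to pixels described above can be sketched in a few lines. In a Chaos Game Representation grid of size 2^k by 2^k, every base contributes one bit to the row and one bit to the column, so each k-mer lands on exactly one pixel and can be recovered from it. The corner assignment in `BASE_BITS` and the function names are assumptions for illustration; the authors' own layout and image format may differ.

```python
# Hypothetical corner assignment: each base gives one column bit and one row bit.
BASE_BITS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BITS_BASE = {bits: base for base, bits in BASE_BITS.items()}


def kmer_to_pixel(kmer):
    """Map a k-mer to (row, col) in a 2^k x 2^k grid."""
    row = col = 0
    for base in reversed(kmer):  # the last base selects the largest quadrant
        x_bit, y_bit = BASE_BITS[base]
        col = (col << 1) | x_bit
        row = (row << 1) | y_bit
    return row, col


def pixel_to_kmer(row, col, k):
    """Invert kmer_to_pixel, so the image loses no information."""
    bases = []
    for shift in range(k):  # the lowest bit encodes the first base
        x_bit = (col >> shift) & 1
        y_bit = (row >> shift) & 1
        bases.append(BITS_BASE[(x_bit, y_bit)])
    return "".join(bases)


def kmer_image(sequence, k):
    """Count every k-mer of `sequence` into its pixel."""
    size = 1 << k
    image = [[0] * size for _ in range(size)]
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if all(base in BASE_BITS for base in kmer):  # skip windows containing N etc.
            row, col = kmer_to_pixel(kmer)
            image[row][col] += 1
    return image


image = kmer_image("ACGTACGT", 2)
```

Because the pixel position itself encodes the k-mer, only the counts need to be stored, which is where the storage saving over a text table of k-mer and count pairs comes from.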
{"title":"Efficient Storage and Analysis of Genomic Data: A k-mer Frequency Mapping and Image Representation Method.","authors":"Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz","doi":"10.1007/s12539-024-00659-2","DOIUrl":"https://doi.org/10.1007/s12539-024-00659-2","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-10-21","publicationTypes":"Journal Article"}
Pub Date: 2024-10-17 DOI: 10.1007/s12539-024-00657-4
Abicumaran Uthamacumaran
Pediatric glioblastoma is a complex dynamical disease that is difficult to treat due to its multiple adaptive behaviors driven largely by phenotypic plasticity. Integrated data science and network theory pipelines offer novel approaches to studying glioblastoma cell fate dynamics, particularly phenotypic transitions over time. Here we used various single-cell trajectory inference algorithms to infer signaling dynamics regulating pediatric glioblastoma-immune cell networks. We identified GATA2, PTPRZ1, TPT1, MTRNR2L1/2, OLIG1/2, SOX11, FXYD6, SEZ6L, PDGFRA, EGFR, S100B, WNT, TNFα, and NF-kB as critical transition genes or signals regulating glioblastoma-immune network dynamics, revealing potential clinically relevant targets. Further, we reconstructed glioblastoma cell fate attractors and found complex bifurcation dynamics within glioblastoma phenotypic transitions, suggesting that a causal pattern may be driving glioblastoma evolution and cell fate decision-making. Together, our findings have implications for developing targeted therapies against glioblastoma, and for the continued integration of quantitative approaches and artificial intelligence (AI) to understand pediatric glioblastoma tumor-immune interactions.
{"title":"Cell Fate Dynamics Reconstruction Identifies TPT1 and PTPRZ1 Feedback Loops as Master Regulators of Differentiation in Pediatric Glioblastoma-Immune Cell Networks.","authors":"Abicumaran Uthamacumaran","doi":"10.1007/s12539-024-00657-4","DOIUrl":"https://doi.org/10.1007/s12539-024-00657-4","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-10-17","publicationTypes":"Journal Article"}
Pub Date: 2024-10-14 DOI: 10.1007/s12539-024-00661-8
Haibin Li, Jun Meng, Zhaowei Wang, Yushi Luan
Primary microRNAs (pri-miRNAs) have been observed to contain translatable small open reading frames (sORFs) that can encode peptides as independent elements. Relevant studies have shown that these sORFs play a significant role in regulating the expression of biological traits. Existing methods for predicting the coding potential of sORFs frequently overlook these sORFs or categorize them as negative samples, impeding the identification of additional translatable sORFs in pri-miRNAs. In light of this, we propose a novel method named misORFPred. Specifically, an enhanced scalable k-mer (ESKmer) that simultaneously integrates the composition information within a sequence and the distance information between sequences is designed to extract nucleotide sequence features. After feature selection, the optimal features and several machine learning classifiers are combined to construct the ensemble model, in which a newly devised dynamic ensemble voting strategy (DEVS) dynamically adjusts the weights of base classifiers and adaptively selects the optimal base classifiers for each unlabeled sample. Cross-validation results suggest that ESKmer and DEVS are essential for this classification task and boost model performance. Independent testing results indicate that misORFPred outperforms state-of-the-art methods. Furthermore, we run misORFPred on the genomes of various plant species and perform a thorough analysis of the predicted outcomes. Taken together, misORFPred is a powerful tool for identifying translatable sORFs in plant pri-miRNAs and can provide highly trusted candidates for subsequent biological experiments.
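The abstract does not give the DEVS formula, so the sketch below only illustrates the general idea of per-sample weighting and selection with a simple confidence-weighted soft vote. The confidence measure, the `min_confidence` threshold, and the function name are hypothetical stand-ins chosen for this example, not the rule used by misORFPred.

```python
def confidence_weighted_vote(probabilities, min_confidence=0.1):
    """Combine base-classifier outputs for ONE sample.

    `probabilities` holds each base classifier's P(positive). A classifier's
    weight is its distance from 0.5, rescaled to [0, 1]; classifiers below
    `min_confidence` are left out for this sample.
    """
    confidences = [abs(p - 0.5) * 2 for p in probabilities]
    selected = [(p, c) for p, c in zip(probabilities, confidences) if c >= min_confidence]
    if not selected:
        # Every classifier is unsure: fall back to a plain average.
        return sum(probabilities) / len(probabilities)
    total = sum(c for _, c in selected)
    return sum(p * c for p, c in selected) / total


# The undecided classifier (0.5) is dropped; the other two are weighted 0.8 and 0.6.
score = confidence_weighted_vote([0.9, 0.5, 0.2])
```

Because the weights are recomputed for every sample, the set of classifiers that effectively decides can change from one sORF to the next, which is the property the word "dynamic" points to.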
{"title":"misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.","authors":"Haibin Li, Jun Meng, Zhaowei Wang, Yushi Luan","doi":"10.1007/s12539-024-00661-8","DOIUrl":"https://doi.org/10.1007/s12539-024-00661-8","journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences"},"PeriodicalIF":3.9,"publicationDate":"2024-10-14","publicationTypes":"Journal Article"}
Identifying interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) provides a new perspective for understanding regulatory relationships in plant life processes. Recently, computational methods based on graph neural networks (GNNs) have been widely employed to predict lncRNA-miRNA interactions (LMIs), compensating for the limitations of biological experiments. However, the limited semantic information and the noise in the graph constrain the performance of existing GNN-based methods. In this paper, we develop a novel Counterfactual Heterogeneous Graph Attention Network (CFHAN) to improve robustness against noise and the prediction of plant LMIs. First, we construct a lncRNA-miRNA (L-M) heterogeneous network based on real-world data. Second, CFHAN uses node-level attention, semantic-level attention, and counterfactual links to enhance the learning of node embeddings. Finally, these embeddings are used as inputs to a multilayer perceptron (MLP) to predict the interactions between lncRNAs and miRNAs. Evaluated on a benchmark dataset of plant LMIs, CFHAN outperforms five state-of-the-art methods and achieves an average AUC and average ACC of 0.9953 and 0.9733, respectively. These results demonstrate CFHAN's ability to predict plant LMIs; the model also exhibits promising cross-species prediction ability, offering valuable insights for experimental LMI research.
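The AUC and ACC figures reported above are standard binary-classification metrics. A minimal sketch of how they might be computed from predicted interaction scores follows; the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions for illustration, not the authors' evaluation code or data.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the 0/1 label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a random positive scores higher than a
    random negative, counting ties as one half. Quadratic in sample count,
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = interacting lncRNA-miRNA pair, 0 = non-interacting pair.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(toy_labels, toy_scores))   # 0.75
print(accuracy(toy_labels, toy_scores))  # 0.5
```

An "average" AUC or ACC, as reported in the abstract, would typically be the mean of these values over repeated runs or cross-validation folds.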
{"title":"Plant lncRNA-miRNA Interaction Prediction Based on Counterfactual Heterogeneous Graph Attention Network.","authors":"Yu He, ZiLan Ning, XingHui Zhu, YinQiong Zhang, ChunHai Liu, SiWei Jiang, ZheMing Yuan, HongYan Zhang","doi":"10.1007/s12539-024-00652-9","DOIUrl":"https://doi.org/10.1007/s12539-024-00652-9","url":null,"PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142390340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}