Latest articles from Interdisciplinary Sciences: Computational Life Sciences

BES-Designer: A Web Tool to Design Guide RNAs for Base Editing to Simplify Library.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-28 DOI: 10.1007/s12539-024-00663-6
Qian Zhou, Qian Gao, Yujia Gao, Youhua Zhang, Yanjun Chen, Min Li, Pengcheng Wei, Zhenyu Yue

CRISPR/Cas base editors convert single nucleotides precisely without inducing double-strand breaks. The technology is widely applied in gene therapy, gene function analysis, and other domains. However, selecting appropriate guide RNAs (gRNAs) for base editing is a crucial challenge. Although various gRNA design tools exist, creating a simplified base-editing library with diverse protospacer adjacent motif (PAM) sequences for gRNA screening remains difficult. We present a user-friendly web tool, BES-Designer (https://bes-designer.aielab.net), for designing gRNAs for base editors, aimed at streamlining the creation of a base-editing library. BES-Designer incorporates our proposed rules for target sequence simplification, helping researchers narrow the scope of laboratory experiments. It allows users to design target sequences with various PAMs and editing types simultaneously and to prioritize them in the simplified base-editing library. In experiments, the tool achieved a 30% simplification efficiency on the base-editing library.
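
The PAM-dependent target search that gRNA design for base editing relies on can be illustrated with a minimal sketch. This is not BES-Designer's algorithm, and the abstract does not give its simplification rules. Everything here is an assumption for illustration: forward strand only, an NGG PAM, a 20-nt protospacer, an editing window at protospacer positions 4 to 8 counted from the PAM-distal end, and C as the target base. The function name `find_targets` and its parameters are hypothetical.

```python
import re

# IUPAC codes for the PAM patterns used in this sketch (assumed subset).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def pam_regex(pam):
    """Translate an IUPAC PAM string such as 'NGG' into a regex fragment."""
    return "".join(IUPAC[ch] for ch in pam)


def find_targets(seq, pam="NGG", spacer_len=20, window=(4, 8), target_base="C"):
    """List forward-strand sites whose editing window contains the target base.

    Returns tuples (start, protospacer, pam_sequence, editable_positions),
    where positions are 1-based within the protospacer.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (spacer_len, pam_regex(pam)))
    lo, hi = window
    hits = []
    for m in pattern.finditer(seq):
        spacer, pam_seq = m.group(1), m.group(2)
        positions = [i for i in range(lo, hi + 1) if spacer[i - 1] == target_base]
        if positions:  # keep only sites with an editable base in the window
            hits.append((m.start(), spacer, pam_seq, positions))
    return hits
```

Passing a different `pam` (for example `"NG"`) or `target_base="A"` would emulate other PAMs and editing types under the same assumptions. A real tool would also scan the reverse strand and score off-targets.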

Citations: 0
cascAGS: Comparative Analysis of SNP Calling Methods for Human Genome Data in the Absence of Gold Standard.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-23 DOI: 10.1007/s12539-024-00653-8
Qianqian Song, Taobo Hu, Baosheng Liang, Shihai Li, Yang Li, Jinbo Wu, Shu Wang, Xiaohua Zhou

The development of third-generation sequencing has accelerated the growth of single nucleotide polymorphism (SNP) calling methods, but evaluating their accuracy remains challenging because no SNP gold standard exists. Definitions of performance metrics in the absence of a gold standard, and methods for estimating them, are urgently needed. The possible correlations between different SNP loci should also be explored further. To address these challenges, we first introduced the concepts of a gold standard and an imperfect gold standard under the consistency framework and gave the corresponding definitions of sensitivity and specificity. A latent class model (LCM) was established to estimate the sensitivity and specificity of callers. Furthermore, we incorporated different dependency structures into the LCM to investigate their impact on sensitivity and specificity. The performance of the LCM was illustrated by comparing the accuracy of BCFtools, DeepVariant, FreeBayes, and GATK on various datasets. Estimates across multiple datasets indicate that the LCM is well suited to evaluating callers without an SNP gold standard, and that accurately including the dependency between variations is crucial for better performance ranking. DeepVariant has a higher sum of sensitivity and specificity than the other callers, followed by GATK and BCFtools. FreeBayes has low sensitivity but high specificity. Notably, appropriate sequencing coverage is another important factor for precise evaluation of callers. Most importantly, a web interface for assessing and comparing different callers was developed to simplify the evaluation process.
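
To make the latent class idea concrete, the following sketch fits the simplest two-class LCM by expectation-maximization. It assumes that callers are conditionally independent given the true (unobserved) SNP status, which is exactly the assumption the paper relaxes with its dependency structures, so it should be read as a baseline illustration rather than the cascAGS model. The function name `fit_lcm`, the starting values, and the clipping constant are assumptions.

```python
import numpy as np


def fit_lcm(calls, n_iter=500, tol=1e-9, eps=1e-6):
    """EM for a two-class latent class model with conditionally independent callers.

    calls: (n_sites, n_callers) array of 0/1 calls.
    Returns (prevalence, sensitivity per caller, specificity per caller).
    """
    calls = np.asarray(calls, dtype=float)
    n_callers = calls.shape[1]
    prev = 0.5
    sens = np.full(n_callers, 0.8)  # starting above 0.5 fixes the class labeling
    spec = np.full(n_callers, 0.8)
    for _ in range(n_iter):
        # E-step: posterior probability that each site is a true SNP.
        like_pos = prev * np.prod(sens ** calls * (1 - sens) ** (1 - calls), axis=1)
        like_neg = (1 - prev) * np.prod((1 - spec) ** calls * spec ** (1 - calls), axis=1)
        post = like_pos / (like_pos + like_neg)
        # M-step: posterior-weighted proportions, clipped away from 0 and 1.
        new_prev = np.clip(post.mean(), eps, 1 - eps)
        new_sens = np.clip(post @ calls / post.sum(), eps, 1 - eps)
        new_spec = np.clip((1 - post) @ (1 - calls) / (1 - post).sum(), eps, 1 - eps)
        change = max(abs(new_prev - prev),
                     np.abs(new_sens - sens).max(),
                     np.abs(new_spec - spec).max())
        prev, sens, spec = new_prev, new_sens, new_spec
        if change < tol:
            break
    return prev, sens, spec
```

With only binary calls, at least three callers are needed for this model to be identifiable. Four callers, as in the comparison above, leave degrees of freedom for checking fit.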

Citations: 0
Efficient Storage and Analysis of Genomic Data: A k-mer Frequency Mapping and Image Representation Method.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-21 DOI: 10.1007/s12539-024-00659-2
Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz

k-mer frequencies are crucial for understanding DNA sequence patterns and structure, with applications in motif discovery, genome classification, and short read assembly. However, the exponential increase in the dimension of frequency tables with increasing k-mer length poses storage challenges. In this study, we present a novel method for compressing k-mer data without information loss, aiming to optimize storage and analysis processes. We employed Chaos Game Representation (CGR) to map k-mers to coordinates and used these components to generate raster images of k-mers. The CGR maps were partitioned and labeled based on substrings, with each substring mapped to a subframe, creating a fractal-like structure. The entire k-mer frequency set of each genomic sequence was represented as a single image, with each pixel corresponding to a specific k-mer and its occurrence. This approach reduced file size by up to 16-fold compared to plain text and 3-fold compared to binary format. Furthermore, we demonstrated the feasibility of performing alignment-free similarity analyses on images derived from k-mer frequencies of whole genome sequences from 14 plant species. Our results highlight the potential of this method as a fast and efficient tool for accessing, processing, and analyzing large biological sequence datasets, enabling the retrieval of k-mer frequencies and image reconstruction.
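
The mapping from k-mers to pixels can be sketched with a standard integer form of Chaos Game Representation. This is a generic CGR illustration, not the authors' implementation. The corner assignment (A, C, G, T at the four corners), the bit ordering, and the function names are assumptions, and the paper's file format is not reproduced here. Each base contributes one bit to x and one bit to y, so a k-mer lands in exactly one cell of a 2^k by 2^k grid and can be recovered from that cell.

```python
import numpy as np

# Assumed corner layout: each base maps to one (x_bit, y_bit) pair.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BASE = {bits: base for base, bits in CORNER.items()}


def kmer_to_pixel(kmer):
    """Map a k-mer to integer (x, y) in a 2^k x 2^k grid.

    As in CGR, the last base decides the coarsest quadrant, so it supplies
    the most significant bit.
    """
    x = y = 0
    for base in reversed(kmer):
        bx, by = CORNER[base]
        x = (x << 1) | bx
        y = (y << 1) | by
    return x, y


def pixel_to_kmer(x, y, k):
    """Invert kmer_to_pixel, which is what makes the image lossless."""
    return "".join(BASE[((x >> i) & 1, (y >> i) & 1)] for i in range(k))


def cgr_image(seq, k):
    """Count every k-mer of seq into a 2^k x 2^k array (rows = y, columns = x)."""
    size = 1 << k
    img = np.zeros((size, size), dtype=np.uint32)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(ch in CORNER for ch in kmer):  # skip k-mers containing N etc.
            x, y = kmer_to_pixel(kmer)
            img[y, x] += 1
    return img
```

Because the pixel position encodes the k-mer itself, only the counts need to be stored, which is the intuition behind the size reduction relative to a text table of k-mer/count pairs.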

Citations: 0
Cell Fate Dynamics Reconstruction Identifies TPT1 and PTPRZ1 Feedback Loops as Master Regulators of Differentiation in Pediatric Glioblastoma-Immune Cell Networks.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-17 DOI: 10.1007/s12539-024-00657-4
Abicumaran Uthamacumaran

Pediatric glioblastoma is a complex dynamical disease that is difficult to treat due to its multiple adaptive behaviors driven largely by phenotypic plasticity. Integrated data science and network theory pipelines offer novel approaches to studying glioblastoma cell fate dynamics, particularly phenotypic transitions over time. Here we used various single-cell trajectory inference algorithms to infer signaling dynamics regulating pediatric glioblastoma-immune cell networks. We identified GATA2, PTPRZ1, TPT1, MTRNR2L1/2, OLIG1/2, SOX11, FXYD6, SEZ6L, PDGFRA, EGFR, S100B, WNT, TNF-α, and NF-κB as critical transition genes or signals regulating glioblastoma-immune network dynamics, revealing potential clinically relevant targets. Further, we reconstructed glioblastoma cell fate attractors and found complex bifurcation dynamics within glioblastoma phenotypic transitions, suggesting that a causal pattern may be driving glioblastoma evolution and cell fate decision-making. Together, our findings have implications for developing targeted therapies against glioblastoma, and the continued integration of quantitative approaches and artificial intelligence (AI) to understand pediatric glioblastoma tumor-immune interactions.

Citations: 0
misORFPred: A Novel Method to Mine Translatable sORFs in Plant Pri-miRNAs Using Enhanced Scalable k-mer and Dynamic Ensemble Voting Strategy.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-14 DOI: 10.1007/s12539-024-00661-8
Haibin Li, Jun Meng, Zhaowei Wang, Yushi Luan

Primary microRNAs (pri-miRNAs) have been observed to contain translatable small open reading frames (sORFs) that can encode peptides as independent elements. Studies have shown that these sORFs play a significant role in regulating the expression of biological traits. Existing methods for predicting the coding potential of sORFs frequently overlook these data or categorize them as negative samples, impeding the identification of additional translatable sORFs in pri-miRNAs. In light of this, a novel method named misORFPred is proposed. Specifically, an enhanced scalable k-mer (ESKmer), which simultaneously integrates the composition information within a sequence and the distance information between sequences, is designed to extract nucleotide sequence features. After feature selection, the optimal features and several machine learning classifiers are combined to construct an ensemble model, in which a newly devised dynamic ensemble voting strategy (DEVS) dynamically adjusts the weights of the base classifiers and adaptively selects the optimal base classifiers for each unlabeled sample. Cross-validation results suggest that ESKmer and DEVS are essential for this classification task and could boost model performance. Independent testing results indicate that misORFPred outperforms state-of-the-art methods. Furthermore, we run misORFPred on the genomes of various plant species and thoroughly analyze the predictions. Taken together, misORFPred is a powerful tool for identifying translatable sORFs in plant pri-miRNAs and can provide highly trusted candidates for subsequent biological experiments.
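
The idea of choosing and weighting base classifiers separately for each sample can be sketched as follows. The abstract does not specify how DEVS computes its weights or makes its selection, so this is not DEVS. It is a simple stand-in that, for every sample, keeps the `top_m` classifiers whose predicted probability is furthest from 0.5 and averages them with weights proportional to that distance. The function name `dynamic_vote`, the confidence measure, and `top_m` are all assumptions.

```python
import numpy as np


def dynamic_vote(probs, top_m=2):
    """Per-sample selection and weighting of base classifiers (illustrative only).

    probs: (n_classifiers, n_samples) positive-class probabilities.
    Returns an (n_samples,) array of ensemble probabilities.
    """
    probs = np.asarray(probs, dtype=float)
    conf = np.abs(probs - 0.5)                  # confidence of each classifier per sample
    top = np.argsort(-conf, axis=0)[:top_m]     # (top_m, n_samples) selected classifier indices
    cols = np.arange(probs.shape[1])
    sel_p = probs[top, cols]                    # probabilities of the selected classifiers
    sel_w = conf[top, cols] + 1e-12             # weights; small constant avoids division by zero
    return (sel_w * sel_p).sum(axis=0) / sel_w.sum(axis=0)
```

A static soft-voting ensemble would instead use one weight vector for all samples. The per-sample version lets a classifier that is uncertain about a particular sequence drop out of that decision only.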

Citations: 0
Plant lncRNA-miRNA Interaction Prediction Based on Counterfactual Heterogeneous Graph Attention Network.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-09 DOI: 10.1007/s12539-024-00652-9
Yu He, ZiLan Ning, XingHui Zhu, YinQiong Zhang, ChunHai Liu, SiWei Jiang, ZheMing Yuan, HongYan Zhang

Identifying interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) provides a new perspective for understanding regulatory relationships in plant life processes. Recently, computational methods based on graph neural networks (GNNs) have been widely employed to predict lncRNA-miRNA interactions (LMIs), compensating for the limitations of biological experiments. However, the low semantic content and noise of the graphs limit the performance of existing GNN-based methods. In this paper, we develop a novel Counterfactual Heterogeneous Graph Attention Network (CFHAN) to improve robustness against noise and the prediction of plant LMIs. First, we construct a lncRNA-miRNA (L-M) heterogeneous network based on real-world data. Second, CFHAN uses node-level attention, semantic-level attention, and counterfactual links to enhance the learning of node embeddings. Finally, these embeddings are used as inputs to a multilayer perceptron (MLP) to predict the interactions between lncRNAs and miRNAs. On a benchmark dataset of plant LMIs, CFHAN outperforms five state-of-the-art methods and achieves an average AUC of 0.9953 and an average ACC of 0.9733. These results demonstrate CFHAN's ability to predict plant LMIs and its promising cross-species prediction ability, offering valuable insights for experimental LMI research.

Citations: 0
A Molecular Fragment Representation Learning Framework for Drug-Drug Interaction Prediction.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-09 DOI: 10.1007/s12539-024-00658-3
Jiaxi He, Yuping Sun, Jie Ling

The concurrent use of multiple drugs may result in drug-drug interactions, increasing the risk of adverse reactions. Computational methods that precisely identify unknown drug-drug interactions are therefore crucial for drug development and health. However, most recent studies have limited the drug-drug interaction prediction task to identifying interactions between substructures, overlooking molecular hierarchical information. Moreover, the number of substructures extracted by these methods is always restricted to equal the number of atoms in the molecular graph, which does not match reality. In this study, a molecular fragment representation learning framework for drug-drug interaction prediction is introduced. First, a fragment extraction module is designed to acquire a series of molecular fragments. Then, to capture more comprehensive features, molecular hierarchical information is integrated, and drug-drug interactions are predicted by identifying pairwise interactions between the molecular fragments of each drug. Comprehensive evaluations demonstrate that the proposed method achieves state-of-the-art performance on both the DrugBank and Twosides datasets, in particular improving accuracy by over 20% for unseen drugs on both datasets. Furthermore, case studies and visual analysis confirm that the proposed method can accurately identify the crucial substructures influencing the interactions, which are largely consistent with real functional group structures. In conclusion, this method not only enhances the performance of drug-drug interaction prediction but also offers high interpretability. Source code is freely available at https://github.com/kennysyp/MFR-DDI .

Citations: 0
AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-10-05 DOI: 10.1007/s12539-024-00662-7
Daixi Li, Yuqi Zhu, Wujie Zhang, Jing Liu, Xiaochen Yang, Zhihong Liu, Dongqing Wei

The structural stability of proteins is an important topic in fields such as biotechnology, pharmaceuticals, and enzymology. In particular, understanding the structural stability of proteins is crucial for protein design. Artificial design, while pursuing high thermodynamic stability and rigidity, inevitably sacrifices biological functions that are closely tied to protein flexibility. The highest thermodynamic stability is not always optimal for a protein to perform its biological function. Extensive theoretical and experimental screening is often required to obtain stable protein structures. It is thus critically important to develop a stability prediction model based on the balance between protein stability and bioactivity. To design protein drugs with better functionality in a broader structural space, a novel protein structural stability predictor called PSSP has been developed in this study. PSSP is a mean pooled dual graph convolutional network (GCN) model that uses the sequence characteristics, secondary structure, distance matrix, graph, and residue properties of a nanoprotein to provide rapid prediction and judgment. The model exhibits excellent robustness in predicting the structural stability of nanoproteins. Compared with previous artificial intelligence algorithms, the results indicate that this model can provide a rapid and accurate assessment of the structural stability of artificially designed proteins, which shows great promise for promoting the robust development of protein design.
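The abstract above names graph convolution and mean pooling but gives no architectural details. The following is only a minimal sketch of those two generic building blocks, not the PSSP model. The distance cutoff, the toy feature and weight values, and all function names are assumptions made for illustration, and the "dual" arrangement is not reproduced because the abstract does not specify it.

```python
import math

def normalized_adjacency(dist, cutoff):
    """Build A + I from a distance matrix and normalize it symmetrically."""
    n = len(dist)
    # Assumption: residues closer than `cutoff` are connected; the diagonal adds self-loops.
    adj = [[1.0 if i == j or dist[i][j] < cutoff else 0.0 for j in range(n)]
           for i in range(n)]
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(a_hat, feats, weight):
    """One graph convolution: ReLU(A_hat @ H @ W)."""
    out = matmul(matmul(a_hat, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

def mean_pool(node_feats):
    """Average the node embeddings into one fixed-size graph embedding."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

# Hypothetical three-residue example; none of these numbers come from the paper.
dist = [[0.0, 4.0, 9.0], [4.0, 0.0, 5.0], [9.0, 5.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[0.5, -0.2], [0.1, 0.3]]

a_hat = normalized_adjacency(dist, cutoff=6.0)
graph_embedding = mean_pool(gcn_layer(a_hat, feats, weight))
print(graph_embedding)
```

Mean pooling makes the output length independent of the number of residues, which is one common reason to pool before a final prediction layer.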

Citations: 0
CSEL-BGC: A Bioinformatics Framework Integrating Machine Learning for Defining the Biosynthetic Evolutionary Landscape of Uncharacterized Antibacterial Natural Products.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-30 DOI: 10.1007/s12539-024-00656-5
Minghui Du, Yuxiang Ren, Yang Zhang, Wenwen Li, Hongtao Yang, Huiying Chu, Yongshan Zhao

The sluggish pace of new antibacterial drug development reflects a vulnerability in the face of the severe threat currently posed by bacterial resistance. Microbial natural products (NPs), as a reservoir of immense chemical potential, have emerged as the most promising avenue for the discovery of next-generation antibacterial agents. Directly accessing the antibacterial activity of potential products derived from biosynthetic gene clusters (BGCs) would significantly expedite the process. To tackle this issue, we propose a CSEL-BGC framework that integrates machine learning (ML) techniques. This framework involves the development of a novel cascade-stacking ensemble learning (CSEL) model and the establishment of a groundbreaking model evaluation system. Based on this framework, we predict 6,666 BGCs with antibacterial activity from 3,468 complete bacterial genomes and elucidate a biosynthetic evolutionary landscape to reveal their antibacterial potential. This provides crucial insights for interpreting the synthesis and secretion mechanisms of unknown NPs.
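The abstract does not describe how the cascade-stacking ensemble is built, so the sketch below shows only plain stacking as a general idea: base learners produce out-of-fold predictions, and a meta-learner is trained on those predictions. The threshold "stump" base learners, the lookup-table meta-learner, the two-fold split, and the toy data are all assumptions for illustration. The cascade part of CSEL is not reproduced.

```python
from collections import Counter, defaultdict

def fit_stump(X, y, feature):
    """Base learner: threshold one feature midway between the two class means."""
    pos = [x[feature] for x, label in zip(X, y) if label == 1]
    neg = [x[feature] for x, label in zip(X, y) if label == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0

def out_of_fold_predictions(X, y, features, n_folds):
    """Predict every sample with stumps that did not see it during fitting."""
    meta_X = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held = [i for i in range(len(X)) if i % n_folds == fold]
        stumps = [fit_stump([X[i] for i in train], [y[i] for i in train], f)
                  for f in features]
        for i in held:
            meta_X[i] = tuple(s(X[i]) for s in stumps)
    return meta_X

def fit_meta(meta_X, y):
    """Meta-learner: map each pattern of base predictions to its most common label."""
    votes = defaultdict(Counter)
    for key, label in zip(meta_X, y):
        votes[key][label] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    # Patterns never seen in training fall back to a majority vote of the base learners.
    return lambda key: table.get(key, 1 if 2 * sum(key) > len(key) else 0)

def fit_stacking(X, y, features, n_folds=2):
    """Train the meta-learner on out-of-fold outputs, then refit stumps on all data."""
    meta = fit_meta(out_of_fold_predictions(X, y, features, n_folds), y)
    stumps = [fit_stump(X, y, f) for f in features]
    return lambda x: meta(tuple(s(x) for s in stumps))

# Hypothetical two-feature toy data; label 1 stands in for "antibacterial".
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9],
     [0.15, 0.3], [0.3, 0.25], [0.85, 0.7], [0.7, 0.95]]
y = [0, 0, 1, 1, 0, 0, 1, 1]

model = fit_stacking(X, y, features=[0, 1])
print(model([0.95, 0.9]), model([0.05, 0.1]))
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from simply trusting base learners that have memorized the training data.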

Citations: 0
scCrab: A Reference-Guided Cancer Cell Identification Method based on Bayesian Neural Networks.
IF 3.9 Zone 2 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2024-09-30 DOI: 10.1007/s12539-024-00655-6
Heyang Hua, Wenxin Long, Yan Pan, Siyu Li, Jianyu Zhou, Haixin Wang, Shengquan Chen

Cancer is a significant global public health concern, and early detection can greatly improve curative outcomes. The identification of cancer cells is therefore of great importance as the primary method for cancer diagnosis. The advancement of single-cell RNA sequencing (scRNA-seq) technology has made it possible to address cancer cell identification at the single-cell level more efficiently with computational methods, as opposed to time-consuming and less reproducible manual identification. However, existing computational methods have shown suboptimal identification performance and cannot incorporate external reference data as prior information. Here, we propose scCrab, a reference-guided automatic cancer cell identification method, which performs ensemble learning based on a Bayesian neural network (BNN) with multi-head self-attention mechanisms and a linear regression model. Through a series of experiments on various datasets, we systematically validated the superior performance of scCrab in both intra- and inter-dataset predictions. In addition, we demonstrated the robustness of scCrab to dropout rate and sample size, and conducted ablation experiments to investigate the contribution of each component of scCrab. Furthermore, as a dedicated model for cancer cell identification, scCrab effectively captures cancer-related biological significance during the identification process.
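The abstract mentions multi-head self-attention without giving its configuration. The sketch below illustrates only the generic mechanism (scaled dot-product attention computed per head, with head outputs concatenated) and is not scCrab's network. To stay short it assumes identity query, key, and value projections where a trained model would learn them, and the toy input vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections (an assumption)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def multi_head_self_attention(tokens, n_heads):
    """Split each vector into n_heads slices, attend per slice, then concatenate."""
    d = len(tokens[0])
    if d % n_heads != 0:
        raise ValueError("embedding size must be divisible by n_heads")
    size = d // n_heads
    heads = [attention_head([t[h * size:(h + 1) * size] for t in tokens])
             for h in range(n_heads)]
    return [[v for head in heads for v in head[i]] for i in range(len(tokens))]

# Hypothetical 4-dimensional embeddings for three tokens.
tokens = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
attended = multi_head_self_attention(tokens, n_heads=2)
print(attended)
```

Because every output row is a convex combination of the input rows within each head, the result keeps the input shape and stays within the range of the input values.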

Citations: 0