
BMC Bioinformatics: Latest Articles

AEGAN-Pathifier: a data augmentation method to improve cancer classification for imbalanced gene expression data.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-27, DOI: 10.1186/s12859-024-06013-z
Qiaosheng Zhang, Yalong Wei, Jie Hou, Hongpeng Li, Zhaoman Zhong

Background: Cancer classification has long been a challenging problem, chiefly because the data are high-dimensional and patient samples are hard to collect. Obtaining patient samples is costly and resource-intensive, and the classes are often imbalanced. Moreover, expression data are characterized by high dimensionality, small sample sizes and high noise, which can easily lead to problems such as the curse of dimensionality and overfitting. We therefore incorporate prior pathway knowledge and combine an AutoEncoder with a Generative Adversarial Network (GAN) to address these difficulties.

Results: In this study, we propose an effective and efficient deep learning method, named AEGAN, which combines the capabilities of AutoEncoder and GAN to generate synthetic samples of the minority class in imbalanced gene expression data. The proposed data balancing technique is shown to be useful for cancer classification and for improving the performance of classifier models. Additionally, we integrate prior pathway knowledge and employ the pathifier algorithm to calculate pathway scores for each sample. This data augmentation approach, referred to as AEGAN-Pathifier, not only preserves the biological functionality of the data but also reduces its dimensionality. Validation with various classifiers shows an improvement in classifier performance.
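
To make the data-balancing step concrete, here is a minimal sketch of oversampling a minority class until it matches the majority class. The abstract does not describe the AEGAN generator in detail, so the sketch swaps in a much simpler placeholder: each synthetic sample is a random interpolation between two real samples of the same class. The function name `balance_by_interpolation` and the toy data are hypothetical.

```python
import random
from collections import Counter

def balance_by_interpolation(samples, labels, seed=0):
    """Oversample every minority class until all classes match the largest one.

    Placeholder generator (an assumption, NOT the AEGAN generator): each
    synthetic sample is a random convex combination of two real samples
    drawn from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            a, b = rng.choice(pool), rng.choice(pool)
            t = rng.random()
            out_x.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
            out_y.append(cls)
    return out_x, out_y

# Toy data: 6 majority samples and 2 minority samples, 3 "genes" each.
X = [[1.0, 2.0, 3.0]] * 6 + [[5.0, 5.0, 5.0], [7.0, 9.0, 5.0]]
y = ["majority"] * 6 + ["minority"] * 2
X_bal, y_bal = balance_by_interpolation(X, y)
print(Counter(y_bal))
```

In a pipeline like the one described, the balanced matrix would then be passed to a downstream classifier; a trained generative model would replace the interpolation step.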

Conclusion: AEGAN-Pathifier shows improved performance on the imbalanced datasets GSE25066, GSE20194, BRCA and Liver24. Results from various classifiers indicate that AEGAN-Pathifier has good generalization capability.

BMC Bioinformatics, vol. 25 (1), p. 392. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11673641/pdf/
Citations: 0
An effective heuristic for developing hybrid feature selection in high dimensional and low sample size datasets.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-26, DOI: 10.1186/s12859-024-06017-9
Hyunseok Shin, Sejong Oh

Background: High-dimensional datasets with low sample sizes (HDLSS) are pivotal in the fields of biology and bioinformatics. One of the core objectives in HDLSS analysis is to select the most informative features and discard redundant or irrelevant ones. This is particularly crucial in bioinformatics, where accurate feature (gene) selection can lead to breakthroughs in drug development and provide insights into disease diagnostics. Despite its importance, identifying optimal features is still a significant challenge in HDLSS.

Results: To address this challenge, we propose an effective feature selection method that combines gradual permutation filtering with a heuristic tribrid search strategy, specifically tailored for HDLSS contexts. The proposed method considers inter-feature interactions and leverages feature rankings during the search process. In addition, we suggest a new performance metric for HDLSS that evaluates both the number and the quality of selected features. In a comparison with existing methods on benchmark data, the proposed method reduced the average number of selected features from 37.8 to 5.5 and improved the performance of the prediction model built on the selected features from 0.855 to 0.927.
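
The abstract does not spell out how "gradual permutation filtering" works, so the following is only a generic illustration of the underlying idea of a permutation-based filter: a feature is kept when its class-separation score is rarely matched after the labels are shuffled. The score (absolute difference of class means), the threshold `alpha`, and all names are assumptions made for the sketch, not the authors' method.

```python
import random
from statistics import mean

def mean_gap(values, labels):
    """Absolute difference between the two class means of one feature (labels are 0/1)."""
    pos = [v for v, l in zip(values, labels) if l == 1]
    neg = [v for v, l in zip(values, labels) if l == 0]
    return abs(mean(pos) - mean(neg))

def permutation_filter(columns, labels, n_perm=200, alpha=0.05, seed=0):
    """Keep features whose observed score is rarely reached under shuffled labels."""
    rng = random.Random(seed)
    kept = []
    for name, values in columns.items():
        observed = mean_gap(values, labels)
        shuffled = list(labels)
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)
            if mean_gap(values, shuffled) >= observed:
                hits += 1
        # Add-one correction so the empirical p-value is never exactly zero.
        p_value = (hits + 1) / (n_perm + 1)
        if p_value <= alpha:
            kept.append(name)
    return kept

labels = [0] * 6 + [1] * 6
columns = {
    "informative": [0.1, 0.2, 0.0, 0.3, 0.1, 0.2, 2.1, 2.0, 2.3, 1.9, 2.2, 2.0],
    "noise": [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
}
kept = permutation_filter(columns, labels)
print(kept)
```

A filter of this kind only ranks features one at a time; the search strategy described in the abstract additionally considers interactions between features.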

Conclusions: The proposed method effectively selects a small number of important features and achieves high prediction performance.

BMC Bioinformatics, vol. 25 (1), p. 390. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11670382/pdf/
Citations: 0
Deep learning-based metabolomics data study of prostate cancer.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-26, DOI: 10.1186/s12859-024-06016-w
Liqiang Sun, Xiaojing Fan, Yunwei Zhao, Qi Zhang, Mingyang Jiang

As a heterogeneous disease, prostate cancer (PCa) exhibits diverse clinical and biological features, which pose significant challenges for early diagnosis and treatment. Metabolomics offers promising new approaches for early diagnosis, treatment, and prognosis of PCa. However, metabolomics data are characterized by high dimensionality, noise, variability, and small sample sizes, presenting substantial challenges for classification. Despite the wide range of applications of deep learning methods, the use of deep learning in metabolomics research has not been extensively explored. In this study, we propose a hybrid model, TransConvNet, which combines transformer and convolutional neural networks for the classification of prostate cancer metabolomics data. We introduce a 1D convolution layer for the inputs to the dot-product attention mechanism, enabling the interaction of both local and global information. Additionally, a gating mechanism is incorporated to dynamically adjust the attention weights. The features extracted by multi-head attention are further refined through 1D convolution, and a residual network is introduced to alleviate the gradient vanishing problem in the convolutional layers. We conducted comparative experiments with seven other machine learning algorithms. Through five-fold cross-validation, TransConvNet achieved an accuracy of 81.03% and an AUC of 0.89, significantly outperforming the other algorithms. Additionally, we validated TransConvNet's generalization ability through experiments on the lung cancer dataset, with the results demonstrating its robustness and adaptability to different metabolomics datasets. We also proposed the MI-RF (Mutual Information-based random forest) model, which effectively identified key biomarkers associated with prostate cancer by leveraging comprehensive feature weight coefficients. In contrast, traditional methods identified only a limited number of biomarkers. 
In summary, these results highlight the potential of TransConvNet and MI-RF in both classification tasks and biomarker discovery, providing valuable insights for the clinical application of prostate cancer diagnosis.
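
As a rough illustration of feeding convolved inputs into dot-product attention, the sketch below smooths a short sequence with a 1D convolution over positions and then applies standard scaled dot-product attention. It is a simplified stand-in, not TransConvNet: the gating mechanism, multiple heads, learned projections and residual connections are omitted, and the kernel values and function names are assumptions.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def conv1d_same(seq, kernel):
    """1D convolution over positions with zero padding (odd kernel length).

    As in deep-learning libraries the kernel is not flipped, and the same
    kernel is applied to every feature channel.
    """
    half = len(kernel) // 2
    dim = len(seq[0])
    out = []
    for i in range(len(seq)):
        acc = [0.0] * dim
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(seq):
                acc = [a + w * x for a, x in zip(acc, seq[j])]
        out.append(acc)
    return out

def attention(queries, keys, values):
    """Scaled dot-product attention; returns (weights, outputs)."""
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    dim = len(values[0])
    outputs = [[sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
               for row in weights]
    return weights, outputs

x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]  # 4 positions, 2 features
local = conv1d_same(x, [0.25, 0.5, 0.25])              # local context per position
weights, out = attention(local, local, x)              # global mixing across positions
```

The convolution gives each query and key a view of its neighbours, while the attention weights mix information across all positions.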

BMC Bioinformatics, vol. 25 (1), p. 391. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11674358/pdf/
Citations: 0
Optimizing sequence data analysis using convolution neural network for the prediction of CNV bait positions.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-24, DOI: 10.1186/s12859-024-06006-y
Zoltán Maróti, Peter Juma Ochieng, József Dombi, Miklós Krész, Tibor Kalmár

Background: Accurate prediction of copy number variations (CNVs) from targeted capture next-generation sequencing (NGS) data relies on effective normalization of read coverage profiles. The normalization process is particularly challenging due to hidden systemic biases such as GC bias, which can significantly affect the sensitivity and specificity of CNV detection. In many cases, the kit manifests provide only the genome coordinates of the targeted regions, and the exact design of the oligo capture baits is not available. Although the on-target regions overlap substantially with the bait design, the missing bait information makes normalization of the coverage data less accurate. In this study, we propose a novel approach that utilizes a 1D convolution neural network (CNN) model to predict the positions of capture baits in complex whole-exome sequencing (WES) kits. By accurately identifying the bait coordinates, our model enables precise normalization of GC bias across target regions, thereby allowing better CNV data normalization.
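
To show why the exact bait sequence matters for GC-bias correction, here is a minimal sketch of one common normalization scheme: compute the GC fraction of each region's sequence, group regions into GC bins, and divide each region's coverage by the median coverage of its bin. The binning scheme, the number of bins and the function names are assumptions for illustration; the abstract does not specify the normalization actually used.

```python
from statistics import median

def gc_fraction(seq):
    """Fraction of G/C bases among called (A, C, G, T) bases."""
    seq = seq.upper()
    called = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / called if called else 0.0

def gc_bin(seq, n_bins):
    """Index of the GC bin a sequence falls into (last bin includes GC = 1.0)."""
    return min(int(gc_fraction(seq) * n_bins), n_bins - 1)

def normalize_by_gc(regions, n_bins=5):
    """Divide each region's coverage by the median coverage of its GC bin.

    `regions` is a list of (sequence, coverage) pairs.
    """
    binned = {}
    for seq, cov in regions:
        binned.setdefault(gc_bin(seq, n_bins), []).append(cov)
    medians = {b: median(covs) for b, covs in binned.items()}
    return [cov / medians[gc_bin(seq, n_bins)] if medians[gc_bin(seq, n_bins)] else 0.0
            for seq, cov in regions]

# Toy regions: two AT-rich and two GC-rich sequences with raw coverage values.
regions = [("ATATATAT", 40.0), ("ATTTAAAT", 60.0),
           ("GCGCGCGC", 100.0), ("GGGCCCGC", 140.0)]
normalized = normalize_by_gc(regions)
print(normalized)
```

If the sequence used for the GC calculation is the on-target region rather than the true bait, regions can land in the wrong bin, which is the inaccuracy the predicted bait coordinates are meant to reduce.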

Results: We evaluated the optimal hyperparameters, model architecture, and complexity to predict the likely positions of the oligo capture baits. Our analysis shows that the CNN models outperform the Dense NN for bait predictions. Batch normalization is the most important parameter for the stable training of CNN models. Our results indicate that the spatiality of the data plays an important role in the prediction performance. We have shown that combined input data, including experimental coverage, on-target information, and sequence data, are critical for bait prediction. Furthermore, comparison with the on-target information indicated that the CNN models performed better in predicting bait positions that exhibited a high degree of overlap (>90%) with the true bait positions.

Conclusions: This study highlights the potential of utilizing CNN-based approaches to optimize coverage data analysis and improve copy number data normalization. Subsequent CNV detection based on these predicted coordinates facilitates more accurate measurement of coverage profiles and better normalization for GC bias. As a result, this approach could reduce systemic bias and improve the sensitivity and specificity of CNV detection in genomic studies.

BMC Bioinformatics, vol. 25 (1), p. 389. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11669243/pdf/
Citations: 0
scEGOT: single-cell trajectory inference framework based on entropic Gaussian mixture optimal transport.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-23, DOI: 10.1186/s12859-024-05988-z
Toshiaki Yachimura, Hanbo Wang, Yusuke Imoto, Momoko Yoshida, Sohei Tasaki, Yoji Kojima, Yukihiro Yabuta, Mitinori Saitou, Yasuaki Hiraoka

Background: Time-series scRNA-seq data have opened a door to elucidate cell differentiation, and in this context, the optimal transport theory has been attracting much attention. However, there remain critical issues in interpretability and computational cost.

Results: We present scEGOT, a comprehensive framework for single-cell trajectory inference, as a generative model with high interpretability and low computational cost. Applied to the human primordial germ cell-like cell (PGCLC) induction system, scEGOT identified the PGCLC progenitor population and bifurcation time of segregation. Our analysis shows TFAP2A is insufficient for identifying PGCLC progenitors, requiring NKX1-2. Additionally, MESP1 and GATA6 are also crucial for PGCLC/somatic cell segregation.
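
For readers unfamiliar with entropic optimal transport, the sketch below runs the standard Sinkhorn iteration between two small sets of cluster weights, one per time point. It is a generic illustration rather than scEGOT itself: the cost is simply the squared distance between hypothetical one-dimensional cluster means, whereas a Gaussian mixture model would also carry covariances, and the values of `eps` and `n_iter` are arbitrary choices.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iter=200):
    """Entropic-regularized transport plan between histograms a and b."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    n, m = len(a), len(b)
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical cluster weights and 1D cluster means at two time points.
weights_t0, means_t0 = [0.5, 0.5], [0.0, 1.0]
weights_t1, means_t1 = [0.25, 0.75], [0.0, 1.0]
cost = [[(x - y) ** 2 for y in means_t1] for x in means_t0]
plan = sinkhorn(weights_t0, weights_t1, cost)
for row in plan:
    print([round(p, 4) for p in row])
```

Each entry of `plan` can be read as the mass flowing from a cluster at the first time point to a cluster at the second, which is the kind of cluster-level coupling that makes a trajectory model interpretable at low cost.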

Conclusions: These findings shed light on the mechanism that segregates PGCLC from somatic lineages. Notably, scEGOT is not limited to scRNA-seq: its versatility can extend to general single-cell data such as scATAC-seq, and hence it has the potential to revolutionize our understanding of such datasets and, in turn, of developmental biology.

BMC Bioinformatics, vol. 25 (1), p. 388. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11665215/pdf/
Citations: 0
MISDP: multi-task fusion visit interval for sequential diagnosis prediction.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-20, DOI: 10.1186/s12859-024-05998-x
Shengrong Zhu, Ruijia Yang, Zifeng Pan, Xuan Tian, Hong Ji

Background: Diagnostic prediction is a central application that spans various medical specialties and scenarios. Sequential diagnosis prediction is the task of predicting future diagnoses from patients' historical visits. Prior research has underexplored the impact of irregular intervals between patient visits on predictive models, despite its significance.

Method: We developed the Multi-task Fusion Visit Interval for Sequential Diagnosis Prediction (MISDP) framework to address this research gap. The MISDP framework integrates sequential diagnosis prediction with visit interval prediction within a multi-task learning paradigm. It uses positional encoding and interval encoding to handle irregular patient visit intervals. Furthermore, it incorporates historical attention residue to enhance the multi-head self-attention mechanism, focusing on extracting long-term dependencies from clinical historical visits.
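
One way to picture the combination of positional and interval encoding is sketched below: each visit receives a Transformer-style sinusoidal encoding of its position in the sequence plus a second sinusoidal encoding of the number of days since the previous visit. The sinusoidal form, the summation of the two encodings, the dimension and the function names are all assumptions for illustration; the abstract does not state how MISDP builds these encodings.

```python
import math

def sinusoidal_encoding(value, dim=8, base=10000.0):
    """Sinusoidal encoding of a scalar (a position index or an interval in days)."""
    enc = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        enc.append(math.sin(value * freq))
        enc.append(math.cos(value * freq))
    return enc[:dim]

def encode_visits(visit_days, dim=8):
    """One vector per visit: position encoding plus encoding of the gap to the previous visit."""
    vectors = []
    for pos, day in enumerate(visit_days):
        gap = day - visit_days[pos - 1] if pos > 0 else 0
        p = sinusoidal_encoding(pos, dim)
        g = sinusoidal_encoding(gap, dim)
        vectors.append([a + b for a, b in zip(p, g)])
    return vectors

# Hypothetical patient with visits on days 0, 3, 45 and 52 (irregular gaps).
visit_days = [0, 3, 45, 52]
visit_vectors = encode_visits(visit_days)
```

With position alone, the second and third visits would look like evenly spaced steps; the interval term lets the model distinguish a 3-day gap from a 42-day gap.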

Results: The MISDP model exhibited superior performance on real-world healthcare data, whether training data were scarce or abundant. With only 20% of the training data, MISDP achieved a 4.2% improvement over KAME; with 60 to 80% of the training data, MISDP surpassed SETOR, the top baseline, by 0.8% in accuracy, underscoring its robustness and efficacy in the sequential diagnosis prediction task.

Conclusions: The MISDP model significantly improves the accuracy of sequential diagnosis prediction. The result highlights the advantage of multi-task learning in synergistically enhancing the performance of individual sub-tasks. Notably, irregular visit interval factors and historical attention residue have been particularly instrumental in refining the precision of sequential diagnosis prediction, suggesting a promising avenue for advancing clinical decision-making through data-driven modeling approaches.

BMC Bioinformatics, vol. 25 (1), p. 387. Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11662528/pdf/
Citations: 0
Prediction of miRNA-disease associations based on PCA and cascade forest.
IF 2.9, Zone 3 (Biology), Q2 BIOCHEMICAL RESEARCH METHODS, Pub Date: 2024-12-19, DOI: 10.1186/s12859-024-05999-w
Chuanlei Zhang, Yubo Li, Yinglun Dong, Wei Chen, Changqing Yu

Background: As a key non-coding RNA molecule, miRNA profoundly affects gene expression regulation and is linked to the pathological processes of several kinds of human diseases. However, conventional experimental methods for validating miRNA-disease associations are laborious. Consequently, the development of efficient and reliable computational prediction models is crucial for the identification and validation of these associations.

Results: In this research, we developed the PCACFMDA method to predict potential associations between miRNAs and diseases. To construct a multidimensional feature matrix, we consider the fusion similarities of miRNA and disease and miRNA-disease pairs. We then use principal component analysis (PCA) to reduce data complexity and extract low-dimensional features. Subsequently, a tuned cascade forest deeply mines the features and outputs prediction scores. In 5-fold cross-validation on the HMDD v2.0 database, the PCACFMDA algorithm achieved an AUC of 98.56%. Additionally, we performed case studies on breast, esophageal and lung neoplasms. The findings showed that the top 50 miRNAs most strongly linked to each disease have been validated.

Conclusions: Based on PCA and optimized cascade forests, we propose the PCACFMDA model for predicting undiscovered miRNA-disease associations. The experimental results demonstrate superior prediction performance and commendable stability. Consequently, the PCACFMDA is a potent instrument for in-depth exploration of miRNA-disease associations.
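The AUC reported in the Results measures how often a model scores a true miRNA-disease pair above a pair with no known association. The following is a minimal sketch of that metric only, assuming binary labels and real-valued scores; the function name `auc_score` and the toy data are illustrative and are not taken from the PCACFMDA paper or its code.

```python
from typing import Sequence


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    Returns the fraction of (positive, negative) pairs in which the
    positive sample receives the higher score; ties count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = known association, 0 = pair with no known association.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; a sort-based rank computation would be the usual choice for a full cross-validation fold.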

Citations: 0
Spall: accurate and robust unveiling cellular landscapes from spatially resolved transcriptomics data using a decomposition network.
IF 2.9 Region 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-12-18 DOI: 10.1186/s12859-024-06003-1
Zhongning Jiang, Wei Huang, Raymond H W Lam, Wei Zhang

Recent developments in spatially resolved transcriptomics (SRT) enable the characterization of spatial structures for different tissues. Many decomposition methods have been proposed to depict the cellular distribution within tissues. However, existing computational methods struggle to balance spatial continuity in cell distribution with the preservation of cell-specific characteristics. To address this, we propose Spall, a novel decomposition network that integrates scRNA-seq data with SRT data to accurately infer cell type proportions. Spall introduces a GATv2 module, featuring a flexible dynamic attention mechanism to capture relationships between spots. This improves the identification of cellular distribution patterns in spatial analysis. Additionally, Spall incorporates skip connections to address the loss of cell-specific information, thereby enhancing the prediction capability for rare cell types. Experimental results show that Spall outperforms state-of-the-art methods in reconstructing cell distribution patterns on multiple datasets. Notably, Spall reveals tumor heterogeneity in human pancreatic ductal adenocarcinoma samples and delineates complex tissue structures, such as the laminar organization of the mouse cerebral cortex and the mouse cerebellum. These findings highlight the ability of Spall to provide reliable low-dimensional embeddings for downstream analyses, offering new opportunities for deciphering tissue structures.
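The dynamic attention mentioned in the abstract can be illustrated with the general GATv2 scoring rule, in which the score for a neighbor j of spot i is a · LeakyReLU(W [h_i ‖ h_j]), followed by a softmax over the neighbors. The sketch below is a plain-Python illustration of that rule only; it is not Spall's implementation, and the names `gatv2_attention`, `w`, and `a`, as well as the toy values, are assumptions made for the example.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    """LeakyReLU with a fixed negative slope."""
    return x if x > 0 else slope * x


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector v (cols)."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gatv2_attention(h_i: Vector, neighbors: List[Vector],
                    w: Matrix, a: Vector) -> List[float]:
    """Attention weights of spot i over a non-empty list of neighbors.

    score_j = a . LeakyReLU(W [h_i || h_j]); weights = softmax over j.
    """
    scores = []
    for h_j in neighbors:
        hidden = [leaky_relu(x) for x in matvec(w, h_i + h_j)]
        scores.append(sum(ak * hk for ak, hk in zip(a, hidden)))
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy inputs: 2-dimensional spot features, two neighbors.
toy_w = [[1.0, 0.0, 1.0, 0.0],
         [0.0, 1.0, 0.0, 1.0]]
toy_a = [1.0, -1.0]
toy_weights = gatv2_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], toy_w, toy_a)
```

Because the nonlinearity is applied before the dot product with `a`, the ranking of neighbors can depend on the query spot, which is what "dynamic" refers to in the GATv2 formulation.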

Citations: 0
DeepMiRBP: a hybrid model for predicting microRNA-protein interactions based on transfer learning and cosine similarity.
IF 2.9 Region 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-12-18 DOI: 10.1186/s12859-024-05985-2
Sasan Azizian, Juan Cui

Background: Interactions between microRNAs and RNA-binding proteins are crucial for microRNA-mediated gene regulation and sorting. Despite their significance, the molecular mechanisms governing these interactions remain underexplored, apart from sequence motifs identified on microRNAs. To date, only a limited number of microRNA-binding proteins have been confirmed, typically through labor-intensive experimental procedures. Advanced bioinformatics tools are urgently needed to facilitate this research.

Methods: We present DeepMiRBP, a novel hybrid deep learning model specifically designed to predict microRNA-binding proteins by modeling molecular interactions. This innovative approach is the first to target the direct interactions between small RNAs and proteins. DeepMiRBP consists of two main components. The first component employs bidirectional long short-term memory (Bi-LSTM) neural networks to capture sequential dependencies and context within RNA sequences; attention mechanisms to enhance the model's focus on the most relevant features; and transfer learning to apply knowledge gained from a large dataset of RNA-protein binding sites to the specific task of predicting microRNA-protein interactions. Cosine similarity is applied to assess RNA similarities. The second component utilizes Convolutional Neural Networks (CNNs) to process the spatial data inherent in protein structures, based on Position-Specific Scoring Matrices (PSSM) and contact maps, to generate detailed and accurate representations of potential microRNA-binding sites and assess protein similarities.

Results: DeepMiRBP achieved a prediction accuracy of 87.4% during training and 85.4% during testing, with an F score of 0.860. Additionally, we validated our method using three case studies, focusing on microRNAs such as miR-451, -19b, -23a, -21, -223, and let-7d. DeepMiRBP successfully predicted known miRNA interactions with recently discovered RNA-binding proteins, including AGO, YBX1, and FXR2, identified in various exosomes.

Conclusions: Our proposed DeepMiRBP strategy represents the first of its kind designed for microRNA-protein interaction prediction. Its promising performance underscores the model's potential to uncover novel interactions critical for small RNA sorting and packaging, as well as to infer new RNA transporter proteins. The methodologies and insights from DeepMiRBP offer a scalable template for future small RNA research, from mechanistic discovery to modeling disease-related cell-to-cell communication, emphasizing its adaptability and potential for developing novel small RNA-centric therapeutic interventions and personalized medicine.
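The Methods state that cosine similarity is used to assess RNA similarities. As a minimal sketch of that measure alone, assuming each RNA has already been mapped to a fixed-length numeric embedding (the embeddings below are made-up values, and `cosine_similarity` is an illustrative name rather than a DeepMiRBP function):

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings for a query RNA and two candidate RNAs.
query = [0.2, 0.8, 0.1]
candidates = {"rna_a": [0.4, 1.6, 0.2], "rna_b": [0.9, 0.1, 0.3]}
ranked = sorted(candidates,
                key=lambda name: cosine_similarity(query, candidates[name]),
                reverse=True)
```

Cosine similarity ignores vector magnitude, so `rna_a`, which is a scaled copy of the query, ranks first regardless of its larger norm.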

Citations: 0
Loop-mediated isothermal amplification assays for the detection of antimicrobial resistance elements in Vibrio cholera.
IF 2.9 Region 3 Biology Q2 BIOCHEMICAL RESEARCH METHODS Pub Date: 2024-12-18 DOI: 10.1186/s12859-024-06001-3
Daniel Antonio Negrón, Shipra Trivedi, Nicholas Tolli, David Ashford, Gabrielle Melton, Stephanie Guertin, Katharine Jennings, Bryan D Necciai, Shanmuga Sozhamannan, Bradley W Abramson

Background: The bacterium Vibrio cholerae causes diarrheal illness and can acquire genetic material leading to multiple drug resistance (MDR). Rapid detection of resistance-conferring mobile genetic elements helps avoid the prescription of ineffective antibiotics for specific strains. Colorimetric loop-mediated isothermal amplification (LAMP) assays provide a rapid and cost-effective means for detection at the point of care, since they do not require specialized equipment, require limited expertise to perform, and can take less than 30 min to perform in resource-limited regions. LAMP output is a color change that can be viewed by eye, but it can be difficult to design primer sets, determine target specificity, and interpret subjective color changes.

Methods: We developed an algorithm for the in silico design and evaluation of LAMP assays within the open-source PCR Signature Erosion Tool (PSET) and a computer vision application for the quantitative analysis of colorimetric outputs. First, Primer3 calculates LAMP primer sequence candidates with settings based on GC-content optimization. Next, PSET aligns the primer sequences of each assay against large sequence databases to calculate sufficient sequence similarity, coverage, and primer arrangement to the intended taxa, ultimately generating a confusion matrix. Finally, we tested assay candidates in the laboratory against synthetic constructs.

Results: As an example, we generated new LAMP assays targeting drug resistance in V. cholerae and evaluated existing ones from the literature based on in silico target specificity and in vitro testing. Improvements in the design and testing of LAMP assays, with heightened target specificity and a simple analysis platform, increase utility for in-field applications. Overall, 9 of the 16 tested LAMP assays produced a positive signal with the visual and computer vision-based detection methods developed here. Here we show LAMP assays tested on synthetic AMR gene targets for aph(6), varG, floR, qnrVC5, and almG, which allow for resistance to aminoglycosides, penicillins, carbapenems, phenicols, fluoroquinolones, and polymyxins respectively.
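Two quantities named in the Methods, primer GC content and the confusion matrix produced by the in silico evaluation, can be illustrated with a short sketch. This is not the PSET or Primer3 interface; the function names, the example sequence, and the counts are hypothetical and chosen only to show the arithmetic.

```python
from typing import Dict


def gc_content(sequence: str) -> float:
    """Fraction of G and C bases in a non-empty nucleotide sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in seq if base in "GC") / len(seq)


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity and specificity from confusion-matrix counts.

    tp/fn: intended-target sequences that were / were not detected.
    tn/fp: off-target sequences that were correctly / wrongly called.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one target and one off-target sequence")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical primer sequence and hypothetical alignment counts.
primer_gc = gc_content("ATGCGCTA")
metrics = confusion_metrics(tp=9, fp=2, tn=8, fn=1)
```

Sensitivity here reflects how many intended sequences an assay would match, and specificity how many unintended sequences it would avoid, which is the kind of summary a confusion matrix supports.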

Citations: 0