
BMC Bioinformatics: Latest Articles

A novel approach to the analysis of Overall Survival (OS) as response with Progression-Free Interval (PFI) as condition based on the RNA-seq expression data in The Cancer Genome Atlas (TCGA)
IF 3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-13 | DOI: 10.1186/s12859-024-05897-1
Bo Lin, Kaipeng Wang, Yuan Yuan, Yueguo Wang, Qingyuan Liu, Yulan Wang, Jian Sun, Wenwen Wang, Huanli Wang, Shusheng Zhou, Kui Jin, Mengping Zhang, Yinglei Lai
Overall Survival (OS) and Progression-Free Interval (PFI) as survival times have been collected in The Cancer Genome Atlas (TCGA). It is of biomedical interest to consider their dependence in pathway detection and survival prediction. We intend to develop novel methods for integrating PFI as condition based on parametric survival models for identifying pathways associated with OS and predicting OS. Based on the framework of conditional probability, we developed a family of frailty-based parametric-models for this purpose, with exponential or Weibull distribution as baseline. We also considered two classes of existing methods with PFI as a covariate. We evaluated the performance of three approaches by analyzing RNA-seq expression data from TCGA for lung squamous cell carcinoma and lung adenocarcinoma (LUNG), brain lower grade glioma and glioblastoma multiforme (GBMLGG), as well as skin cutaneous melanoma (SKCM). Our focus was on fourteen general cancer-related pathways. The 10-fold cross-validation was employed for the evaluation of predictive accuracy. For LUNG, p53 signaling and cell cycle pathways were detected by all approaches. Furthermore, three approaches with the consideration of PFI demonstrated significantly better predictive performance compared to the approaches without the consideration of PFI. For GBMLGG, ten pathways (e.g., Wnt signaling, JAK-STAT signaling, ECM-receptor interaction, etc.) were detected by all approaches. Furthermore, three approaches with the consideration of PFI demonstrated better predictive performance compared to the approaches without the consideration of PFI. For SKCM, p53 signaling pathway was detected only by our Weibull-baseline-based model. And three approaches with the consideration of PFI demonstrated significantly better predictive performance compared to the approaches without the consideration of PFI. Based on our study, it is necessary to incorporate PFI into the survival analysis of OS. 
Furthermore, PFI is a survival-type time, and improved results can be achieved by our conditional-probability-based approach.
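The abstract names exponential and Weibull distributions as baselines for its parametric models. As a minimal sketch of what such a baseline looks like (this is only the textbook Weibull survival and hazard function, not the authors' frailty-based conditional model; the function names and the shape/scale parameterization are assumptions made for illustration):

```python
import math


def weibull_survival(t, shape, scale):
    """Survival function S(t) = exp(-(t / scale) ** shape), for t >= 0."""
    return math.exp(-((t / scale) ** shape))


def weibull_hazard(t, shape, scale):
    """Hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# With shape == 1 the Weibull model reduces to the exponential model,
# whose hazard is constant over time.
exponential_hazards = [weibull_hazard(t, 1.0, 2.0) for t in (0.5, 1.0, 5.0)]
```

A shape above 1 gives a hazard that rises with time, and a shape below 1 gives one that falls, which is why the Weibull baseline is the more flexible of the two.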
Citations: 0
Mild cognitive impairment prediction based on multi-stream convolutional neural networks
IF 3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-12 | DOI: 10.1186/s12859-024-05911-6
Chien-Cheng Lee, Hong-Han (Hank) Chau, Hsiao-Lun Wang, Yi-Fang Chuang, Yawgeng Chau
Mild cognitive impairment (MCI) is the transition stage between the cognitive decline expected in normal aging and more severe cognitive decline such as dementia. The early diagnosis of MCI plays an important role in human healthcare. Current methods of MCI detection include cognitive tests to screen for executive function impairments, possibly followed by neuroimaging tests. However, these methods are expensive and time-consuming. Several studies have demonstrated that MCI and dementia can be detected by machine learning technologies from different modality data. This study proposes a multi-stream convolutional neural network (MCNN) model to predict MCI from face videos. The total effective data are 48 facial videos from 45 participants, including 35 videos from normal cognitive participants and 13 videos from MCI participants. The videos are divided into several segments. Then, the MCNN captures the latent facial spatial features and facial dynamic features of each segment and classifies the segment as MCI or normal. Finally, the aggregation stage produces the final detection results of the input video. We evaluate 27 MCNN model combinations including three ResNet architectures, three optimizers, and three activation functions. The experimental results showed that the ResNet-50 backbone with Swish activation function and Ranger optimizer produces the best results with an F1-score of 89% at the segment level. However, the ResNet-18 backbone with Swish and Ranger achieves the F1-score of 100% at the participant level. This study presents an efficient new method for predicting MCI from facial videos. Studies have shown that MCI can be detected from facial videos, and facial data can be used as a biomarker for MCI. This approach is very promising for developing accurate models for screening MCI through facial data. 
It demonstrates that automated, non-invasive, and inexpensive MCI screening methods are feasible and do not require highly subjective paper-and-pencil questionnaires. Evaluation of 27 model combinations also found that ResNet-50 with Swish is more stable for different optimizers. Such results provide directions for hyperparameter tuning to further improve MCI predictions.
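The abstract reports F1-scores at two levels: per segment and, after an aggregation stage, per participant. The sketch below shows how the two levels can relate. The abstract does not state the aggregation rule, so the fraction-of-segments threshold used here is an assumption, and both function names are hypothetical.

```python
def aggregate_video(segment_preds, threshold=0.5):
    """Label a video as MCI (1) when the share of MCI segments exceeds threshold.

    Assumed rule for illustration only; the paper's aggregation may differ.
    """
    if not segment_preds:
        raise ValueError("a video needs at least one segment prediction")
    return int(sum(segment_preds) / len(segment_preds) > threshold)


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * TP / (2 * TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Because aggregation can absorb a minority of misclassified segments, a model can score lower at the segment level and still be perfect at the video level, which is consistent with the two different best backbones reported above.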
Citations: 0
altAFplotter: a web app for reliable UPD detection in NGS diagnostics
IF 3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-12 | DOI: 10.1186/s12859-024-05922-3
Maximilian Radtke, Johanna Moch, Julia Hentschel, Isabell Schumann
The detection of uniparental disomies (UPDs; the inheritance of both chromosome homologues from a single parent) is not part of most standard or commercial NGS pipelines in human genetics and is thus a common gap in NGS diagnostics. To address this, we developed a tool for UPD detection based on panel or exome data that is easy to use and publicly available. The app is freely available at https://altafplotter.uni-leipzig.de/ and is implemented in Python, using the Streamlit framework for data science web apps. It uses bcftools and tabix to process VCF files. The source code is available at https://github.com/HUGLeipzig/altafplotter and can be used to host your own instance of the tool. We believe the app will be of great benefit to research and diagnostic labs that struggle to identify and interpret UPDs in their NGS diagnostic setup. The information provided allows quick interpretation of the results and is thus suitable for high-throughput use by clinicians and biologists.
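The tool's name points to the alternative allele fraction as the quantity it plots. As a rough sketch of how that value can be derived from one VCF sample column (this is not the app's code; the function name is hypothetical, and it assumes the VCF carries the common `AD` allelic-depth FORMAT key):

```python
def alt_allele_fraction(format_field, sample_field):
    """Return alt reads / total reads from the AD entry, or None if unavailable.

    format_field: the VCF FORMAT column, e.g. "GT:AD:DP"
    sample_field: the matching sample column, e.g. "0/1:12,8:20"
    """
    entries = dict(zip(format_field.split(":"), sample_field.split(":")))
    ad = entries.get("AD")
    if ad is None:
        return None
    parts = ad.split(",")
    if "." in parts:  # missing depth values are written as "."
        return None
    depths = [int(part) for part in parts]
    total = sum(depths)
    if total == 0:
        return None
    # The first depth belongs to the reference allele, the rest to alt alleles.
    return sum(depths[1:]) / total
```

Heterozygous calls cluster near 0.5 and homozygous alternative calls near 1.0, so a long chromosomal stretch without values near 0.5 is the kind of pattern that can hint at a UPD.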
Citations: 0
PopMLvis: a tool for analysis and visualization of population structure using genotype data from genome-wide association studies
IF 3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-11 | DOI: 10.1186/s12859-024-05908-1
Mohamed Elshrif, Keivin Isufaj, Khalid Kunji, Mohamad Saad
One of the aims of population genetics is to identify genetic differences/similarities among individuals of multiple ancestries. Many approaches including principal component analysis, clustering, and maximum likelihood techniques can be used to assign individuals to a given ancestry based on their genetic makeup. Although there are several tools that implement such algorithms, there is a lack of interactive visual platforms to run a variety of algorithms in one place. Therefore, we developed PopMLvis, a platform that offers an interactive environment to visualize genetic similarity data using several algorithms, and generate figures that can be easily integrated into scientific articles.
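Principal component analysis is the first of the approaches the abstract lists. The sketch below illustrates the idea on a tiny genotype matrix using plain power iteration. It is not PopMLvis code: the function name and the 0/1/2 allele-count encoding are assumptions, and a real analysis would use a numerical library and more than one component.

```python
import math


def first_principal_component(genotypes, iterations=100):
    """Return (loadings, scores) of the leading principal component.

    genotypes: list of samples, each a list of allele counts (0, 1 or 2).
    Power iteration is applied to X^T X, where X is the column-centered matrix.
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n_samples for j in range(n_variants)]
    centered = [[row[j] - means[j] for j in range(n_variants)] for row in genotypes]

    def project(vector):
        return [sum(x * w for x, w in zip(row, vector)) for row in centered]

    # The start vector must not be orthogonal to the leading component.
    loadings = [1.0 / math.sqrt(n_variants)] * n_variants
    for _ in range(iterations):
        scores = project(loadings)
        updated = [
            sum(centered[i][j] * scores[i] for i in range(n_samples))
            for j in range(n_variants)
        ]
        norm = math.sqrt(sum(x * x for x in updated))
        if norm == 0.0:
            break
        loadings = [x / norm for x in updated]
    return loadings, project(loadings)


# Two samples from each of two toy populations with different allele counts.
samples = [[0, 0, 0], [0, 1, 0], [2, 2, 2], [2, 1, 2]]
loadings, scores = first_principal_component(samples)
```

In this toy example the first two samples receive negative scores and the last two positive scores, which is the separation a PCA scatter plot of ancestry groups relies on.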
Citations: 0
BayesianSSA: a Bayesian statistical model based on structural sensitivity analysis for predicting responses to enzyme perturbations in metabolic networks
IF 3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-10 | DOI: 10.1186/s12859-024-05921-4
Shion Hosoda, Hisashi Iwata, Takuya Miura, Maiko Tanabe, Takashi Okada, Atsushi Mochizuki, Miwa Sato
Chemical bioproduction has attracted attention as a key technology in a decarbonized society. In computational design for chemical bioproduction, it is necessary to predict changes in metabolic fluxes when up-/down-regulating enzymatic reactions, that is, responses of the system to enzyme perturbations. Structural sensitivity analysis (SSA) was previously developed as a method to predict qualitative responses to enzyme perturbations on the basis of the structural information of the reaction network. However, the network structural information can sometimes be insufficient to predict qualitative responses unambiguously, which is a practical issue in bioproduction applications. To address this, in this study, we propose BayesianSSA, a Bayesian statistical model based on SSA. BayesianSSA extracts environmental information from perturbation datasets collected in environments of interest and integrates it into SSA predictions. We applied BayesianSSA to synthetic and real datasets of the central metabolic pathway of Escherichia coli. Our result demonstrates that BayesianSSA can successfully integrate environmental information extracted from perturbation data into SSA predictions. In addition, the posterior distribution estimated by BayesianSSA can be associated with the known pathway reported to enhance succinate export flux in previous studies. We believe that BayesianSSA will accelerate the chemical bioproduction process and contribute to advancements in the field.
Citations: 0
HiCMC: High-Efficiency Contact Matrix Compressor
IF 3 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-10 | DOI: 10.1186/s12859-024-05907-2
Yeremia Gunawan Adhisantoso, Tim Körner, Fabian Müntefering, Jörn Ostermann, Jan Voges
Chromosome organization plays an important role in biological processes such as replication, regulation, and transcription. One way to study the relationship between chromosome structure and its biological functions is through Hi-C studies, a genome-wide method for capturing chromosome conformation. Such studies generate vast amounts of data. The problem is exacerbated by the fact that chromosome organization is dynamic, requiring snapshots at different points in time, further increasing the amount of data to be stored. We present a novel approach called the High-Efficiency Contact Matrix Compressor (HiCMC) for efficient compression of Hi-C data. By modeling the underlying structures found in the contact matrix, such as compartments and domains, HiCMC outperforms the state-of-the-art method CMC by approximately 8% and the other state-of-the-art methods cooler, LZMA, and bzip2 by over 50% across multiple cell lines and contact matrix resolutions. In addition, HiCMC integrates domain-specific information into the compressed bitstreams that it generates, and this information can be used to speed up downstream analyses. HiCMC is a novel compression approach that utilizes intrinsic properties of contact matrix, such as compartments and domains. It allows for a better compression in comparison to the state-of-the-art methods. HiCMC is available at https://github.com/sXperfect/hicmc .
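The abstract compares HiCMC against general-purpose compressors such as LZMA and bzip2. To make that baseline concrete, the sketch below serializes a contact matrix and compresses it with Python's standard `lzma` and `bz2` modules. It does not implement HiCMC: the synthetic distance-decay matrix and both helper names are assumptions, and the resulting sizes say nothing about real Hi-C data.

```python
import bz2
import lzma
from array import array


def synthetic_contact_matrix(n_bins):
    """Toy symmetric matrix whose counts decay with genomic distance |i - j|."""
    return [[1000 // (1 + abs(i - j)) for j in range(n_bins)] for i in range(n_bins)]


def serialize_upper_triangle(matrix):
    """Store only the upper triangle, since a contact matrix is symmetric."""
    values = array("H")  # unsigned 16-bit integers; toy counts stay below 65536
    for i, row in enumerate(matrix):
        values.extend(row[i:])
    return values.tobytes()


matrix = synthetic_contact_matrix(64)
raw = serialize_upper_triangle(matrix)
sizes = {
    "raw": len(raw),
    "lzma": len(lzma.compress(raw)),
    "bz2": len(bz2.compress(raw)),
}
```

General-purpose compressors see only a byte stream. A domain-aware method such as the one described above can instead model structures like compartments and domains before entropy coding.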
Citations: 0
Mouse embryo CoCoPUTs: novel murine transcriptomic-weighted usage website featuring multiple strains, tissues, and stages.
IF 2.9 | Zone 3 (Biology) | Q2 BIOCHEMICAL RESEARCH METHODS | Pub Date: 2024-09-06 | DOI: 10.1186/s12859-024-05906-3
Sarah E Fumagalli, Sean Smith, Tigran Ghazanchyan, Douglas Meyer, Rahul Paul, Collin Campbell, Luis Santana-Quintero, Anton Golikov, Juan Ibla, Haim Bar, Anton A Komar, Ryan C Hunt, Brian Lin, Michael DiCuccio, Chava Kimchi-Sarfaty

Mouse (Mus musculus) models have been heavily utilized in developmental biology research to understand mammalian embryonic development, as mice share many genetic, physiological, and developmental characteristics with humans. New explorations into the integration of temporal (stage-specific) and transcriptional (tissue-specific) data have expanded our knowledge of mouse embryo tissue-specific gene functions. To better understand the substantial impact of synonymous mutational variations in the cell-state-specific transcriptome on a tissue's codon and codon pair usage landscape, we have established a novel resource: Mouse Embryo Codon and Codon Pair Usage Tables (Mouse Embryo CoCoPUTs). This webpage offers not only codon and codon pair usage but also GC, dinucleotide, and junction dinucleotide usage, encompassing four strains, 15 murine embryonic tissue groups, 18 Theiler stages, and 26 embryonic days. Here, we leverage Mouse Embryo CoCoPUTs and use heatmaps to depict usage changes over time and a comparison to human usage for each strain and embryonic time point, highlighting unique differences and similarities. The usage similarities found between mouse and human central nervous system data highlight the translation for projects leveraging mouse models. Data for this analysis can be directly retrieved from Mouse Embryo CoCoPUTs. This cutting-edge resource plays a crucial role in deciphering the complex interplay between usage patterns and embryonic development, offering valuable insights into variation across diverse tissues, strains, and stages. Its applications extend across multiple domains, with notable advantages for biotherapeutic development, where optimizing codon usage can enhance protein expression; one can compare strains, tissues, and mouse embryonic stages in one query. 
Additionally, Mouse Embryo CoCoPUTs holds great potential in the field of tissue-specific genetic engineering, providing insights for tailoring gene expression to specific tissues for targeted interventions. Furthermore, this resource may enhance our understanding of the nuanced connections between usage biases and tissue-specific gene function, contributing to the development of more accurate predictive models for genetic disorders.
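The title describes the usage tables as transcriptomic-weighted. The sketch below shows one way such weighting can work: each transcript's codon and codon pair counts are multiplied by an expression weight before summing. This is an illustration rather than the resource's pipeline. The function names are hypothetical, and the choice of weight (for example a TPM value) and the definition of a codon pair as two adjacent in-frame codons are assumptions.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into in-frame codons, dropping a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def weighted_usage(transcripts):
    """Sum codon and codon pair counts over (cds, weight) tuples.

    Each occurrence contributes the transcript's expression weight,
    so highly expressed transcripts dominate the resulting tables.
    """
    codon_counts = Counter()
    pair_counts = Counter()
    for cds, weight in transcripts:
        codon_list = split_codons(cds)
        for codon in codon_list:
            codon_counts[codon] += weight
        for first, second in zip(codon_list, codon_list[1:]):
            pair_counts[first + second] += weight
    return codon_counts, pair_counts


codon_counts, pair_counts = weighted_usage([("ATGGCTGCT", 2.0), ("ATGGCT", 1.0)])
```

Running the same function on expression profiles from different tissues or stages would give tables that differ even though the underlying genome is identical, which is the kind of comparison the resource supports.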

Citations: 0
Assigning credit where it is due: an information content score to capture the clinical value of multiplexed assays of variant effect.
IF 2.9 | Region 3, Biology | Q2 Biochemical Research Methods | Pub Date: 2024-09-06 | DOI: 10.1186/s12859-024-05920-5
John Michael O Ranola, Carolyn Horton, Tina Pesaran, Shawn Fayer, Lea M Starita, Brian H Shirts

Background: A variant can be pathogenic or benign in relation to a human disease. Current classification categories from benign to pathogenic reflect a probabilistic summary of the current understanding. A primary metric of clinical utility for multiplexed assays of variant effect (MAVE) is the number of variants of uncertain significance (VUS) that can be reclassified. However, a gap in this measure of utility is that it underrepresents the information gained from MAVEs. The aim of this study was to develop an improved quantification metric for MAVE utility. We propose that an information content approach, which also includes data that do not reclassify variants, will better reflect true information gain. We adopted an information content approach to evaluate the information gain, in bits, for MAVEs of BRCA1, PTEN, and TP53. Here, one bit represents the amount of information required to completely classify a single variant starting from no information.

Results: BRCA1 MAVEs produced a total of 831.2 bits of information, 6.58% of the total missense information in BRCA1 and a 22-fold increase over the information that only contributed to VUS reclassification. PTEN MAVEs produced 2059.6 bits of information which represents 32.8% of the total missense information in PTEN and an 85-fold increase over the information that contributed to VUS reclassification. TP53 MAVEs produced 277.8 bits of information which represents 6.22% of the total missense information in TP53 and a 3.5-fold increase over the information that contributed to VUS reclassification.
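The reported figures can be cross-checked with simple arithmetic. The sketch below back-calculates, for each gene, the total missense information implied by the stated percentage and the information attributable to VUS reclassification implied by the stated fold increase. It assumes that "an N-fold increase over X" means the MAVE total equals N times X; that reading, and the helper name `implied_quantities`, are assumptions rather than something the abstract specifies, and the derived values are only as precise as the rounded inputs.

```python
# Figures as reported in the abstract:
# (bits gained, share of total missense information, fold increase)
reported = {
    "BRCA1": (831.2, 0.0658, 22.0),
    "PTEN": (2059.6, 0.328, 85.0),
    "TP53": (277.8, 0.0622, 3.5),
}


def implied_quantities(bits, share, fold):
    """Back-calculate the implied totals from the reported summary numbers."""
    total_bits = bits / share     # total missense information for the gene
    vus_only_bits = bits / fold   # assumed: bits = fold * VUS-reclassifying bits
    return total_bits, vus_only_bits


implied = {gene: implied_quantities(*values) for gene, values in reported.items()}
for gene, (total_bits, vus_only_bits) in implied.items():
    print(f"{gene}: total ~ {total_bits:.0f} bits, "
          f"VUS-reclassifying ~ {vus_only_bits:.1f} bits")
```

Since one bit is defined as the information needed to fully classify a single variant, the implied totals can be read loosely as the number of missense variants that would need to be classified from scratch.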

Conclusions: An information content approach will more accurately portray information gained through MAVE mapping efforts than by counting the number of variants reclassified. This information content approach may also help define the impact of guideline changes that modify the information definitions used to classify groups of variants.

Citations: 0
Cortexa: a comprehensive resource for studying gene expression and alternative splicing in the murine brain.
IF 2.9 | Region 3, Biology | Q2 Biochemical Research Methods | Pub Date: 2024-09-05 | DOI: 10.1186/s12859-024-05919-y
Stephan Weißbach, Jonas Milkovits, Stefan Pastore, Martin Heine, Susanne Gerber, Hristo Todorov

Background: Gene expression and alternative splicing are strictly regulated processes that shape brain development and determine the cellular identity of differentiated neural cell populations. Despite the availability of multiple valuable datasets, many functional implications, especially those related to alternative splicing, remain poorly understood. Moreover, neuroscientists working primarily experimentally often lack the bioinformatics expertise required to process alternative splicing data and produce meaningful and interpretable results. Notably, re-analyzing publicly available datasets and integrating them with in-house data can provide substantial novel insights. However, such analyses necessitate developing harmonized data handling and processing pipelines which in turn require considerable computational resources and in-depth bioinformatics expertise.

Results: Here, we present Cortexa, a comprehensive web portal that incorporates RNA-sequencing datasets from the mouse cerebral cortex (longitudinal or cell-specific) and the hippocampus. Cortexa facilitates understandable visualization of the expression and alternative splicing patterns of individual genes. Our platform provides SplicePCA, a tool that allows users to integrate their alternative splicing dataset and compare it to cell-specific or developmental neocortical splicing patterns. All standardized gene expression and alternative splicing datasets can be downloaded for further in-depth downstream analysis without the need for extensive preprocessing.

Conclusions: Cortexa provides a robust and readily available resource for unraveling the complexity of gene expression and alternative splicing regulatory processes in the mouse brain. The data portal is available at https://cortexa-rna.com/.

Citations: 0
Inference of single-cell network using mutual information for scRNA-seq data analysis.
IF 2.9 | Region 3, Biology | Q2 Biochemical Research Methods | Pub Date: 2024-09-05 | DOI: 10.1186/s12859-024-05895-3
Lan-Yun Chang, Ting-Yi Hao, Wei-Jie Wang, Chun-Yu Lin

Background: With advances in single-cell RNA sequencing (scRNA-seq) technology, deriving inherent biological system information from expression profiles at single-cell resolution has become possible. Network modeling that estimates the associations between genes is known to better reveal dynamic changes in biological systems. However, accurately constructing a single-cell network (SCN) to capture the network architecture of each cell and further explore cell-to-cell heterogeneity remains challenging.

Results: We introduce SINUM, a method for constructing the SIngle-cell Network Using Mutual information, which estimates mutual information between any two genes from scRNA-seq data to determine whether they are dependent or independent in a specific cell. Experiments on various scRNA-seq datasets with different cell numbers, based on eight performance indexes (e.g., adjusted Rand index and F-measure index), validated the accuracy and robustness of SINUM in cell type identification, where it was superior to the state-of-the-art SCN inference method. Additionally, the SINUM SCNs exhibit high overlap with the human interactome and possess the scale-free property.
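For readers unfamiliar with the quantity involved, the following is a minimal sketch of a plug-in mutual information estimate between two genes whose expression has already been discretized into levels. It is not SINUM's cell-specific procedure, which the abstract does not detail; the function name `mutual_information` and the toy expression levels are hypothetical and serve only to show what "dependent" versus "independent" means in terms of mutual information.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two discrete sequences
    of equal length, e.g. binned expression levels of two genes across cells."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), summed over observed pairs
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical binned expression levels (0..3) across 16 cells
gene_a = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
gene_b = list(gene_a)          # perfectly dependent on gene_a
gene_c = [0, 1, 2, 3] * 4      # every level pairing occurs equally often

mi_dependent = mutual_information(gene_a, gene_b)
mi_independent = mutual_information(gene_a, gene_c)
```

In this toy case the dependent pair carries the full entropy of a four-level uniform variable, while the independent pair carries none. With real data, the plug-in estimate is biased upward for small samples, so a threshold or significance test is needed before calling an edge.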

Conclusions: SINUM presents a view of biological systems at the network level to detect cell-type marker genes/gene pairs and investigate time-dependent changes in gene associations during embryo development. Code for SINUM is freely available at https://github.com/SysMednet/SINUM.

Citations: 0