Pub Date: 2025-12-01; Epub Date: 2025-09-22; DOI: 10.1177/15578666251377097
Jing Qi, Wen Shuai, Lv Yanqi, Yang Mingyu, Shuilin Jin
Spatial transcriptomics (ST) reveals tissue organization but presents analytical challenges due to high dimensionality and complex spatial-hierarchical structures, which are often distorted by Euclidean-based dimensionality reduction methods. Here, we introduce HyperDiffuseNet, a deep geometric learning framework designed for ST data representation. HyperDiffuseNet uses a variational autoencoder with a hyperbolic latent space to capture hierarchical relationships. It integrates spatial context by first applying graph convolutional networks to the spatial graph to learn multi-scale dependencies, which inform the computation of a diffusion matrix. This graph-derived diffusion information is then incorporated into the hyperbolic embeddings via linear mixing in the ambient Minkowski space. The model uses a negative binomial reconstruction loss and is optimized with a composite objective function balancing reconstruction fidelity, Kullback-Leibler divergence regularization, attention-weighted spatial regularization, diffusion consistency, and local structure preservation. Empirical evaluations on multiple ST datasets demonstrate that HyperDiffuseNet achieves competitive clustering performance. The hyperbolic embedding approach shows notable improvements in silhouette coefficient and adjusted Rand index across most tested datasets, while maintaining comparable performance in structure preservation.
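The "linear mixing in the ambient Minkowski space" step can be illustrated with a minimal sketch in the hyperboloid (Lorentz) model. This is not the authors' implementation: the function names, the toy diffusion matrix, and the rescaling back onto the hyperboloid after mixing are assumptions made for illustration, and the abstract does not state how the mixed points are renormalized.

```python
import math

def minkowski_inner(u, v):
    """Minkowski (Lorentzian) inner product: -u0*v0 + sum_i ui*vi."""
    return -u[0] * v[0] + sum(a * b for a, b in zip(u[1:], v[1:]))

def lift_to_hyperboloid(p):
    """Lift a Euclidean vector p onto the hyperboloid <x, x> = -1 with x0 > 0."""
    x0 = math.sqrt(1.0 + sum(a * a for a in p))
    return [x0] + list(p)

def mix_and_project(points, weights):
    """Weighted sum in the ambient space, rescaled back onto the hyperboloid.

    Assumes non-negative weights that are not all zero, so the weighted sum
    stays timelike and the square root below is well defined.
    """
    dim = len(points[0])
    mixed = [sum(w * x[k] for w, x in zip(weights, points)) for k in range(dim)]
    norm = math.sqrt(-minkowski_inner(mixed, mixed))
    return [c / norm for c in mixed]

def diffuse(points, diffusion):
    """Apply one row of the (hypothetical) diffusion matrix per output point."""
    return [mix_and_project(points, row) for row in diffusion]

# Toy example: three spots and a hand-written row-stochastic diffusion matrix.
embeddings = [lift_to_hyperboloid(p) for p in ([0.0, 0.0], [1.0, 0.0], [0.0, 2.0])]
toy_diffusion = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
smoothed = diffuse(embeddings, toy_diffusion)
```

Mixing in the ambient space avoids repeated logarithmic and exponential maps, which is one plausible reading of why the abstract describes the step as efficient.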
Title: HyperDiffuseNet: A Deep Hyperbolic Manifold Learning Method for Dimensionality Reduction in Spatial Transcriptomics. Journal of Computational Biology, pp. 1101-1120.
Pub Date: 2025-12-01; Epub Date: 2025-09-11; DOI: 10.1177/15578666251370765
Zi-Wei Bai, Ricky X F Chen, Michael Fuchs
An enumerative study of RNA secondary structures according to various characteristics is a topic of key importance in computational biology. RNA secondary structure pairs have also been studied in various contexts. Recently, the homology groups of the simplicial complexes induced by pairs of secondary structures have been studied by Bura, He, and Reidys, providing a new way of characterizing these structure pairs. In particular, the homology group $H_2$ corresponding to any pair has been shown to be a free group. In this article, we provide enumerative results, both exact and asymptotic, for those pairs giving a free group of rank zero. The asymptotic number of these structure pairs of length $n$ is shown to be $(0.2774624151\ldots)\,(4.8105752536\ldots)^{n}\, n^{-3/2}$. We also prove that the distribution of the number of base pairs in those pairs of secondary structures is asymptotically normal.
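To get a feel for the growth rate in the asymptotic formula, the estimate can be evaluated numerically. The sketch below uses only the two truncated constants printed in the abstract, so its values are approximations of the leading term, not exact counts. The function names are illustrative.

```python
import math

C = 0.2774624151    # leading constant as printed in the abstract (truncated)
RHO = 4.8105752536  # exponential growth rate as printed in the abstract (truncated)

def asymptotic_count(n):
    """Leading-order estimate C * RHO^n * n^(-3/2) for structure pairs of length n."""
    return C * RHO ** n * n ** -1.5

def log_asymptotic_count(n):
    """Natural log of the same estimate; avoids float overflow for large n."""
    return math.log(C) + n * math.log(RHO) - 1.5 * math.log(n)

for n in (10, 20, 40):
    print(n, asymptotic_count(n))
```

For large n the ratio of consecutive estimates approaches RHO, which is the usual way to read the exponential growth rate from such a formula.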
Title: Counting RNA Loop Interaction Networks of Homology Group Rank Zero. Journal of Computational Biology, 32(12), pp. 1147-1159.
Pub Date: 2025-12-01; Epub Date: 2025-08-25; DOI: 10.1177/15578666251371079
Jiacheng Pan, Yihong Dong, Daogen Jiang, Longyang Wang
Graph neural networks (GNNs) have shown impressive performance in a variety of biomedical application tasks due to their powerful graph representation capabilities. Despite this success, the data noise and data scarcity commonly encountered in real psychiatric disease prediction scenarios may impair the training and prediction of graph learning models, and no existing work offers a satisfactory solution. Data augmentation, which extracts more value from limited data without substantially increasing its volume, is considered a practical approach to noisy and scarce data. In this work, we propose a graph data augmentation method for addressing noisy and scarce data in mental illness prediction. To mitigate the negative effects of label noise, we use edge predictors to optimize the graph topology, strengthening links between highly similar nodes and removing erroneous noisy edges, and we enhance model robustness by adding adversarial perturbations in the feature space. In addition, a confident self-checking mechanism yields accurate pseudolabels, providing more supervision during model training and further reducing the effect of label noise. Extensive experiments on two real multimodal mental illness datasets show that the proposed approach achieves better performance. Ablation studies were conducted to assess the effectiveness of each component. The experimental results validate the effectiveness and scalability of our framework for population-based disease prediction, even under challenging conditions of data noise and sparsity. The implementation code is publicly available at: https://github.com/jiachengpan98/GDA-GCN.
Title: Graph Data Augmentation for Graph Convolutional Networks Learning in Robust Mental Disorder Prediction with Limited and Noisy Labels. Journal of Computational Biology, pp. 1171-1189.
Pub Date: 2025-12-01; Epub Date: 2025-09-05; DOI: 10.1177/15578666251370766
Tien-Wen Lee
The general linear model (GLM) has been widely used in research, with the error term treated as noise. However, compelling evidence suggests that in biological systems, the target variables may possess innate variance of their own. A modified GLM is proposed to explicitly model both biological variance and nonbiological noise. An expectation-maximization (EM) scheme, termed EMSEV (EM for separating variances), is used to distinguish biological variance from noise. The performance of EMSEV was evaluated by varying noise levels, dimensions of the design matrix, and covariance structures of the target variables. The deviation between EMSEV outputs and the predefined distribution parameters increased with noise level. With a proper initial guess, when the noise magnitude and the variance of the target variables were similar, there were deviations of 3% and 10%-16% in the estimated mean and covariance of the target variables, respectively, along with a 1.7% deviation in noise estimation. EMSEV appears promising for distinguishing signal variance from noise in biological systems. The potential applications and implications in biological science and statistical inference are discussed.
Title: Separating Biological Variance from Noise by Applying Expectation-Maximization Algorithm to Modified General Linear Model. Journal of Computational Biology, pp. 1121-1130.
Pub Date: 2025-12-01; Epub Date: 2025-09-23; DOI: 10.1177/15578666251380235
Dan Huang, Hyerim Park, Hokeun Sun
DNA methylation is a representative epigenetic change that occurs in the human body and plays an essential role in regulating gene expression as well as in cancer progression. Identification of differentially methylated genes between two biological conditions has been widely studied in epigenetic association studies. However, most statistical methods aim to detect differences in mean methylation levels between two conditions, so they are limited in identifying the differences in methylation variance that have recently been observed in cancer research. Moreover, they often fail to identify genes containing both differentially methylated CpG sites and neutral sites because the group association signals are weak. In this article, we propose a new statistical method based on a group-penalized exponential tilt model, which combines an exponential tilt model with the group lasso, regarding each gene as a group of multiple CpG sites. The proposed method is able to detect differentially methylated genes, capturing both mean and variance association signals. In our extensive simulation study, we demonstrated that the proposed method has superior selection performance compared with existing statistical methods developed for detection of differentially methylated genes. We also applied it to 450K DNA methylation data from The Cancer Genome Atlas Breast Invasive Carcinoma Collection and were able to identify potentially cancer-related genes.
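The group lasso component, which treats each gene as a group of CpG sites, can be illustrated by its proximal operator (group soft-thresholding). This is a generic sketch of that standard operator, not the authors' fitting procedure. The square-root-of-group-size weighting, the `step` parameter, and all function names are assumptions, and the exponential tilt likelihood itself is not shown.

```python
import math

def group_soft_threshold(beta_g, threshold):
    """Shrink one group's coefficient vector toward zero by `threshold` in L2 norm."""
    norm = math.sqrt(sum(b * b for b in beta_g))
    if norm <= threshold:
        # The whole group (gene) is dropped from the model.
        return [0.0] * len(beta_g)
    scale = 1.0 - threshold / norm
    return [scale * b for b in beta_g]

def prox_group_lasso(beta, groups, lam, step=1.0):
    """Proximal operator of step * lam * sum_g sqrt(|g|) * ||beta_g||_2.

    `groups` is a list of index lists, one per gene (hypothetical layout).
    """
    out = list(beta)
    for idx in groups:
        threshold = step * lam * math.sqrt(len(idx))
        shrunk = group_soft_threshold([beta[i] for i in idx], threshold)
        for i, value in zip(idx, shrunk):
            out[i] = value
    return out

# Two toy genes with two CpG sites each: the weak second gene is zeroed out.
coefficients = [3.0, 4.0, 0.1, -0.1]
print(prox_group_lasso(coefficients, [[0, 1], [2, 3]], lam=1.0))
```

Because an entire group is either kept or set to zero, selection happens at the gene level, which matches the abstract's description of a gene as a group of multiple CpG sites.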
Title: Group-Penalized Exponential Tilt Model for Identification of Differentially Methylated Genes in Epigenetic Association Studies. Journal of Computational Biology, pp. 1131-1146.
Pub Date: 2025-12-01; Epub Date: 2025-10-03; DOI: 10.1177/15578666251380233
Joung Min Choi, Liqing Zhang
Accurate breast cancer subtype prediction is critical for precise diagnosis, treatment planning, and prognosis evaluation. Recent studies highlight the important role of epigenetic modifications in breast tumors, especially the potential of abnormal DNA methylation patterns as markers for distinct subtypes. However, developing a reliable model for subtype prediction based on DNA methylation profiles is challenging due to the scarcity of annotated datasets. This work proposes BCtypeFinder, a breast cancer subtype prediction framework that uses a domain adaptation network combined with semi-supervised learning to address batch effects. Our model leverages both labeled and unlabeled DNA methylation data to extract domain-invariant features while aligning subtype distributions across datasets. BCtypeFinder outperforms current methods, showing superior classification performance across multiple test cases. Furthermore, we explored the effects of batch correction in BCtypeFinder, demonstrating its ability to remove batch-specific variations among patients of the same subtype, thus improving the robustness of the classifier. BCtypeFinder is publicly available at https://github.com/joungmin-choi/BCtypeFinder.
Title: BCtypeFinder: A Semi-Supervised Model with Domain Adaptation for Breast Cancer Subtyping Using DNA Methylation Profiles. Journal of Computational Biology, pp. 1160-1170.
Pub Date: 2025-11-01; Epub Date: 2025-07-02; DOI: 10.1089/cmb.2025.0093
Michael Fuchs, Mike Steel
Motivated by applications in medical bioinformatics, Khayatian et al. (2024) introduced a family of metrics on Cayley trees [the $k$-Robinson-Foulds (RF) distance, for $k = 0, \ldots, n-2$] and explored their distribution on pairs of random Cayley trees via simulations. In this article, we investigate this distribution mathematically and derive exact asymptotic descriptions of the distribution of the $k$-RF metric for the extreme values $k = 0$ and $k = n-2$, as $n$ becomes large. We show that a linear transform of the 0-RF metric converges to a Poisson distribution (with mean 2), whereas a similar transform for the $(n-2)$-RF metric leads to a normal distribution (with mean $\sim n e^{-2}$). These results (together with the case $k = 1$, which behaves quite differently, and $k = n-3$) shed light on the earlier simulation results and the predictions made concerning them.
Title: The Asymptotic Distribution of the k-Robinson-Foulds Dissimilarity Measure on Labeled Trees. Journal of Computational Biology, pp. 1060-1073.
Pub Date: 2025-11-01; Epub Date: 2025-06-05; DOI: 10.1089/cmb.2024.0849
Janet B Jones-Oliveira, Hans-Joseph B Oliveira, Joseph S Oliveira, David A Dixon
Improved computational methods to analyze the mathematical structure and function of biochemical networks are needed when the biomolecular connectivity is known but a complete set of the equilibrium and rate constants may not be available. We use Petri nets, which are equivalent to bipartite digraphs, to analyze the rule-based flow of information through the network. We present several computational improvements to Petri net modeling, an approach previously limited by the combinatorics of network size and complexity. The generation of Petri nets using equations for three elemental stencils (molecular reaction, synthesis complex formation, and decomposition complex formation) has been automated. A set of finite probability measures is defined in terms of a partition information entropy, where the complete listing of unique minimal cycles (UMCs) of the Petri net provides the natural partitioning. This enables the ranking of the UMC listing that covers all possible information flows in the reaction network; the information entropy measure identifies which UMCs are more significant than others. In terms of the information entropy, forward cycles are less surprising and carry less information entropy, whereas backward cycles carry more information entropy and serve as regulators by providing feedback to control the network. As the systems analyzed increase in size and complexity, the automatic rank ordering of the UMCs provides a mechanism to highlight the globally most important information without the need to make local simplifying modeling choices. The information entropy metric is also used to compute source-to-sink information costs and is related to knockout analyses. The hybrid Petri net approach shows the most important species and where it is easiest to disrupt or otherwise affect the network. As an exemplar, the enhanced methodology is applied to a model of the initial subnetwork in the EGFR network.
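The idea of ranking cycles by information entropy can be sketched with plain Shannon entropy over a partition. The abstract does not define the probability measure assigned to each UMC, so the per-cycle weights below are hypothetical placeholders, and the function names are illustrative. The sketch shows only how a partition's probabilities turn into per-cycle surprisal and a rank order.

```python
import math

def partition_entropy(weights):
    """Return per-cycle probabilities, surprisal in bits, and total Shannon entropy.

    `weights` maps a cycle name to a positive weight (hypothetical input; the
    paper's actual measure over unique minimal cycles is not given here).
    """
    total = sum(weights.values())
    probs = {name: w / total for name, w in weights.items()}
    surprisal = {name: -math.log2(p) for name, p in probs.items()}
    entropy = sum(probs[name] * surprisal[name] for name in probs)
    return probs, surprisal, entropy

def rank_by_surprisal(weights):
    """Order cycles from most to least surprising (ties keep input order)."""
    _, surprisal, _ = partition_entropy(weights)
    return sorted(surprisal, key=lambda name: -surprisal[name])

toy_cycles = {"forward_1": 2.0, "feedback_1": 1.0, "feedback_2": 1.0}
print(rank_by_surprisal(toy_cycles))
print(partition_entropy(toy_cycles)[2])
```

Under this toy weighting the lower-probability feedback cycles rank first, which mirrors the abstract's statement that backward cycles carry more information entropy than forward cycles.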
Title: Using Partition Information Entropy to Computationally Rank Order Critical Subreactions in a Petri Net Model of a Biochemical Signaling Network. Journal of Computational Biology, pp. 1003-1040.
Pub Date: 2025-11-01; Epub Date: 2025-06-06; DOI: 10.1089/cmb.2024.0721
Zhibin Pu, Shufei Ge
Imaging genetics aims to uncover the hidden relationship between imaging quantitative traits (QTs) and genetic markers [e.g., single nucleotide polymorphisms (SNPs)] and brings valuable insights into the pathogenesis of complex diseases, such as cancers and cognitive disorders (e.g., Alzheimer's disease). However, most linear models in imaging genetics do not explicitly model the relationships among QTs and may therefore miss potential efficiency gains from borrowing information across brain regions. In this work, we developed a novel Bayesian regression framework for identifying significant associations between QTs and genetic markers while explicitly modeling spatial dependency between QTs, with the main contributions as follows. First, we developed a spatial-correlated multitask linear mixed-effects model to account for dependencies between QTs. We incorporated a population-level mixed-effects term into the model, taking full advantage of the dependent structure of brain imaging-derived QTs. Second, we implemented the model in the Bayesian framework and derived a Markov chain Monte Carlo (MCMC) algorithm for model inference. Further, we combined the MCMC samples with the Cauchy combination test to examine the association between SNPs and QTs, which avoided computationally intractable multiple-testing issues. The simulation studies indicated improved power of our proposed model compared with classical models in which dependencies among QTs were not modeled. We also applied the new spatial model to an imaging dataset obtained from the Alzheimer's Disease Neuroimaging Initiative database (https://adni.loni.usc.edu). The implementation of our method is available at https://github.com/ZhibinPU/spatialmultitasklmm.git.
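The Cauchy combination test mentioned above merges several p-values into one by mapping each to a standard Cauchy variate and averaging. The sketch below implements the generic form of that test with equal weights by default. How the authors derive the per-test p-values from MCMC samples is not described in the abstract and is not shown here, and the function name is illustrative.

```python
import math

def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Statistic: T = sum_i w_i * tan((0.5 - p_i) * pi), with weights summing to 1.
    Combined p-value: 0.5 - atan(T) / pi.
    Inputs are assumed to lie strictly between 0 and 1.
    """
    n = len(p_values)
    if weights is None:
        weights = [1.0 / n] * n
    statistic = sum(w * math.tan((0.5 - p) * math.pi)
                    for w, p in zip(weights, p_values))
    return 0.5 - math.atan(statistic) / math.pi

# One strong signal among weaker ones still yields a small combined p-value.
print(cauchy_combination([0.01, 0.5, 0.9]))
```

The combined p-value has a closed form, which is why the test is attractive when many correlated tests would otherwise require expensive resampling-based correction.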
{"title":"A Spatial-Correlated Multitask Linear Mixed-Effects Model for Imaging Genetics.","authors":"Zhibin Pu, Shufei Ge","doi":"10.1089/cmb.2024.0721","DOIUrl":"10.1089/cmb.2024.0721","url":null,"PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"987-1002"},"PeriodicalIF":1.6,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144234309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
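The abstract above names the Cauchy combination test as the step that merges evidence across tests. As a rough illustration only, the following Python sketch implements the standard form of that test: each p-value is mapped to a Cauchy-distributed score, the scores are averaged with weights, and the average is mapped back to a single p-value. The function name `cauchy_combination`, the equal default weights, and the example p-values are assumptions made for this sketch; the abstract does not say how p-values are derived from the MCMC samples, and this is not the authors' implementation.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the standard Cauchy combination statistic.

    Hypothetical helper for illustration; weights default to equal weights.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / n] * n
    if len(weights) != n:
        raise ValueError("weights and p_values must have the same length")
    total = sum(weights)
    weights = [w / total for w in weights]  # normalise so weights sum to 1

    eps = 1e-15  # keep p-values strictly inside (0, 1) so tan() stays finite
    statistic = 0.0
    for w, p in zip(weights, p_values):
        p = min(max(p, eps), 1.0 - eps)
        # Under the null, tan((0.5 - p) * pi) follows a standard Cauchy law.
        statistic += w * math.tan((0.5 - p) * math.pi)

    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


if __name__ == "__main__":
    # Made-up p-values for one SNP tested against three QTs.
    print(cauchy_combination([0.01, 0.8, 0.6]))
```

One reason this test is often chosen is that the combined p-value has a closed form and needs no permutation or resampling, which fits the abstract's remark about avoiding computationally intractable multiple-testing procedures.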
Pub Date: 2025-11-01 Epub Date: 2025-08-14 DOI: 10.1177/15578666251363380
Ling-Yu Wu, Yan Shao, Yang Gao, Xun-Jie Li, Xing-Xing Kang, Guo-Ping Zhao, Peng-Bo Wen
Radiotherapy (RT) plays a crucial role in tumor treatment, but reliable prognostic biomarkers for patient survival remain limited. To address this gap, we constructed RadSpliceDB (https://radsplicedb.com.cn), a comprehensive database of splicing isoforms, to identify potential prognostic markers. We integrated transcriptome data from patients treated with RT across 24 tumor types in The Cancer Genome Atlas to identify splicing isoforms associated with RT prognosis. We constructed effective prognostic models to validate the potential of the selected isoforms as reliable biomarkers. The database provides comprehensive annotations and functional analyses of these isoforms. RadSpliceDB contains a total of 49,587 splicing events associated with RT prognosis, covering 180,149 splicing isoforms and encompassing various common splicing patterns, such as exon skipping, 3' splice site, and 5' splice site. We further evaluated the potential of these splicing isoforms as tumor antigens using VaxiJen v2.0, identifying several candidates with high antigenicity scores. This database not only provides systematic annotations of splicing isoforms to elucidate the mechanisms of RT response but also serves as a valuable resource for identifying potential biomarkers for personalized RT. RadSpliceDB provides essential data support for optimizing RT strategies in cancer treatment.
{"title":"RadSpliceDB: A Comprehensive Database of Radiotherapy Prognosis-Related Splicing Isoforms.","authors":"Ling-Yu Wu, Yan Shao, Yang Gao, Xun-Jie Li, Xing-Xing Kang, Guo-Ping Zhao, Peng-Bo Wen","doi":"10.1177/15578666251363380","DOIUrl":"10.1177/15578666251363380","url":null,"PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"1090-1099"},"PeriodicalIF":1.6,"publicationDate":"2025-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144855397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}