Dose-response curves of immunostaining experiments are commonly described by a Langmuir isotherm. However, for common immunostaining protocols the equilibrium assumption is violated and the dose-response behavior is governed by antibody accumulation. If bound antibodies are replenished, i.e., the concentration of unbound antibodies is constant, the accumulation model can easily be solved analytically. Yet, in many experimental setups the overall amount of antibodies is fixed, such that antibody binding reduces the concentration of free antibodies. Solving the accumulation model for this case is more difficult and appears to be impossible if the epitopes are heterogeneous. In this paper, we solve the accumulation model with antibody depletion analytically for the simple case of identical epitopes. We derive inequalities between the depletion-free accumulation model, the accumulation model and the Langmuir isotherm. This allows us to characterize the antibody depletion effect. We generalize the problem to heterogeneous epitopes, where we prove the existence and uniqueness of a solution that behaves as expected from the experimental setting. With these properties we derive bounds for the resulting multi-epitope-class accumulation model and investigate the depletion effect in the case of heterogeneous epitopes.
{"title":"Modelling the effect of antibody depletion on dose-response behavior for common immunostaining protocols","authors":"Dominik Tschimmel, Steffen Waldherr, Tim Hucho","doi":"arxiv-2409.06895","DOIUrl":"https://doi.org/arxiv-2409.06895","url":null,"abstract":"Dose-response curves of immunostaining experiments are commonly described as\u0000Langmuir isotherm. However, for common immunostaining protocols the equilibrium\u0000assumption is violated and the dose-response behavior is governed by antibody\u0000accumulation. If bound antibodies are replenished, i.e. the concentration of\u0000unbound antibodies is constant, the accumulation model can easily be solved\u0000analytically. Yet, in many experimental setups the overall amount of antibodies\u0000is fixed such that antibody binding reduces the concentration of free\u0000antibodies. Solving the accumulation model for this case is more difficult and\u0000seems to be impossible if the epitopes are heterogeneous. In this paper, we\u0000solve the accumulation model with antibody depletion analytically for the\u0000simple case of identical epitopes. We derive inequalities between the\u0000depletion-free accumulation model, the accumulation model and the Langmuir\u0000isotherm. This allows us to characterize the antibody depletion effect. We\u0000generalize the problem to heterogeneous epitopes, where we prove the existence\u0000and uniqueness of a solution that behaves as expected by the experimental\u0000setting. With these properties we derive bounds for the resulting\u0000multi-epitope-class accumulation model and investigate the depletion effect in\u0000the case of heterogeneous epitopes.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fei Ye, Zaixiang Zheng, Dongyu Xue, Yuning Shen, Lihao Wang, Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu
Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations associated with these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we publicly release the evaluation dataset, code, and a public leaderboard, together with a general modular toolkit, for further analysis. We intend for ProteinBench to be a living benchmark for establishing a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field.
{"title":"ProteinBench: A Holistic Evaluation of Protein Foundation Models","authors":"Fei Ye, Zaixiang Zheng, Dongyu Xue, Yuning Shen, Lihao Wang, Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu","doi":"arxiv-2409.06744","DOIUrl":"https://doi.org/arxiv-2409.06744","url":null,"abstract":"Recent years have witnessed a surge in the development of protein foundation\u0000models, significantly improving performance in protein prediction and\u0000generative tasks ranging from 3D structure prediction and protein design to\u0000conformational dynamics. However, the capabilities and limitations associated\u0000with these models remain poorly understood due to the absence of a unified\u0000evaluation framework. To fill this gap, we introduce ProteinBench, a holistic\u0000evaluation framework designed to enhance the transparency of protein foundation\u0000models. Our approach consists of three key components: (i) A taxonomic\u0000classification of tasks that broadly encompass the main challenges in the\u0000protein domain, based on the relationships between different protein\u0000modalities; (ii) A multi-metric evaluation approach that assesses performance\u0000across four key dimensions: quality, novelty, diversity, and robustness; and\u0000(iii) In-depth analyses from various user objectives, providing a holistic view\u0000of model performance. Our comprehensive evaluation of protein foundation models\u0000reveals several key findings that shed light on their current capabilities and\u0000limitations. To promote transparency and facilitate further research, we\u0000release the evaluation dataset, code, and a public leaderboard publicly for\u0000further analysis and a general modular toolkit. We intend for ProteinBench to\u0000be a living benchmark for establishing a standardized, in-depth evaluation\u0000framework for protein foundation models, driving their development and\u0000application while fostering collaboration within the field.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology that enables precise genomic modifications via a short RNA guide sequence, there has been a marked increase in the accessibility and application of this technology across various fields. The success of CRISPR-Cas9 has spurred further investment and led to the discovery of additional CRISPR systems, including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets RNA, offering unique advantages for gene modulation. We focus on Cas13d, a variant known for its collateral activity where it non-specifically cleaves adjacent RNA molecules upon activation, a feature critical to its function. We introduce DeepFM-Crispr, a novel deep learning model developed to predict the on-target efficiency and evaluate the off-target effects of Cas13d. This model harnesses a large language model to generate comprehensive representations rich in evolutionary and structural data, thereby enhancing predictions of RNA secondary structures and overall sgRNA efficacy. A transformer-based architecture processes these inputs to produce a predictive efficacy score. Comparative experiments show that DeepFM-Crispr not only surpasses traditional models but also outperforms recent state-of-the-art deep learning methods in terms of prediction accuracy and reliability.
{"title":"DeepFM-Crispr: Prediction of CRISPR On-Target Effects via Deep Learning","authors":"Condy Bao, Fuxiao Liu","doi":"arxiv-2409.05938","DOIUrl":"https://doi.org/arxiv-2409.05938","url":null,"abstract":"Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology\u0000that enables precise genomic modifications via a short RNA guide sequence,\u0000there has been a marked increase in the accessibility and application of this\u0000technology across various fields. The success of CRISPR-Cas9 has spurred\u0000further investment and led to the discovery of additional CRISPR systems,\u0000including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets\u0000RNA, offering unique advantages for gene modulation. We focus on Cas13d, a\u0000variant known for its collateral activity where it non-specifically cleaves\u0000adjacent RNA molecules upon activation, a feature critical to its function. We\u0000introduce DeepFM-Crispr, a novel deep learning model developed to predict the\u0000on-target efficiency and evaluate the off-target effects of Cas13d. This model\u0000harnesses a large language model to generate comprehensive representations rich\u0000in evolutionary and structural data, thereby enhancing predictions of RNA\u0000secondary structures and overall sgRNA efficacy. A transformer-based\u0000architecture processes these inputs to produce a predictive efficacy score.\u0000Comparative experiments show that DeepFM-Crispr not only surpasses traditional\u0000models but also outperforms recent state-of-the-art deep learning methods in\u0000terms of prediction accuracy and reliability.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felix Risbro Hjerrild, Shan Shan, Doug M Boyer, Ingrid Daubechies
A key challenge in evolutionary biology is to develop robust computational tools that can accurately analyze shape variations across diverse anatomical structures. The Dirichlet Normal Energy (DNE) is a shape complexity metric that addresses this by summarizing the local curvature of surfaces, particularly aiding analytical studies and providing insights into evolutionary and functional adaptations. Building on the DNE concept, we introduce a Python-based implementation designed to compute both the original DNE and a newly developed sign-oriented DNE metric. This Python package includes a user-friendly command line interface (CLI) and built-in visualization tools to facilitate the interpretation of the surface's local curvature properties. The addition of signDNE, which integrates the convexity and concavity of surfaces, enhances the tool's ability to identify fine-scale features across a broad range of biological structures. We validate the robustness of our method by comparing its performance with standard implementations on a dataset of triangular meshes with varying discrete representations. Additionally, we demonstrate its potential applications through visualization of the local curvature field (i.e., the local curvature value over the surface) on various biological specimens, showing how it effectively captures complex biological features. In this paper, we offer a brief overview of the Python CLI for ease of use. Alongside the Python implementation, we have also updated the original MATLAB package to ensure consistent and accurate DNE computation across platforms. These improvements enhance the tool's flexibility, reduce sensitivity to sampling density and mesh quality, and support a more accurate interpretation of biological surface topography.
{"title":"signDNE: A python package for ariaDNE and its sign-oriented extension","authors":"Felix Risbro Hjerrild, Shan Shan, Doug M Boyer, Ingrid Daubechies","doi":"arxiv-2409.05549","DOIUrl":"https://doi.org/arxiv-2409.05549","url":null,"abstract":"A key challenge in evolutionary biology is to develop robust computational\u0000tools that can accurately analyze shape variations across diverse anatomical\u0000structures. The Dirichlet Normal Energy (DNE) is a shape complexity metric that\u0000addresses this by summarizing the local curvature of surfaces, particularly\u0000aiding the analytical studies and providing insights into evolutionary and\u0000functional adaptations. Building on the DNE concept, we introduce a\u0000Python-based implementation, designed to compute both the original DNE and a\u0000newly developed sign-oriented DNE metric. This Python package includes a\u0000user-friendly command line interface (CLI) and built-in visualization tools to\u0000facilitate the interpretation of the surface's local curvature properties. The\u0000addition of signDNE, which integrates the convexity and concavity of surfaces,\u0000enhances the tool's ability to identify fine-scale features across a broad\u0000range of biological structures. We validate the robustness of our method by\u0000comparing its performance with standard implementations on a dataset of\u0000triangular meshes with varying discrete representations. Additionally, we\u0000demonstrate its potential applications through visualization of the local\u0000curvature field (i.e., local curvature value over the surface) on various\u0000biological specimens, showing how it effectively captures complex biological\u0000features. In this paper, we offer a brief overview of the Python CLI for ease\u0000of use. Alongside the Python implementation, we have also updated the original\u0000MATLAB package to ensure consistent and accurate DNE computation across\u0000platforms. These improvements enhance the tool's flexibility, reduce\u0000sensitivity to sampling density and mesh quality, and support a more accurate\u0000interpretation of biological surface topography.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang
Predicting cellular responses to various perturbations is a critical focus in drug discovery and personalized therapeutics, with deep learning models playing a significant role in this endeavor. Single-cell datasets contain technical artifacts that may hinder the predictability of such models, which raises quality control issues of particular concern in this area. To address this, we propose CRADLE-VAE, a causal generative framework tailored for single-cell gene perturbation modeling, enhanced with counterfactual reasoning-based artifact disentanglement. Throughout training, CRADLE-VAE models the underlying latent distribution of technical artifacts and perturbation effects present in single-cell datasets. It employs counterfactual reasoning to effectively disentangle such artifacts by modulating the latent basal spaces and learns robust features for generating cellular response data with improved quality. Experimental results demonstrate that this approach improves not only treatment effect estimation performance but also generative quality. The CRADLE-VAE codebase is publicly available at https://github.com/dmis-lab/CRADLE-VAE.
{"title":"CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement","authors":"Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang","doi":"arxiv-2409.05484","DOIUrl":"https://doi.org/arxiv-2409.05484","url":null,"abstract":"Predicting cellular responses to various perturbations is a critical focus in\u0000drug discovery and personalized therapeutics, with deep learning models playing\u0000a significant role in this endeavor. Single-cell datasets contain technical\u0000artifacts that may hinder the predictability of such models, which poses\u0000quality control issues highly regarded in this area. To address this, we\u0000propose CRADLE-VAE, a causal generative framework tailored for single-cell gene\u0000perturbation modeling, enhanced with counterfactual reasoning-based artifact\u0000disentanglement. Throughout training, CRADLE-VAE models the underlying latent\u0000distribution of technical artifacts and perturbation effects present in\u0000single-cell datasets. It employs counterfactual reasoning to effectively\u0000disentangle such artifacts by modulating the latent basal spaces and learns\u0000robust features for generating cellular response data with improved quality.\u0000Experimental results demonstrate that this approach improves not only treatment\u0000effect estimation performance but also generative quality as well. The\u0000CRADLE-VAE codebase is publicly available at\u0000https://github.com/dmis-lab/CRADLE-VAE.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daria Stepanova, Meritxell Brunet Guasch, Helen M. Byrne, Tomás Alarcón
Epigenetics plays a key role in cellular differentiation and the maintenance of cell identity, enabling cells to regulate their genetic activity without altering the DNA sequence. Epigenetic regulation occurs within the context of hierarchically folded chromatin, yet the interplay between the dynamics of epigenetic modifications and chromatin architecture remains poorly understood. In addition, it remains unclear what mechanisms drive the formation of rugged epigenetic patterns, characterised by alternating genomic regions enriched in activating and repressive marks. In this study, we focus on post-translational modifications of histone H3 tails, particularly H3K27me3, H3K4me3, and H3K27ac. We introduce a mesoscopic stochastic model that incorporates chromatin architecture and competition between histone-modifying enzymes into the dynamics of epigenetic modifications in small genomic loci comprising several nucleosomes. Our approach enables us to investigate the mechanisms by which epigenetic patterns form on larger scales of chromatin organisation, such as loops and domains. Through bifurcation analysis and stochastic simulations, we demonstrate that the model can reproduce uniform chromatin states (open, closed, and bivalent) and generate previously unexplored rugged profiles. Our results suggest that enzyme competition and chromatin conformations with high-frequency interactions between distant genomic loci can drive the emergence of rugged epigenetic landscapes. Additionally, we hypothesise that bivalent chromatin can act as an intermediate state, facilitating transitions between uniform and rugged landscapes. This work offers a powerful mathematical framework for understanding the dynamic interactions between chromatin architecture and epigenetic regulation, providing new insights into the formation of complex epigenetic patterns.
{"title":"Understanding how chromatin folding and enzyme competition affect rugged epigenetic landscapes","authors":"Daria Stepanova, Meritxell Brunet Guasch, Helen M. Byrne, Tomás Alarcón","doi":"arxiv-2409.06116","DOIUrl":"https://doi.org/arxiv-2409.06116","url":null,"abstract":"Epigenetics plays a key role in cellular differentiation and maintaining cell\u0000identity, enabling cells to regulate their genetic activity without altering\u0000the DNA sequence. Epigenetic regulation occurs within the context of\u0000hierarchically folded chromatin, yet the interplay between the dynamics of\u0000epigenetic modifications and chromatin architecture remains poorly understood.\u0000In addition, it remains unclear what mechanisms drive the formation of rugged\u0000epigenetic patterns, characterised by alternating genomic regions enriched in\u0000activating and repressive marks. In this study, we focus on post-translational\u0000modifications of histone H3 tails, particularly H3K27me3, H3K4me3, and H3K27ac.\u0000We introduce a mesoscopic stochastic model that incorporates chromatin\u0000architecture and competition of histone-modifying enzymes into the dynamics of\u0000epigenetic modifications in small genomic loci comprising several nucleosomes.\u0000Our approach enables us to investigate the mechanisms by which epigenetic\u0000patterns form on larger scales of chromatin organisation, such as loops and\u0000domains. Through bifurcation analysis and stochastic simulations, we\u0000demonstrate that the model can reproduce uniform chromatin states (open,\u0000closed, and bivalent) and generate previously unexplored rugged profiles. Our\u0000results suggest that enzyme competition and chromatin conformations with\u0000high-frequency interactions between distant genomic loci can drive the\u0000emergence of rugged epigenetic landscapes. Additionally, we hypothesise that\u0000bivalent chromatin can act as an intermediate state, facilitating transitions\u0000between uniform and rugged landscapes. This work offers a powerful mathematical\u0000framework for understanding the dynamic interactions between chromatin\u0000architecture and epigenetic regulation, providing new insights into the\u0000formation of complex epigenetic patterns.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Malek Senoussi, Thierry Artières, Paul Villoutreix
One of the major challenges arising from single-cell transcriptomics experiments is the question of how to annotate the associated single-cell transcriptomic profiles. Because of the large size and the high dimensionality of the data, automated methods for annotation are needed. We focus here on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure. We consider a frequent setting in which both labeled and unlabeled data are available at training time, but the set of labels carried by the labeled data and the set of labels underlying the unlabeled data are disjoint. This is an instance of the Novel Class Discovery problem. The goal is twofold: clustering the data and mapping the clusters to labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets. Our approaches take advantage of the hierarchical nature of the data.
{"title":"Hierarchical novel class discovery for single-cell transcriptomic profiles","authors":"Malek Senoussi, Thierry Artières, Paul Villoutreix","doi":"arxiv-2409.05937","DOIUrl":"https://doi.org/arxiv-2409.05937","url":null,"abstract":"One of the major challenges arising from single-cell transcriptomics\u0000experiments is the question of how to annotate the associated single-cell\u0000transcriptomic profiles. Because of the large size and the high dimensionality\u0000of the data, automated methods for annotation are needed. We focus here on\u0000datasets obtained in the context of developmental biology, where the\u0000differentiation process leads to a hierarchical structure. We consider a\u0000frequent setting where both labeled and unlabeled data are available at\u0000training time, but the sets of the labels of labeled data on one side and of\u0000the unlabeled data on the other side, are disjoint. It is an instance of the\u0000Novel Class Discovery problem. The goal is to achieve two objectives,\u0000clustering the data and mapping the clusters with labels. We propose extensions\u0000of k-Means and GMM clustering methods for solving the problem and report\u0000comparative results on artificial and experimental transcriptomic datasets. Our\u0000approaches take advantage of the hierarchical nature of the data.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"396 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianing Xu, Huimin Hu, Gregory Ellison, Lili Yu, Christopher Whalen, Liang Liu
Reconstructing transmission networks is essential for identifying key factors like superspreaders and high-risk locations, which are critical for developing effective pandemic prevention strategies. In this study, we developed a Bayesian framework that integrates genomic and temporal data to reconstruct transmission networks for infectious diseases. The Bayesian transmission model accounts for the latent period and differentiates between symptom onset and actual infection time, improving the accuracy of the inferred transmission dynamics and epidemiological models. Additionally, the model allows for the transmission of multiple pathogen lineages, reflecting the complexity of real-world transmission events more accurately than models that assume single-lineage transmission. Simulation results show that the Bayesian model reliably estimates both the model parameters and the transmission network. Moreover, hypothesis testing effectively identifies direct transmission events. This approach highlights the crucial role of genetic data in reconstructing transmission networks and understanding the origins and transmission dynamics of infectious diseases.
{"title":"Bayesian estimation of transmission networks for infectious diseases","authors":"Jianing Xu, Huimin Hu, Gregory Ellison, Lili Yu, Christopher Whalen, Liang Liu","doi":"arxiv-2409.05245","DOIUrl":"https://doi.org/arxiv-2409.05245","url":null,"abstract":"Reconstructing transmission networks is essential for identifying key factors\u0000like superspreaders and high-risk locations, which are critical for developing\u0000effective pandemic prevention strategies. In this study, we developed a\u0000Bayesian framework that integrates genomic and temporal data to reconstruct\u0000transmission networks for infectious diseases. The Bayesian transmission model\u0000accounts for the latent period and differentiates between symptom onset and\u0000actual infection time, enhancing the accuracy of transmission dynamics and\u0000epidemiological models. Additionally, the model allows for the transmission of\u0000multiple pathogen lineages, reflecting the complexity of real-world\u0000transmission events more accurately than models that assume a single lineage\u0000transmission. Simulation results show that the Bayesian model reliably\u0000estimates both the model parameters and the transmission network. Moreover,\u0000hypothesis testing effectively identifies direct transmission events. This\u0000approach highlights the crucial role of genetic data in reconstructing\u0000transmission networks and understanding the origins and transmission dynamics\u0000of infectious diseases.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeremy D. Goldhaber-Fiebert, Hawre Jalal, Fernando Alarid Escudero
Individual-level state-transition microsimulations (iSTMs) have proliferated for economic evaluations in place of cohort state-transition models (cSTMs). Probabilistic economic evaluations quantify decision uncertainty and value of information (VOI). Prior studies show that iSTMs provide unbiased estimates of expected incremental net monetary benefits (EINMB), but the statistical properties of their estimates of decision uncertainty and VOI are uncharacterized. We compare such iSTM-produced estimates to those from corresponding cSTMs. For a 2-alternative decision and normally distributed incremental costs and benefits, we derive analytical expressions for the probability of being cost-effective and the expected value of perfect information (EVPI) for cSTMs and iSTMs, accounting for correlations in incremental outcomes at the population and individual levels. Numerical simulations illustrate our findings and explore relaxation of normality assumptions or having more than 2 decision alternatives. iSTM estimates of decision uncertainty and VOI are biased but asymptotically consistent (i.e., the bias tends to 0 as the number of microsimulated individuals tends to infinity). Decision uncertainty depends on one tail of the INMB distribution (e.g., P(INMB <= 0)), which depends on the estimated variance (larger with iSTMs given first-order noise). While iSTMs overestimate EVPI, their direction of bias for the probability of being cost-effective is ambiguous. Bias is larger when uncertainties in incremental costs and effects are negatively correlated. While more samples at the population uncertainty level are interchangeable with more microsimulations for estimating EINMB, minimizing iSTM bias in estimating decision uncertainty and VOI depends on sufficient microsimulations. Analysts should account for this when allocating their computational budgets and, at minimum, characterize such bias in their reported results.
{"title":"Microsimulation Estimates of Decision Uncertainty and Value of Information Are Biased but Consistent","authors":"Jeremy D. Goldhaber-Fiebert, Hawre Jalal, Fernando Alarid Escudero","doi":"arxiv-2409.05183","DOIUrl":"https://doi.org/arxiv-2409.05183","url":null,"abstract":"Individual-level state-transition microsimulations (iSTMs) have proliferated\u0000for economic evaluations in place of cohort state transition models (cSTMs).\u0000Probabilistic economic evaluations quantify decision uncertainty and value of\u0000information (VOI). Prior studies show that iSTMs provide unbiased estimates of\u0000expected incremental net monetary benefits (EINMB), but statistical properties\u0000of their estimates of decision uncertainty and VOI are uncharacterized. We\u0000compare such iSTMs-produced estimates to corresponding cSTMs. For a\u00002-alternative decision and normally distributed incremental costs and benefits,\u0000we derive analytical expressions for the probability of being cost-effective\u0000and the expected value of perfect information (EVPI) for cSTMs and iSTMs,\u0000accounting for correlations in incremental outcomes at the population and\u0000individual levels. Numerical simulations illustrate our findings and explore\u0000relaxation of normality assumptions or having >2 decision alternatives. iSTM\u0000estimates of decision uncertainty and VOI are biased but asymptotically\u0000consistent (i.e., bias->0 as number of microsimulated individuals->infinity).\u0000Decision uncertainty depends on one tail of the INMB distribution (e.g.,\u0000P(INMB<=0)) which depends on estimated variance (larger with iSTMs given\u0000first-order noise). While iSTMs overestimate EVPI, their direction of bias for\u0000the probability of being cost-effective is ambiguous. Bias is larger when\u0000uncertainties in incremental costs and effects are negatively correlated. While\u0000more samples at the population uncertainty level are interchangeable with more\u0000microsimulations for estimating EINMB, minimizing iSTM bias in estimating\u0000decision uncertainty and VOI depends on sufficient microsimulations. Analysts\u0000should account for this when allocating their computational budgets and, at\u0000minimum, characterize such bias in their reported results.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The blood-brain barrier (BBB) serves as a protective barrier that separates the brain from the circulatory system, regulating the passage of substances into the central nervous system. Assessing the BBB permeability of potential drugs is crucial for effective drug targeting. However, traditional experimental methods for measuring BBB permeability are challenging and impractical for large-scale screening. Consequently, there is a need to develop computational approaches to predict BBB permeability. This paper proposes a GPS Transformer architecture augmented with Self Attention, designed to perform well in the low-data regime. The proposed approach achieved state-of-the-art performance on the BBB permeability prediction task using the BBBP dataset, surpassing existing models. With a ROC-AUC of 78.8%, the approach exceeds the previous state of the art by 5.5%. We demonstrate that standard Self Attention coupled with the GPS Transformer performs better than other variants of attention coupled with the GPS Transformer.
{"title":"Efficient Training of Transformers for Molecule Property Prediction on Small-scale Datasets","authors":"Shivesh Prakash","doi":"arxiv-2409.04909","DOIUrl":"https://doi.org/arxiv-2409.04909","url":null,"abstract":"The blood-brain barrier (BBB) serves as a protective barrier that separates\u0000the brain from the circulatory system, regulating the passage of substances\u0000into the central nervous system. Assessing the BBB permeability of potential\u0000drugs is crucial for effective drug targeting. However, traditional\u0000experimental methods for measuring BBB permeability are challenging and\u0000impractical for large-scale screening. Consequently, there is a need to develop\u0000computational approaches to predict BBB permeability. This paper proposes a GPS\u0000Transformer architecture augmented with Self Attention, designed to perform\u0000well in the low-data regime. The proposed approach achieved a state-of-the-art\u0000performance on the BBB permeability prediction task using the BBBP dataset,\u0000surpassing existing models. With a ROC-AUC of 78.8%, the approach sets a\u0000state-of-the-art by 5.5%. We demonstrate that standard Self Attention coupled\u0000with GPS transformer performs better than other variants of attention coupled\u0000with GPS Transformer.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"166 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}