Dose-response curves of immunostaining experiments are commonly described by a Langmuir isotherm. However, for common immunostaining protocols the equilibrium assumption is violated and the dose-response behavior is governed by antibody accumulation. If bound antibodies are replenished, i.e., the concentration of unbound antibodies is constant, the accumulation model can easily be solved analytically. Yet, in many experimental setups the overall amount of antibodies is fixed, such that antibody binding reduces the concentration of free antibodies. Solving the accumulation model for this case is more difficult and appears to be impossible if the epitopes are heterogeneous. In this paper, we solve the accumulation model with antibody depletion analytically for the simple case of identical epitopes. We derive inequalities between the depletion-free accumulation model, the accumulation model and the Langmuir isotherm. This allows us to characterize the antibody depletion effect. We generalize the problem to heterogeneous epitopes, where we prove the existence and uniqueness of a solution that behaves as expected from the experimental setting. With these properties we derive bounds for the resulting multi-epitope-class accumulation model and investigate the depletion effect in the case of heterogeneous epitopes.
{"title":"Modelling the effect of antibody depletion on dose-response behavior for common immunostaining protocols","authors":"Dominik Tschimmel, Steffen Waldherr, Tim Hucho","doi":"arxiv-2409.06895","DOIUrl":"https://doi.org/arxiv-2409.06895","url":null,"abstract":"Dose-response curves of immunostaining experiments are commonly described as\u0000Langmuir isotherm. However, for common immunostaining protocols the equilibrium\u0000assumption is violated and the dose-response behavior is governed by antibody\u0000accumulation. If bound antibodies are replenished, i.e. the concentration of\u0000unbound antibodies is constant, the accumulation model can easily be solved\u0000analytically. Yet, in many experimental setups the overall amount of antibodies\u0000is fixed such that antibody binding reduces the concentration of free\u0000antibodies. Solving the accumulation model for this case is more difficult and\u0000seems to be impossible if the epitopes are heterogeneous. In this paper, we\u0000solve the accumulation model with antibody depletion analytically for the\u0000simple case of identical epitopes. We derive inequalities between the\u0000depletion-free accumulation model, the accumulation model and the Langmuir\u0000isotherm. This allows us to characterize the antibody depletion effect. We\u0000generalize the problem to heterogeneous epitopes, where we prove the existence\u0000and uniqueness of a solution that behaves as expected by the experimental\u0000setting. With these properties we derive bounds for the resulting\u0000multi-epitope-class accumulation model and investigate the depletion effect in\u0000the case of heterogeneous epitopes.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fei Ye, Zaixiang Zheng, Dongyu Xue, Yuning Shen, Lihao Wang, Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu
Recent years have witnessed a surge in the development of protein foundation models, significantly improving performance in protein prediction and generative tasks ranging from 3D structure prediction and protein design to conformational dynamics. However, the capabilities and limitations associated with these models remain poorly understood due to the absence of a unified evaluation framework. To fill this gap, we introduce ProteinBench, a holistic evaluation framework designed to enhance the transparency of protein foundation models. Our approach consists of three key components: (i) A taxonomic classification of tasks that broadly encompass the main challenges in the protein domain, based on the relationships between different protein modalities; (ii) A multi-metric evaluation approach that assesses performance across four key dimensions: quality, novelty, diversity, and robustness; and (iii) In-depth analyses from various user objectives, providing a holistic view of model performance. Our comprehensive evaluation of protein foundation models reveals several key findings that shed light on their current capabilities and limitations. To promote transparency and facilitate further research, we publicly release the evaluation dataset, code, and a public leaderboard, together with a general modular toolkit, for further analysis. We intend for ProteinBench to be a living benchmark for establishing a standardized, in-depth evaluation framework for protein foundation models, driving their development and application while fostering collaboration within the field.
{"title":"ProteinBench: A Holistic Evaluation of Protein Foundation Models","authors":"Fei Ye, Zaixiang Zheng, Dongyu Xue, Yuning Shen, Lihao Wang, Yiming Ma, Yan Wang, Xinyou Wang, Xiangxin Zhou, Quanquan Gu","doi":"arxiv-2409.06744","DOIUrl":"https://doi.org/arxiv-2409.06744","url":null,"abstract":"Recent years have witnessed a surge in the development of protein foundation\u0000models, significantly improving performance in protein prediction and\u0000generative tasks ranging from 3D structure prediction and protein design to\u0000conformational dynamics. However, the capabilities and limitations associated\u0000with these models remain poorly understood due to the absence of a unified\u0000evaluation framework. To fill this gap, we introduce ProteinBench, a holistic\u0000evaluation framework designed to enhance the transparency of protein foundation\u0000models. Our approach consists of three key components: (i) A taxonomic\u0000classification of tasks that broadly encompass the main challenges in the\u0000protein domain, based on the relationships between different protein\u0000modalities; (ii) A multi-metric evaluation approach that assesses performance\u0000across four key dimensions: quality, novelty, diversity, and robustness; and\u0000(iii) In-depth analyses from various user objectives, providing a holistic view\u0000of model performance. Our comprehensive evaluation of protein foundation models\u0000reveals several key findings that shed light on their current capabilities and\u0000limitations. To promote transparency and facilitate further research, we\u0000release the evaluation dataset, code, and a public leaderboard publicly for\u0000further analysis and a general modular toolkit. We intend for ProteinBench to\u0000be a living benchmark for establishing a standardized, in-depth evaluation\u0000framework for protein foundation models, driving their development and\u0000application while fostering collaboration within the field.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology that enables precise genomic modifications via a short RNA guide sequence, there has been a marked increase in the accessibility and application of this technology across various fields. The success of CRISPR-Cas9 has spurred further investment and led to the discovery of additional CRISPR systems, including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets RNA, offering unique advantages for gene modulation. We focus on Cas13d, a variant known for its collateral activity where it non-specifically cleaves adjacent RNA molecules upon activation, a feature critical to its function. We introduce DeepFM-Crispr, a novel deep learning model developed to predict the on-target efficiency and evaluate the off-target effects of Cas13d. This model harnesses a large language model to generate comprehensive representations rich in evolutionary and structural data, thereby enhancing predictions of RNA secondary structures and overall sgRNA efficacy. A transformer-based architecture processes these inputs to produce a predictive efficacy score. Comparative experiments show that DeepFM-Crispr not only surpasses traditional models but also outperforms recent state-of-the-art deep learning methods in terms of prediction accuracy and reliability.
{"title":"DeepFM-Crispr: Prediction of CRISPR On-Target Effects via Deep Learning","authors":"Condy Bao, Fuxiao Liu","doi":"arxiv-2409.05938","DOIUrl":"https://doi.org/arxiv-2409.05938","url":null,"abstract":"Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology\u0000that enables precise genomic modifications via a short RNA guide sequence,\u0000there has been a marked increase in the accessibility and application of this\u0000technology across various fields. The success of CRISPR-Cas9 has spurred\u0000further investment and led to the discovery of additional CRISPR systems,\u0000including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets\u0000RNA, offering unique advantages for gene modulation. We focus on Cas13d, a\u0000variant known for its collateral activity where it non-specifically cleaves\u0000adjacent RNA molecules upon activation, a feature critical to its function. We\u0000introduce DeepFM-Crispr, a novel deep learning model developed to predict the\u0000on-target efficiency and evaluate the off-target effects of Cas13d. This model\u0000harnesses a large language model to generate comprehensive representations rich\u0000in evolutionary and structural data, thereby enhancing predictions of RNA\u0000secondary structures and overall sgRNA efficacy. A transformer-based\u0000architecture processes these inputs to produce a predictive efficacy score.\u0000Comparative experiments show that DeepFM-Crispr not only surpasses traditional\u0000models but also outperforms recent state-of-the-art deep learning methods in\u0000terms of prediction accuracy and reliability.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Felix Risbro Hjerrild, Shan Shan, Doug M Boyer, Ingrid Daubechies
A key challenge in evolutionary biology is to develop robust computational tools that can accurately analyze shape variations across diverse anatomical structures. The Dirichlet Normal Energy (DNE) is a shape complexity metric that addresses this by summarizing the local curvature of surfaces, particularly aiding analytical studies and providing insights into evolutionary and functional adaptations. Building on the DNE concept, we introduce a Python-based implementation designed to compute both the original DNE and a newly developed sign-oriented DNE metric. This Python package includes a user-friendly command line interface (CLI) and built-in visualization tools to facilitate the interpretation of the surface's local curvature properties. The addition of signDNE, which integrates the convexity and concavity of surfaces, enhances the tool's ability to identify fine-scale features across a broad range of biological structures. We validate the robustness of our method by comparing its performance with standard implementations on a dataset of triangular meshes with varying discrete representations. Additionally, we demonstrate its potential applications through visualization of the local curvature field (i.e., the local curvature value over the surface) on various biological specimens, showing how it effectively captures complex biological features. In this paper, we offer a brief overview of the Python CLI for ease of use. Alongside the Python implementation, we have also updated the original MATLAB package to ensure consistent and accurate DNE computation across platforms. These improvements enhance the tool's flexibility, reduce sensitivity to sampling density and mesh quality, and support a more accurate interpretation of biological surface topography.
{"title":"signDNE: A python package for ariaDNE and its sign-oriented extension","authors":"Felix Risbro Hjerrild, Shan Shan, Doug M Boyer, Ingrid Daubechies","doi":"arxiv-2409.05549","DOIUrl":"https://doi.org/arxiv-2409.05549","url":null,"abstract":"A key challenge in evolutionary biology is to develop robust computational\u0000tools that can accurately analyze shape variations across diverse anatomical\u0000structures. The Dirichlet Normal Energy (DNE) is a shape complexity metric that\u0000addresses this by summarizing the local curvature of surfaces, particularly\u0000aiding the analytical studies and providing insights into evolutionary and\u0000functional adaptations. Building on the DNE concept, we introduce a\u0000Python-based implementation, designed to compute both the original DNE and a\u0000newly developed sign-oriented DNE metric. This Python package includes a\u0000user-friendly command line interface (CLI) and built-in visualization tools to\u0000facilitate the interpretation of the surface's local curvature properties. The\u0000addition of signDNE, which integrates the convexity and concavity of surfaces,\u0000enhances the tool's ability to identify fine-scale features across a broad\u0000range of biological structures. We validate the robustness of our method by\u0000comparing its performance with standard implementations on a dataset of\u0000triangular meshes with varying discrete representations. Additionally, we\u0000demonstrate its potential applications through visualization of the local\u0000curvature field (i.e., local curvature value over the surface) on various\u0000biological specimens, showing how it effectively captures complex biological\u0000features. In this paper, we offer a brief overview of the Python CLI for ease\u0000of use. Alongside the Python implementation, we have also updated the original\u0000MATLAB package to ensure consistent and accurate DNE computation across\u0000platforms. These improvements enhance the tool's flexibility, reduce\u0000sensitivity to sampling density and mesh quality, and support a more accurate\u0000interpretation of biological surface topography.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang
Predicting cellular responses to various perturbations is a critical focus in drug discovery and personalized therapeutics, with deep learning models playing a significant role in this endeavor. Single-cell datasets contain technical artifacts that may hinder the predictability of such models, which raises quality control issues of particular concern in this area. To address this, we propose CRADLE-VAE, a causal generative framework tailored for single-cell gene perturbation modeling, enhanced with counterfactual reasoning-based artifact disentanglement. Throughout training, CRADLE-VAE models the underlying latent distribution of technical artifacts and perturbation effects present in single-cell datasets. It employs counterfactual reasoning to effectively disentangle such artifacts by modulating the latent basal spaces and learns robust features for generating cellular response data with improved quality. Experimental results demonstrate that this approach improves not only treatment effect estimation performance but also generative quality. The CRADLE-VAE codebase is publicly available at https://github.com/dmis-lab/CRADLE-VAE.
{"title":"CRADLE-VAE: Enhancing Single-Cell Gene Perturbation Modeling with Counterfactual Reasoning-based Artifact Disentanglement","authors":"Seungheun Baek, Soyon Park, Yan Ting Chok, Junhyun Lee, Jueon Park, Mogan Gim, Jaewoo Kang","doi":"arxiv-2409.05484","DOIUrl":"https://doi.org/arxiv-2409.05484","url":null,"abstract":"Predicting cellular responses to various perturbations is a critical focus in\u0000drug discovery and personalized therapeutics, with deep learning models playing\u0000a significant role in this endeavor. Single-cell datasets contain technical\u0000artifacts that may hinder the predictability of such models, which poses\u0000quality control issues highly regarded in this area. To address this, we\u0000propose CRADLE-VAE, a causal generative framework tailored for single-cell gene\u0000perturbation modeling, enhanced with counterfactual reasoning-based artifact\u0000disentanglement. Throughout training, CRADLE-VAE models the underlying latent\u0000distribution of technical artifacts and perturbation effects present in\u0000single-cell datasets. It employs counterfactual reasoning to effectively\u0000disentangle such artifacts by modulating the latent basal spaces and learns\u0000robust features for generating cellular response data with improved quality.\u0000Experimental results demonstrate that this approach improves not only treatment\u0000effect estimation performance but also generative quality as well. The\u0000CRADLE-VAE codebase is publicly available at\u0000https://github.com/dmis-lab/CRADLE-VAE.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"7 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daria Stepanova, Meritxell Brunet Guasch, Helen M. Byrne, Tomás Alarcón
Epigenetics plays a key role in cellular differentiation and the maintenance of cell identity, enabling cells to regulate their genetic activity without altering the DNA sequence. Epigenetic regulation occurs within the context of hierarchically folded chromatin, yet the interplay between the dynamics of epigenetic modifications and chromatin architecture remains poorly understood. In addition, it remains unclear what mechanisms drive the formation of rugged epigenetic patterns, characterised by alternating genomic regions enriched in activating and repressive marks. In this study, we focus on post-translational modifications of histone H3 tails, particularly H3K27me3, H3K4me3, and H3K27ac. We introduce a mesoscopic stochastic model that incorporates chromatin architecture and competition between histone-modifying enzymes into the dynamics of epigenetic modifications in small genomic loci comprising several nucleosomes. Our approach enables us to investigate the mechanisms by which epigenetic patterns form on larger scales of chromatin organisation, such as loops and domains. Through bifurcation analysis and stochastic simulations, we demonstrate that the model can reproduce uniform chromatin states (open, closed, and bivalent) and generate previously unexplored rugged profiles. Our results suggest that enzyme competition and chromatin conformations with high-frequency interactions between distant genomic loci can drive the emergence of rugged epigenetic landscapes. Additionally, we hypothesise that bivalent chromatin can act as an intermediate state, facilitating transitions between uniform and rugged landscapes. This work offers a powerful mathematical framework for understanding the dynamic interactions between chromatin architecture and epigenetic regulation, providing new insights into the formation of complex epigenetic patterns.
{"title":"Understanding how chromatin folding and enzyme competition affect rugged epigenetic landscapes","authors":"Daria Stepanova, Meritxell Brunet Guasch, Helen M. Byrne, Tomás Alarcón","doi":"arxiv-2409.06116","DOIUrl":"https://doi.org/arxiv-2409.06116","url":null,"abstract":"Epigenetics plays a key role in cellular differentiation and maintaining cell\u0000identity, enabling cells to regulate their genetic activity without altering\u0000the DNA sequence. Epigenetic regulation occurs within the context of\u0000hierarchically folded chromatin, yet the interplay between the dynamics of\u0000epigenetic modifications and chromatin architecture remains poorly understood.\u0000In addition, it remains unclear what mechanisms drive the formation of rugged\u0000epigenetic patterns, characterised by alternating genomic regions enriched in\u0000activating and repressive marks. In this study, we focus on post-translational\u0000modifications of histone H3 tails, particularly H3K27me3, H3K4me3, and H3K27ac.\u0000We introduce a mesoscopic stochastic model that incorporates chromatin\u0000architecture and competition of histone-modifying enzymes into the dynamics of\u0000epigenetic modifications in small genomic loci comprising several nucleosomes.\u0000Our approach enables us to investigate the mechanisms by which epigenetic\u0000patterns form on larger scales of chromatin organisation, such as loops and\u0000domains. Through bifurcation analysis and stochastic simulations, we\u0000demonstrate that the model can reproduce uniform chromatin states (open,\u0000closed, and bivalent) and generate previously unexplored rugged profiles. Our\u0000results suggest that enzyme competition and chromatin conformations with\u0000high-frequency interactions between distant genomic loci can drive the\u0000emergence of rugged epigenetic landscapes. Additionally, we hypothesise that\u0000bivalent chromatin can act as an intermediate state, facilitating transitions\u0000between uniform and rugged landscapes. This work offers a powerful mathematical\u0000framework for understanding the dynamic interactions between chromatin\u0000architecture and epigenetic regulation, providing new insights into the\u0000formation of complex epigenetic patterns.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"20 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Malek Senoussi, Thierry Artières, Paul Villoutreix
One of the major challenges arising from single-cell transcriptomics experiments is the question of how to annotate the associated single-cell transcriptomic profiles. Because of the large size and the high dimensionality of the data, automated methods for annotation are needed. We focus here on datasets obtained in the context of developmental biology, where the differentiation process leads to a hierarchical structure. We consider a frequent setting in which both labeled and unlabeled data are available at training time, but the set of labels carried by the labeled data and the set of labels underlying the unlabeled data are disjoint. This is an instance of the Novel Class Discovery problem. The goal is twofold: clustering the data and mapping the clusters to labels. We propose extensions of k-Means and GMM clustering methods for solving the problem and report comparative results on artificial and experimental transcriptomic datasets. Our approaches take advantage of the hierarchical nature of the data.
{"title":"Hierarchical novel class discovery for single-cell transcriptomic profiles","authors":"Malek Senoussi, Thierry Artières, Paul Villoutreix","doi":"arxiv-2409.05937","DOIUrl":"https://doi.org/arxiv-2409.05937","url":null,"abstract":"One of the major challenges arising from single-cell transcriptomics\u0000experiments is the question of how to annotate the associated single-cell\u0000transcriptomic profiles. Because of the large size and the high dimensionality\u0000of the data, automated methods for annotation are needed. We focus here on\u0000datasets obtained in the context of developmental biology, where the\u0000differentiation process leads to a hierarchical structure. We consider a\u0000frequent setting where both labeled and unlabeled data are available at\u0000training time, but the sets of the labels of labeled data on one side and of\u0000the unlabeled data on the other side, are disjoint. It is an instance of the\u0000Novel Class Discovery problem. The goal is to achieve two objectives,\u0000clustering the data and mapping the clusters with labels. We propose extensions\u0000of k-Means and GMM clustering methods for solving the problem and report\u0000comparative results on artificial and experimental transcriptomic datasets. Our\u0000approaches take advantage of the hierarchical nature of the data.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"396 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jianing Xu, Huimin Hu, Gregory Ellison, Lili Yu, Christopher Whalen, Liang Liu
Reconstructing transmission networks is essential for identifying key factors like superspreaders and high-risk locations, which are critical for developing effective pandemic prevention strategies. In this study, we developed a Bayesian framework that integrates genomic and temporal data to reconstruct transmission networks for infectious diseases. The Bayesian transmission model accounts for the latent period and differentiates between symptom onset and actual infection time, improving the accuracy of the inferred transmission dynamics and epidemiological models. Additionally, the model allows for the transmission of multiple pathogen lineages, reflecting the complexity of real-world transmission events more accurately than models that assume single-lineage transmission. Simulation results show that the Bayesian model reliably estimates both the model parameters and the transmission network. Moreover, hypothesis testing effectively identifies direct transmission events. This approach highlights the crucial role of genetic data in reconstructing transmission networks and understanding the origins and transmission dynamics of infectious diseases.
{"title":"Bayesian estimation of transmission networks for infectious diseases","authors":"Jianing Xu, Huimin Hu, Gregory Ellison, Lili Yu, Christopher Whalen, Liang Liu","doi":"arxiv-2409.05245","DOIUrl":"https://doi.org/arxiv-2409.05245","url":null,"abstract":"Reconstructing transmission networks is essential for identifying key factors\u0000like superspreaders and high-risk locations, which are critical for developing\u0000effective pandemic prevention strategies. In this study, we developed a\u0000Bayesian framework that integrates genomic and temporal data to reconstruct\u0000transmission networks for infectious diseases. The Bayesian transmission model\u0000accounts for the latent period and differentiates between symptom onset and\u0000actual infection time, enhancing the accuracy of transmission dynamics and\u0000epidemiological models. Additionally, the model allows for the transmission of\u0000multiple pathogen lineages, reflecting the complexity of real-world\u0000transmission events more accurately than models that assume a single lineage\u0000transmission. Simulation results show that the Bayesian model reliably\u0000estimates both the model parameters and the transmission network. Moreover,\u0000hypothesis testing effectively identifies direct transmission events. This\u0000approach highlights the crucial role of genetic data in reconstructing\u0000transmission networks and understanding the origins and transmission dynamics\u0000of infectious diseases.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"63 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jeremy D. Goldhaber-Fiebert, Hawre Jalal, Fernando Alarid Escudero
Individual-level state-transition microsimulations (iSTMs) have proliferated for economic evaluations in place of cohort state-transition models (cSTMs). Probabilistic economic evaluations quantify decision uncertainty and value of information (VOI). Prior studies show that iSTMs provide unbiased estimates of expected incremental net monetary benefits (EINMB), but the statistical properties of their estimates of decision uncertainty and VOI are uncharacterized. We compare such iSTM-produced estimates to those from corresponding cSTMs. For a 2-alternative decision and normally distributed incremental costs and benefits, we derive analytical expressions for the probability of being cost-effective and the expected value of perfect information (EVPI) for cSTMs and iSTMs, accounting for correlations in incremental outcomes at the population and individual levels. Numerical simulations illustrate our findings and explore relaxation of normality assumptions or having more than 2 decision alternatives. iSTM estimates of decision uncertainty and VOI are biased but asymptotically consistent (i.e., the bias tends to 0 as the number of microsimulated individuals tends to infinity). Decision uncertainty depends on one tail of the INMB distribution (e.g., P(INMB <= 0)), which depends on the estimated variance (larger with iSTMs given first-order noise). While iSTMs overestimate EVPI, their direction of bias for the probability of being cost-effective is ambiguous. Bias is larger when uncertainties in incremental costs and effects are negatively correlated. While more samples at the population uncertainty level are interchangeable with more microsimulations for estimating EINMB, minimizing iSTM bias in estimating decision uncertainty and VOI depends on sufficient microsimulations. Analysts should account for this when allocating their computational budgets and, at minimum, characterize such bias in their reported results.
{"title":"Microsimulation Estimates of Decision Uncertainty and Value of Information Are Biased but Consistent","authors":"Jeremy D. Goldhaber-Fiebert, Hawre Jalal, Fernando Alarid Escudero","doi":"arxiv-2409.05183","DOIUrl":"https://doi.org/arxiv-2409.05183","url":null,"abstract":"Individual-level state-transition microsimulations (iSTMs) have proliferated\u0000for economic evaluations in place of cohort state transition models (cSTMs).\u0000Probabilistic economic evaluations quantify decision uncertainty and value of\u0000information (VOI). Prior studies show that iSTMs provide unbiased estimates of\u0000expected incremental net monetary benefits (EINMB), but statistical properties\u0000of their estimates of decision uncertainty and VOI are uncharacterized. We\u0000compare such iSTMs-produced estimates to corresponding cSTMs. For a\u00002-alternative decision and normally distributed incremental costs and benefits,\u0000we derive analytical expressions for the probability of being cost-effective\u0000and the expected value of perfect information (EVPI) for cSTMs and iSTMs,\u0000accounting for correlations in incremental outcomes at the population and\u0000individual levels. Numerical simulations illustrate our findings and explore\u0000relaxation of normality assumptions or having >2 decision alternatives. iSTM\u0000estimates of decision uncertainty and VOI are biased but asymptotically\u0000consistent (i.e., bias->0 as number of microsimulated individuals->infinity).\u0000Decision uncertainty depends on one tail of the INMB distribution (e.g.,\u0000P(INMB<=0)) which depends on estimated variance (larger with iSTMs given\u0000first-order noise). While iSTMs overestimate EVPI, their direction of bias for\u0000the probability of being cost-effective is ambiguous. Bias is larger when\u0000uncertainties in incremental costs and effects are negatively correlated. While\u0000more samples at the population uncertainty level are interchangeable with more\u0000microsimulations for estimating EINMB, minimizing iSTM bias in estimating\u0000decision uncertainty and VOI depends on sufficient microsimulations. Analysts\u0000should account for this when allocating their computational budgets and, at\u0000minimum, characterize such bias in their reported results.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"3 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The blood-brain barrier (BBB) serves as a protective barrier that separates the brain from the circulatory system, regulating the passage of substances into the central nervous system. Assessing the BBB permeability of potential drugs is crucial for effective drug targeting. However, traditional experimental methods for measuring BBB permeability are challenging and impractical for large-scale screening. Consequently, there is a need to develop computational approaches to predict BBB permeability. This paper proposes a GPS Transformer architecture augmented with Self Attention, designed to perform well in the low-data regime. The proposed approach achieved state-of-the-art performance on the BBB permeability prediction task using the BBBP dataset, surpassing existing models. With a ROC-AUC of 78.8%, the approach exceeds the previous state of the art by 5.5%. We demonstrate that standard Self Attention coupled with the GPS Transformer performs better than other variants of attention coupled with the GPS Transformer.
{"title":"Efficient Training of Transformers for Molecule Property Prediction on Small-scale Datasets","authors":"Shivesh Prakash","doi":"arxiv-2409.04909","DOIUrl":"https://doi.org/arxiv-2409.04909","url":null,"abstract":"The blood-brain barrier (BBB) serves as a protective barrier that separates\u0000the brain from the circulatory system, regulating the passage of substances\u0000into the central nervous system. Assessing the BBB permeability of potential\u0000drugs is crucial for effective drug targeting. However, traditional\u0000experimental methods for measuring BBB permeability are challenging and\u0000impractical for large-scale screening. Consequently, there is a need to develop\u0000computational approaches to predict BBB permeability. This paper proposes a GPS\u0000Transformer architecture augmented with Self Attention, designed to perform\u0000well in the low-data regime. The proposed approach achieved a state-of-the-art\u0000performance on the BBB permeability prediction task using the BBBP dataset,\u0000surpassing existing models. With a ROC-AUC of 78.8%, the approach sets a\u0000state-of-the-art by 5.5%. We demonstrate that standard Self Attention coupled\u0000with GPS transformer performs better than other variants of attention coupled\u0000with GPS Transformer.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"166 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}