ArXiv: Latest Papers

Inferring resource competition in microbial communities from time series.
Pub Date : 2025-01-17
Xiaowen Chen, Kyle Crocker, Seppe Kuehn, Aleksandra M Walczak, Thierry Mora

The competition for resources is a defining feature of microbial communities. In many contexts, from soils to host-associated communities, highly diverse microbes are organized into metabolic groups or guilds with similar resource preferences. The resource preferences of individual taxa that give rise to these guilds are critical for understanding fluxes of resources through the community and the structure of diversity in the system. However, inferring the metabolic capabilities of individual taxa, and their competition with other taxa, within a community is challenging and unresolved. Here we address this gap in knowledge by leveraging dynamic measurements of abundances in communities. We show that simple correlations are often misleading in predicting resource competition. We show that spectral methods such as the cross-power spectral density (CPSD) and coherence that account for time-delayed effects are superior metrics for inferring the structure of resource competition in communities. We first demonstrate this fact on synthetic data generated from consumer-resource models with time-dependent resource availability, where taxa are organized into groups or guilds with similar resource preferences. By applying spectral methods to oceanic plankton time-series data, we demonstrate that these methods detect interaction structures among species with similar genomic sequences. Our results indicate that analyzing temporal data across multiple timescales can reveal the underlying structure of resource competition within communities.
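The contrast between zero-lag correlation and coherence can be illustrated with a minimal sketch. It is not the authors' pipeline: the two sinusoidal series, the quarter-period delay, the noise level, and the segment length are all assumptions chosen for illustration, and the estimator is a plain segment-averaged one without windowing.

```python
import cmath
import math
import random


def pearson(x, y):
    """Zero-lag Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def half_dft(x):
    """Discrete Fourier transform at the non-negative frequency bins 0..n//2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def coherence(x, y, seg_len):
    """Magnitude-squared coherence averaged over non-overlapping segments."""
    n_freq = seg_len // 2 + 1
    pxx, pyy, pxy = [0.0] * n_freq, [0.0] * n_freq, [0j] * n_freq
    for start in range(0, len(x) - seg_len + 1, seg_len):
        xs, ys = x[start:start + seg_len], y[start:start + seg_len]
        mx, my = sum(xs) / seg_len, sum(ys) / seg_len
        fx = half_dft([v - mx for v in xs])
        fy = half_dft([v - my for v in ys])
        for k in range(n_freq):
            pxx[k] += abs(fx[k]) ** 2
            pyy[k] += abs(fy[k]) ** 2
            pxy[k] += fx[k].conjugate() * fy[k]  # cross-power spectrum term
    # Bin 0 is reported as 0 because the mean is removed from every segment.
    return [0.0] + [abs(pxy[k]) ** 2 / (pxx[k] * pyy[k]) for k in range(1, n_freq)]


# Hypothetical data: y follows the same cycle as x, delayed by a quarter period.
random.seed(0)
period, delay, n, seg_len = 16, 4, 512, 64
x = [math.sin(2 * math.pi * t / period) + random.gauss(0, 0.3) for t in range(n)]
y = [math.sin(2 * math.pi * (t - delay) / period) + random.gauss(0, 0.3) for t in range(n)]

r = pearson(x, y)
coh = coherence(x, y, seg_len)
peak_bin = seg_len // period  # frequency bin of the shared cycle

print(f"zero-lag correlation: {r:.3f}")
print(f"coherence at the shared frequency: {coh[peak_bin]:.3f}")
```

In this toy setting the delay makes the zero-lag correlation close to zero even though the two series share a cycle, while the coherence at the shared frequency stays close to one.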

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11759850/pdf/
Citations: 0
Unconditional stability of a recurrent neural circuit implementing divisive normalization.
Pub Date : 2025-01-15
Shivang Rawat, David J Heeger, Stefano Martiniani

Stability in recurrent neural models poses a significant challenge, particularly in developing biologically plausible neurodynamical models that can be seamlessly trained. Traditional cortical circuit models are notoriously difficult to train due to expansive nonlinearities in the dynamical system, leading to an optimization problem with nonlinear stability constraints that are difficult to impose. Conversely, recurrent neural networks (RNNs) excel in tasks involving sequential data but lack biological plausibility and interpretability. In this work, we address these challenges by linking dynamic divisive normalization (DN) to the stability of "oscillatory recurrent gated neural integrator circuits" (ORGaNICs), a biologically plausible recurrent cortical circuit model that dynamically achieves DN and that has been shown to simulate a wide range of neurophysiological phenomena. By using the indirect method of Lyapunov, we prove the remarkable property of unconditional local stability for an arbitrary-dimensional ORGaNICs circuit when the recurrent weight matrix is the identity. We thus connect ORGaNICs to a system of coupled damped harmonic oscillators, which enables us to derive the circuit's energy function, providing a normative principle of what the circuit, and individual neurons, aim to accomplish. Further, for a generic recurrent weight matrix, we prove the stability of the 2D model and demonstrate empirically that stability holds in higher dimensions. Finally, we show that ORGaNICs can be trained by backpropagation through time without gradient clipping/scaling, thanks to its intrinsic stability property and adaptive time constants, which address the problems of exploding, vanishing, and oscillating gradients. By evaluating the model's performance on RNN benchmarks, we find that ORGaNICs outperform alternative neurodynamical models on static image classification tasks and perform comparably to LSTMs on sequential tasks.
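The indirect method of Lyapunov mentioned above linearizes a system at a fixed point and checks that every eigenvalue of the Jacobian has a negative real part. The sketch below applies that test to a damped pendulum, which is an assumed stand-in and not the ORGaNICs circuit; the function names and the damping value are hypothetical.

```python
import cmath
import math


def pendulum(state, damping=0.5):
    """Damped pendulum: d(theta)/dt = omega, d(omega)/dt = -sin(theta) - damping * omega."""
    theta, omega = state
    return (omega, -math.sin(theta) - damping * omega)


def numerical_jacobian(f, point, eps=1e-6):
    """Central-difference Jacobian of the vector field f at the given point."""
    n = len(point)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = f(plus), f(minus)
        for i in range(n):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac


def eigenvalues_2x2(m):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4 * det)
    return ((trace + root) / 2, (trace - root) / 2)


def is_locally_stable(f, fixed_point):
    """Indirect-method test: all Jacobian eigenvalues have negative real part."""
    eigs = eigenvalues_2x2(numerical_jacobian(f, fixed_point))
    return all(lam.real < 0 for lam in eigs)


hanging = is_locally_stable(pendulum, (0.0, 0.0))       # pendulum at rest, pointing down
inverted = is_locally_stable(pendulum, (math.pi, 0.0))  # pendulum balanced upright
print("hanging equilibrium stable:", hanging)
print("inverted equilibrium stable:", inverted)
```

The test is local: it says nothing about trajectories that start far from the fixed point, which is why the abstract speaks of local stability.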

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11469413/pdf/
Citations: 0
A mathematical language for linking fine-scale structure in spikes from hundreds to thousands of neurons with behaviour.
Pub Date : 2025-01-15
Alexandra N Busch, Roberto C Budzinski, Federico W Pasini, Ján Mináč, Jonathan A Michaels, Megan Roussy, Roberto A Gulli, Benjamin W Corrigan, J Andrew Pruszynski, Julio Martinez-Trujillo, Lyle E Muller

Recent advances in neural recording technology allow simultaneously recording action potentials from hundreds to thousands of neurons in awake, behaving animals. However, characterizing spike patterns in the resulting data, and linking these patterns to behaviour, remains a challenging task. The lack of a rigorous mathematical language for variable numbers of events (spikes) emitted by multiple agents (neurons) is an important limiting factor. We introduce a new mathematical operation to decompose complex spike patterns into a set of simple, structured elements. This creates a mathematical language that allows comparing spike patterns across trials, detecting sub-patterns, and making links to behaviour via a clear distance measure. We first demonstrate the method using Neuropixel recordings from macaque motor cortex. We then apply the method to dual Utah array recordings from macaque prefrontal cortex, where this technique reveals previously unseen structure that can predict both memory-guided decisions and errors in a virtual-reality working memory task. These results demonstrate that this technique provides a powerful new approach to understand structure in the spike times of neural populations, at a scale that will continue to grow more and more rapidly in upcoming years.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11643227/pdf/
Citations: 0
AI Foundation Models for Wearable Movement Data in Mental Health Research. Is attention all you need for actigraphy? Foundation models of wearable accelerometer data for mental health research.
Pub Date : 2025-01-14
Franklin Y Ruan, Aiwei Zhang, Jenny Y Oh, SouYoung Jin, Nicholas C Jacobson

Pretrained foundation models and transformer architectures have driven the success of large language models (LLMs) and other modern AI breakthroughs. However, similar advancements in health data modeling remain limited due to the need for innovative adaptations. Wearable movement data offers a valuable avenue for exploration, as it's a core feature in nearly all commercial smartwatches, well established in clinical and mental health research, and the sequential nature of the data shares similarities to language. We introduce the Pretrained Actigraphy Transformer (PAT), the first open source foundation model designed for time-series wearable movement data. Leveraging transformer-based architectures and novel techniques, such as patch embeddings, and pretraining on data from 29,307 participants in a national U.S. sample, PAT achieves state-of-the-art performance in several mental health prediction tasks. PAT is also lightweight and easily interpretable, making it a robust tool for mental health research. GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/.
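Patch embedding, one of the techniques named above, splits a long series into fixed-length patches and projects each patch to a token vector before it reaches the transformer. The sketch below shows only that step. The patch length, embedding size, random weights, and minute-level synthetic day are assumptions for illustration, not PAT's actual configuration.

```python
import random


def patchify(series, patch_len):
    """Split a series into consecutive, non-overlapping patches of equal length."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]


def linear_embed(patches, weights):
    """Project each patch (length P) to a D-dimensional token with a P x D matrix."""
    embed_dim = len(weights[0])
    return [
        [sum(patch[i] * weights[i][d] for i in range(len(patch))) for d in range(embed_dim)]
        for patch in patches
    ]


random.seed(0)
minutes_per_day = 1440
activity = [random.random() for _ in range(minutes_per_day)]  # synthetic activity counts

patch_len, embed_dim = 60, 8  # hypothetical sizes
weights = [[random.gauss(0, 0.1) for _ in range(embed_dim)] for _ in range(patch_len)]

tokens = linear_embed(patchify(activity, patch_len), weights)
print(len(tokens), "tokens of dimension", len(tokens[0]))
```

Patching shortens the sequence the attention layers see, here from 1440 time steps to 24 tokens, which is the usual motivation for the technique.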

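Since the 1970s, wearable accelerometry (actigraphy) has provided valuable data for clinical insights, and it is becoming increasingly important as wearable devices continue to spread. The effectiveness of actigraphy in research and clinical settings depends heavily on the modeling architecture used. To address this, we developed the Pretrained Actigraphy Transformer (PAT), the first pretrained and fully attention-based model designed specifically to handle actigraphy. Pretraining on actigraphy from 29,307 participants in NHANES enables PAT to be fine-tuned for a variety of actigraphy prediction tasks in the mental health domain and to deliver state-of-the-art performance even when data are limited. For example, when trained to predict benzodiazepine usage with actigraphy from only 500 labeled participants, PAT achieves an 8.8 percentage-point AUC improvement over the best baseline. With fewer than 2 million parameters and built-in model explainability, PAT is robust yet easy to deploy in health research settings. GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/.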
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11623705/pdf/
Citations: 0
MassSpecGym: A benchmark for the discovery and identification of molecules.
Pub Date : 2025-01-14
Roman Bushuiev, Anton Bushuiev, Niek F de Jonge, Adamo Young, Fleming Kretschmer, Raman Samusevich, Janne Heirman, Fei Wang, Luke Zhang, Kai Dührkop, Marcus Ludwig, Nils A Haupt, Apurva Kalia, Corinna Brungs, Robin Schmid, Russell Greiner, Bo Wang, David S Wishart, Li-Ping Liu, Juho Rousu, Wout Bittremieux, Hannes Rost, Tytus D Mak, Soha Hassoun, Florian Huber, Justin J J van der Hooft, Michael A Stravs, Sebastian Böcker, Josef Sivic, Tomáš Pluskal

The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, therefore standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11581121/pdf/
Citations: 0
A Systematic Computational Method for Practical Identifiability Analysis in Mathematical Models Arising from Biology.
Pub Date : 2025-01-12
Shun Wang, Wenrui Hao

Practical identifiability is a critical concern in data-driven modeling of mathematical systems. In this paper, we propose a novel framework for practical identifiability analysis to evaluate parameter identifiability in mathematical models of biological systems. Starting with a rigorous mathematical definition of practical identifiability, we demonstrate its equivalence to the invertibility of the Fisher Information Matrix. Our framework establishes the relationship between practical identifiability and coordinate identifiability, introducing a novel metric that simplifies and accelerates the evaluation of parameter identifiability compared to the profile likelihood method. Additionally, we introduce new regularization terms to address non-identifiable parameters, enabling uncertainty quantification and improving model reliability. To guide experimental design, we present an optimal data collection algorithm that ensures all model parameters are practically identifiable. Applications to Hill functions, neural networks, and dynamic biological models demonstrate the feasibility and efficiency of the proposed computational framework in uncovering critical biological processes and identifying key observable variables.
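The link between identifiability and invertibility of the Fisher Information Matrix (FIM) can be seen in a two-parameter toy case. The sketch below uses the textbook FIM for independent Gaussian noise with unit variance, built from parameter sensitivities; it does not reproduce the paper's definition or its new metric, and both models and all names are assumptions for illustration.

```python
def fisher_information(sensitivities, times):
    """FIM = S^T S, where S[i][j] is the derivative of y(t_i) with respect to theta_j.

    Assumes independent Gaussian measurement noise with unit variance.
    """
    rows = [[s(t) for s in sensitivities] for t in times]
    p = len(sensitivities)
    return [[sum(row[i] * row[j] for row in rows) for j in range(p)] for i in range(p)]


def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


times = [1.0, 2.0, 3.0, 4.0, 5.0]
a, b = 2.0, 3.0

# Model A: y = a * b * t. Only the product a*b affects the output.
fim_product = fisher_information([lambda t: b * t, lambda t: a * t], times)

# Model B: y = a * t + b. Slope and intercept affect the output differently.
fim_line = fisher_information([lambda t: t, lambda t: 1.0], times)

det_product = det2(fim_product)
det_line = det2(fim_line)
print("det FIM, y = a*b*t:", det_product)  # singular: a and b cannot be separated
print("det FIM, y = a*t + b:", det_line)   # invertible: both parameters identifiable
```

In Model A the two sensitivity columns are proportional, so the FIM is singular regardless of how many time points are sampled; collecting more data of the same kind cannot fix it.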

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11722522/pdf/
Citations: 0
Reusable specimen-level inference in computational pathology.
Pub Date : 2025-01-10
Jakub R Kaczmarzyk, Rishul Sharma, Peter K Koo, Joel H Saltz

Foundation models for computational pathology have shown great promise for specimen-level tasks and are increasingly accessible to researchers. However, specimen-level models built on these foundation models remain largely unavailable, hindering their broader utility and impact. To address this gap, we developed SpinPath, a toolkit designed to democratize specimen-level deep learning by providing a zoo of pretrained specimen-level models, a Python-based inference engine, and a JavaScript-based inference platform. We demonstrate the utility of SpinPath in metastasis detection tasks across nine foundation models. SpinPath may foster reproducibility, simplify experimentation, and accelerate the adoption of specimen-level deep learning in computational pathology research.

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11759856/pdf/
Citations: 0
MyESL: Sparse learning in molecular evolution and phylogenetic analysis.
Pub Date : 2025-01-09
Maxwell Sanderford, Sudip Sharma, Glen Stecher, Jun Liu, Jieping Ye, Sudhir Kumar

Evolutionary sparse learning (ESL) uses a supervised machine learning approach, Least Absolute Shrinkage and Selection Operator (LASSO), to build models explaining the relationship between a hypothesis and the variation across genomic features (e.g., sites) in sequence alignments. ESL employs sparsity between and within the groups of genomic features (e.g., genomic loci) by using sparse-group LASSO. Although some software packages are available for performing sparse group LASSO, we found them less well-suited for processing and analyzing genome-scale data containing millions of features, such as bases. MyESL software fills the need for open-source software for conducting ESL analyses with facilities to pre-process the input hypotheses and large alignments, make LASSO flexible and computationally efficient, and post-process the output model to produce different metrics useful in functional or evolutionary genomics. MyESL can take phylogenetic trees and sequence alignments as input and transform them into numeric responses and features, respectively. The model outputs are processed into user-friendly text and graphical files. The computational core of MyESL is written in C++, which offers model building with or without group sparsity, while the pre- and post-processing of inputs and model outputs is performed using customized functions written in Python. One of its applications in phylogenomics showcases the utility of MyESL. Our analysis of empirical genome-scale datasets shows that MyESL can build evolutionary models quickly and efficiently on a personal desktop, while other computational packages were unable to do so because of their prohibitive requirements of computational resources and time. MyESL is available for Python environments on Linux and distributed as a standalone application for Windows and macOS. It is available from https://github.com/kumarlabgit/MyESL.
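Sparsity "between and within groups" can be made concrete with the proximal step commonly used for a sparse-group LASSO penalty: an element-wise soft threshold followed by a group-wise shrinkage. The sketch below is a generic illustration, not MyESL's C++ core. It assumes a unit step size and unweighted groups, and the coefficients, groups, and penalty values are hypothetical.

```python
import math


def soft_threshold(value, lam):
    """Shrink a single coefficient toward zero by lam (the L1 part)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_group_prox(beta, groups, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||beta||_1 + lam_group * sum_g ||beta_g||_2."""
    out = [soft_threshold(b, lam_l1) for b in beta]  # within-group sparsity
    for group in groups:
        norm = math.sqrt(sum(out[i] ** 2 for i in group))
        # Between-group sparsity: a group with a small norm is zeroed as a whole.
        scale = max(0.0, 1.0 - lam_group / norm) if norm > 0 else 0.0
        for i in group:
            out[i] *= scale
    return out


# Hypothetical coefficients for two groups of two features each.
beta = [3.0, -0.05, 0.5, 0.4]
groups = [[0, 1], [2, 3]]
shrunk = sparse_group_prox(beta, groups, lam_l1=0.1, lam_group=1.0)
print(shrunk)
```

Here the second coefficient is removed by the element-wise threshold while its group survives, and the second group is removed as a whole because its norm falls below the group penalty.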

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11760232/pdf/
Citations: 0
How Large is the Universe of RNA-Like Motifs? A Clustering Analysis of RNA Graph Motifs Using Topological Descriptors.
Pub Date : 2025-01-08
Rui Wang, Tamar Schlick

We introduce a computational topology-based approach with unsupervised machine-learning algorithms to estimate the database size and content of RNA-like graph topologies. Specifically, we apply graph theory enumeration to generate all 110,667 possible 2D dual graphs for vertex numbers ranging from 2 to 9. Among them, only 0.11% of the graphs correspond to approximately 200,000 known RNA atomic fragments (collected in 2021) using the RNA-as-Graphs (RAG) mapping method. The remaining 99.89% of the dual graphs may be RNA-like or non-RNA-like. To determine which dual graphs in the 99.89% hypothetical set are more likely to be associated with RNA structures, we apply computational topology descriptors using the Persistent Spectral Graphs (PSG) method to characterize each graph using 19 PSG-based features and use clustering algorithms that partition all possible dual graphs into two clusters, an RNA-like cluster and a non-RNA-like cluster. The distance of each dual graph to the center of the RNA-like cluster represents the likelihood of it belonging to RNA structures. In validation, our PSG-based RNA-like cluster includes 97.3% of the 121 known RNA dual graphs, suggesting good performance. Furthermore, 46.017% of the hypothetical RNAs are predicted to be RNA-like. Significantly, we observe that all of the top 15 RNA-like dual graphs can be separated into multiple subgraphs, whereas the top 15 non-RNA-like dual graphs tend not to have any subgraphs. Moreover, a significant topological difference between top RNA-like and non-RNA-like graphs is evident when comparing their topological features. These findings provide valuable insights into the size of the RNA motif universe and RNA design strategies, offering a novel framework for predicting RNA graph topologies and guiding the discovery of novel RNA motifs, and perhaps anti-viral therapeutics, by subgraph assembly.
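
The two-cluster partition and distance-based ranking described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the clustering algorithm, so plain k-means (Lloyd's algorithm) is an assumption, as are the function names `two_means` and `rna_like_scores`, the deterministic farthest-point initialization, and the rule that the cluster holding most known RNA graphs is labeled RNA-like.

```python
import numpy as np

def two_means(features, n_iter=100):
    """Lloyd's k-means with k=2 (assumed algorithm); returns (labels, centers)."""
    x = np.asarray(features, dtype=float)
    # Deterministic init (an assumption): first point, then the point farthest from it.
    first = x[0]
    second = x[np.argmax(np.linalg.norm(x - first, axis=1))]
    centers = np.stack([first, second])
    for _ in range(n_iter):
        # Distance of every graph to both centers, shape (n_graphs, 2).
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center; keep the old one if a cluster is empty.
        new_centers = np.stack([
            x[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def rna_like_scores(features, known_mask):
    """Flag graphs in the RNA-like cluster and return distances to its center.

    known_mask marks graphs already mapped to known RNA structures; the
    cluster containing most of them is treated as RNA-like (an assumption).
    A smaller distance is read as a higher likelihood of being RNA-like.
    """
    x = np.asarray(features, dtype=float)
    known = np.asarray(known_mask, dtype=bool)
    labels, centers = two_means(x)
    counts = [np.sum(known & (labels == k)) for k in range(2)]
    rna_k = int(np.argmax(counts))
    dist = np.linalg.norm(x - centers[rna_k], axis=1)
    return labels == rna_k, dist
```

In the setting of the abstract, `features` would be the matrix of 19 PSG-based features per dual graph; sorting the returned distances in ascending order would give a ranking analogous to the "top RNA-like" graphs.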

Citations: 0
Energy Dynamics Powered by Traction and Stress Control Formation and Motion of +1/2 Topological Defects in Epithelial Cell Monolayers.
Pub Date : 2025-01-08
Pradip K Bera, Molly McCord, Jun Zhang, Jacob Notbohm

In confluent cell monolayers, patterns of cell forces and motion are systematically altered near topological defects in cell shape. In turn, defects have been proposed to alter cell density, extrusion, and invasion, but it remains unclear how the defects form and how they affect cell forces and motion. Here, we studied +1/2 defects, and, in contrast to prior studies, we observed both tail-to-head and head-to-tail defect motion occurring at the same time in the same cell monolayer. We quantified the cell velocities, the tractions at the cell-substrate interface, and stresses within the cell layer near +1/2 defects. Results revealed that both traction and stress are sources of activity within the epithelial cell monolayer, with their competition defining whether the cells inject or dissipate energy and determining the direction of motion of +1/2 defects. Interestingly, patterns of motion, traction, stress, and energy injection near +1/2 defects existed before defect formation, suggesting that defects form as a result of spatially coordinated patterns in cell forces and motion. These findings reverse the current picture, from one in which defects define the cell forces and motion to one in which coordinated patterns of cell forces and motion cause defects to form and move.

Citations: 0