bioRxiv : the preprint server for biology: latest articles

Single cell RNA sequencing data processing using cloud-based serverless computing.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.1101/2025.04.26.650787

Ling-Hong Hung, Niharika Nasam, Chris Biju, Wes Lloyd, Ka Yee Yeung

Single cell RNA sequencing (scRNA-seq) has become a routine method for measuring cell activities. Processing large scRNA-seq datasets requires high-performance computing resources. Cloud computing makes such resources available on demand without major investment in infrastructure. Serverless computing adds cost efficiency: users pay only for actual resource usage, need no pre-allocated server capacity, and do not have to set up servers in advance. We present a novel and generalizable methodology that uses serverless cloud computing to accelerate computationally intensive workflows. We create an on-demand "supercomputer" using rapidly deployable cloud serverless functions as automatically provisioned computation units. We tested this methodology by optimizing an scRNA-seq workflow with cloud serverless functions on two publicly available peripheral blood mononuclear cell (PBMC) datasets. In addition, we demonstrate our approach using data generated by the NIH MorPhiC program, where we process a 450 GB human scRNA-seq dataset across 86 cell lines designed to study the temporal impact of perturbations on pancreatic differentiation. We compared the total execution time of the serverless scRNA-seq workflow with that of the traditional workflow without serverless functions, and demonstrate a major speedup for large scRNA-seq datasets.
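
The fan-out pattern described in this abstract can be illustrated locally. The sketch below is only an analogy under stated assumptions, not the authors' workflow: a thread pool stands in for cloud serverless functions, and `process_chunk`, `run_fan_out`, `merge`, and the per-chunk numbers are hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk_id: int) -> dict:
    """Hypothetical stand-in for one serverless invocation.

    A real function would process one slice of the sequencing data;
    here it only returns an invented per-chunk summary.
    """
    return {"chunk": chunk_id, "reads_processed": 1000 + chunk_id}


def run_fan_out(n_chunks: int, max_workers: int = 8) -> list:
    """Dispatch all chunks concurrently and gather results in chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with chunk ids.
        return list(pool.map(process_chunk, range(n_chunks)))


def merge(results: list) -> int:
    """Combine per-chunk summaries into one total (the gather step)."""
    return sum(r["reads_processed"] for r in results)


if __name__ == "__main__":
    results = run_fan_out(n_chunks=4)
    print(merge(results))
```

In a real deployment each call to `process_chunk` would presumably be replaced by an invocation of a cloud function, so the achievable parallelism would depend on the provider's concurrency limits rather than on local threads.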


Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934634/pdf/

Citations: 0

Oral 4'-fluorouridine provides postexposure protection against lethal Nipah virus infection.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.21.707194

Robert W Cross, Declan D Pigeaud, Victoriya Borisevich, Krystle N Agans, Mack B Harrison, Rachel O'Toole, Abhishek N Prasad, Thomas W Geisbert

There are no approved medical countermeasures against Nipah virus (NiV), which causes regular outbreaks in humans and animals in South and Southeast Asia, with human mortality rates ranging from 40% to more than 90%. Recently, it was shown that 4'-fluorouridine (4'-FlU; EIDD-2749), an orally available ribonucleoside analog, protected guinea pigs and nonhuman primates from lethal challenge with Lassa virus and that 4'-FlU has in vitro antiviral activity against NiV. Here, we assessed the postexposure protective efficacy of 4'-FlU in a lethal hamster model of NiV infection. Daily treatment with 4'-FlU beginning 3 days after exposure to NiV resulted in complete protection from lethal infection. Our findings support the further development of 4'-FlU as a therapy for NiV disease.


Citations: 0

Accuracy of occurrence and abundance estimates from insect metabarcoding.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.20.707016

Ela Iwaszkiewicz-Eggebrecht, Emma Granqvist, Karol H Nowak, Catalina Valdivia, Mateusz Buczek, Amrita Srivathsan, Emily Hartop, Andreia Miraldo, Tomas Roslin, Ayco J M Tack, Piotr Łukasik, Rudolf Meier, Fredrik Ronquist

1. DNA metabarcoding (high-throughput sequencing of barcode regions from bulk samples) has become a key tool for insect biodiversity assessment. Yet how methodological choices affect the accuracy of metabarcoding data remains insufficiently explored. In this paper, we ask: (1) How does the lysis method (non-destructive lysis vs. destructive homogenization) affect community recovery? (2) How comprehensively does metabarcoding capture species richness? (3) To what extent can spike-ins improve abundance estimates? (4) How accurately can species abundances be estimated?

2. We evaluated the accuracy of insect metabarcoding using 4,749 bulk samples from a large-scale biodiversity survey subjected to mild lysis. Of these samples, 856 were also homogenized, allowing a systematic comparison of the effect of alternative treatments. To potentially improve abundance estimates, we added six biological spike-ins (i.e., foreign insects) to all samples, and two synthetic spike-ins (artificial DNA fragments) to the homogenization treatment. In addition, we established the contents of 15 samples by individually barcoding all specimens, enabling direct assessment of occurrence and abundance estimates.

3. Our results revealed consistent differences between destructive and non-destructive treatments. While both methods reliably detected the majority of species, small and soft-bodied taxa were more often recovered after mild lysis than after homogenization, while the reverse was true for heavily sclerotized, hairy, and large taxa. Using biological spike-ins for calibration reduced the variance in read numbers per specimen considerably, especially in homogenized samples, while synthetic spike-ins were less effective. In a Bayesian analysis, where species data were matched to the best-fitting spike-in calibration curve, accurate abundance estimates (±1 individual) were obtained for 72.9% of species occurrences.

4. Our results show that it is possible to obtain reasonably accurate abundance estimates from metabarcoding data, and that mild lysis and homogenization result in different taxon-specific biases in terms of occurrence data, with neither method outperforming the other. Accuracy is improved by homogenization rather than mild lysis of samples, and by the use of biological rather than synthetic spike-ins. Together, these findings provide a major step towards robust, quantitative biodiversity monitoring using DNA metabarcoding.
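
The role of spike-ins in abundance estimation can be made concrete with a deliberately simple calculation. The sketch below is not the Bayesian calibration-curve analysis described in the abstract; it assumes a plain ratio method (median reads per spike-in specimen as a per-sample scale factor), and all function names and numbers are invented for illustration.

```python
from statistics import median


def reads_per_individual(spike_reads: dict, spike_counts: dict) -> float:
    """Median reads per specimen across the spike-in taxa of one sample."""
    ratios = [spike_reads[name] / spike_counts[name] for name in spike_counts]
    return median(ratios)


def estimate_abundance(species_reads: dict, scale: float) -> dict:
    """Convert read counts to whole individuals (at least 1 if detected)."""
    return {sp: max(1, round(reads / scale)) for sp, reads in species_reads.items()}


if __name__ == "__main__":
    # Invented numbers, not data from the study.
    spike_reads = {"spike_A": 400, "spike_B": 220, "spike_C": 90}
    spike_counts = {"spike_A": 2, "spike_B": 1, "spike_C": 1}
    scale = reads_per_individual(spike_reads, spike_counts)
    print(scale)
    print(estimate_abundance({"sp1": 610, "sp2": 150}, scale))
```

A single scale factor ignores the taxon-specific biases the abstract reports, which is presumably why the study matches each species to a best-fitting calibration curve instead.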


Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934785/pdf/

Citations: 0

The Stochastic System Identification Toolkit (SSIT) to model, fit, predict, and design experiments.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.20.707039

Alex N Popinga, Jack Forman, Dmitri Svetlov, Huy Vo, Brian Munsky

Biological data is prone to both intrinsic and extrinsic noise and to variability between experimental replicates. That same stochasticity and heterogeneity can carry information about underlying biochemical mechanisms but, if not incorporated in modeling and probabilistic inference, can also bias parameter estimates and misguide predictions and, subsequently, experiment design. Mechanistic inference typically requires lengthy simulations (e.g., the Stochastic Simulation Algorithm (SSA)); approximations to chemical master equation (CME) solutions that lack rigorous error tracking; or deterministic averaging that lacks the complexity necessary to reflect the data. We introduce the Stochastic System Identification Toolkit (SSIT), a fast, flexible, and open-source software package available on GitHub that makes use of MATLAB's efficient and diverse computational architecture. The SSIT is designed for building, simulating, and solving chemical reaction models using ODEs, moments, SSA, Finite State Projection truncations of the CME, or hybrid methods; sensitivity analysis and Fisher information quantification; parameter fitting using likelihood- or Bayesian-based methods; handling of experimental noise and measurement errors using probabilistic distortion operators; and sequential experiment design that empowers users to save time and resources while gaining the most information possible out of their data. The SSIT also offers advanced modeling tools, including model reduction methods for increased efficiency and joint fitting of models and datasets with overlapping reactions/parameters. To facilitate ease and speed of use, the SSIT provides a graphical user interface and ready-made, adaptable pipelines that can be run in the background from the command line or on high-performance computing clusters. We demonstrate features of the SSIT on two experimental datasets: the first consists of published mRNA count data that reflect the response of Saccharomyces cerevisiae yeast cells to osmotic shock, measured using single-cell single-molecule fluorescence in situ hybridization; the second consists of single-cell RNA sequencing measurements of 151 activating genes in breast cancer cells following treatment with dexamethasone.

Author summary: We present the Stochastic System Identification Toolkit (SSIT) to model, fit, and predict any data that can be interpreted as changing populations or counts through time, including but not limited to single-cell experiments, economics, epidemiology, ecology, sociology, agriculture, and biotechnology. The SSIT was constructed particularly for stochastic modeling, which is important for systems whose states may experience significant fluctuations from mean behavior, thus affecting the inference of the underlying rate parameters and predictions of subsequent behavior. The SSIT provides statistical inference tools for parameter estimation; sensitivity analysis and information calculation; handling of distortions to probability distributions caused by experimental and/or measurement processes (e.g., single-cell RNA sequencing data, or total fluorescence intensity versus spot counting/spot analysis); and quantitative experiment design. The SSIT also offers a variety of sophisticated modeling tools, including model reduction methods and the fitting of combined models/datasets that share some behaviors but remain distinct (e.g., different genes responding to a single stimulus). The SSIT generates simple, efficient analysis pipelines that can be run in the MATLAB environment, in the background from the command line, or on high-performance computing clusters, helping users make informed, time- and cost-effective decisions about their next set of experiments.
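
For readers unfamiliar with the Stochastic Simulation Algorithm mentioned in this abstract, the following is a minimal sketch of it for a one-species birth-death model (constant production, first-order degradation). It is written in Python for illustration and is not the SSIT API, which is a MATLAB package; the function name `ssa_birth_death` and its parameters are assumptions made here.

```python
import random


def ssa_birth_death(k_prod, k_deg, t_end, n0=0, seed=None):
    """Simulate one trajectory of a birth-death process with the SSA.

    Reactions: 0 -> X at rate k_prod, and X -> 0 at rate k_deg * n.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * n        # propensity of degradation
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # no reaction can fire any more
        t += rng.expovariate(a_total)   # exponential waiting time
        if t > t_end:
            break
        # Pick the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


if __name__ == "__main__":
    times, counts = ssa_birth_death(k_prod=10.0, k_deg=1.0, t_end=5.0, seed=1)
    print(len(times), counts[-1])
```

Because every trajectory is a single random sample, many runs are needed to estimate a distribution, which is the cost that the abstract's "lengthy simulations" remark refers to.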

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934706/pdf/

Citations: 0

Process for Standardizing and Assessing the Parameters Governing MS2 Virus-Like Particle Reassembly around Nucleic Acid Cargo.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2025.12.02.691839

Daniel de Castro Assumpcao, Emma Sofia Vinokour, Madeline Marie Mills, Shiqi Liang, Carolyn Elaine Mills, Aline Carvalho da Costa, Nolan Warren Kennedy, Danielle Tullman-Ercek

MS2 virus-like particles (VLPs) are widely used as protein nanocages for cargo encapsulation, yet in vitro disassembly/reassembly protocols remain poorly standardized, and reassembly yields are reported inconsistently. As a result, the same experiments reported in the literature produce widely divergent yields, limiting reproducibility and cross-study comparability. Here, we introduce a cargo-specific, quantitative framework for standardized MS2 VLP reassembly yield determination. We evaluate commonly used disassembly and post-disassembly processing methods and identify practical trade-offs between protein recovery, accessibility, and reproducibility. Reassembly yield is quantified using size exclusion chromatography calibrated against purified VLP standards, enabling robust, cargo-specific yield measurement. Using this framework, we apply a full factorial design of experiments to quantify the individual and combined effects of coat protein concentration, ionic strength, buffer pH, and molecular crowding on reassembly yield. The resulting statistical model explains more than 99% of the explainable variance, and its linear fit to the experimental data indicates that optimal reassembly conditions extend beyond those tested to date. Protein concentration and ionic strength dominate reassembly yield, whereas pH and osmolyte concentration contribute more modestly within the tested ranges. Finally, we propose practical guidelines for standardized MS2 VLP disassembly, reassembly, and yield reporting, defining a transferable operating envelope for MS2 VLP reconstruction. While demonstrated here using a single nucleic acid cargo (tr-DNA), the framework is readily extensible to alternative cargos and coat protein variants.

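MS2 virus-like particles (VLPs) are widely used in nanotechnology and therapeutic delivery. Although cargo is typically loaded into VLPs by disassembling the protein shell and reassembling it around the cargo of interest, protocols for disassembly and reassembly remain inconsistent and lack standardization. Here, we systematically evaluate widely reported techniques and experimentally assess the key steps of this process. First, we optimize acid-based disassembly and acid removal methods and identify trade-offs among speed, recovery, and accessibility. Next, we investigate the experimental conditions that affect VLP reassembly yield. We establish a standardized framework for quantifying reassembly yield using size exclusion chromatography calibrated against purified VLP standards. We then employ this framework in a full factorial design of experiments, revealing that increasing sodium chloride concentration strongly inhibits reassembly efficiency, whereas pH and crowding agents exert only minor effects. A linear model fit to the experimental data indicates that optimal reassembly conditions lie beyond those tested to date. Our results provide practical guidelines and a reproducible framework for MS2 VLP disassembly and reassembly, making encapsulation strategies more reliable across applications.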
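
The full factorial design mentioned in this entry can be sketched as follows. This is a generic two-level illustration, not the authors' design or data: the factor names follow the abstract, but the coded levels, the `toy_yield` response, and its coefficients are invented here purely so that the main-effect calculation has something to operate on.

```python
from itertools import product

# Factor names follow the abstract; two coded levels (-1 low, +1 high) are assumed.
FACTORS = ["protein_conc", "ionic_strength", "pH", "crowding"]


def full_factorial(factors, levels=(-1, 1)):
    """Enumerate every combination of factor levels (2**k runs for two levels)."""
    return [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]


def main_effect(runs, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(runs, responses) if run[factor] == 1]
    low = [y for run, y in zip(runs, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)


def toy_yield(run):
    """Invented linear response, NOT data or a model from the study."""
    return 50 + 10 * run["protein_conc"] - 8 * run["ionic_strength"] + 1 * run["pH"]


if __name__ == "__main__":
    runs = full_factorial(FACTORS)
    responses = [toy_yield(run) for run in runs]
    for factor in FACTORS:
        print(factor, main_effect(runs, responses, factor))
```

Because every level of each factor appears with every combination of the others, the design is balanced and each main effect can be read off independently; interaction effects could be estimated the same way from products of the coded columns.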

Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12714026/pdf/

Citations: 0

A Demographic History of a Prairie Vole (Microtus Ochrogaster) Breeding Colony (2004-2020).

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.20.707040

Adele M H Seelke, Christina L Hung, Sabrina L Mederos, Sophia Rogers, Tiffany Lam, Lauren A Meckler, Karen L Bales

Prairie voles (Microtus ochrogaster) are highly social rodents that have become a valuable animal model for studying social attachment, pair bonding, parental care, and the neurobiological mechanisms underlying social behavior. In recent years, due in part to the publication of the prairie vole genome and a deeper mechanistic understanding of their social behavior, prairie voles have become a more popular research model, especially for translational research. However, generating reliable and reproducible findings requires effective colony management, including thoughtful breeding strategies, consistent husbandry practices, and clear documentation. In this paper, we describe the demographic history of our prairie vole breeding colony at UC Davis from 2004 to 2020 and the husbandry techniques employed in it. Well-organized and transparent colony management allows for the preservation of informative behavioral traits in prairie voles and strengthens the impact of the prairie vole model across behavioral and biomedical science.


Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934815/pdf/

Citations: 0

Reprogramming CAR T-Cells with designed bioPROTACs.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.21.706835

Vivek S Peche, Sebastian Kenny, Tae Gun Kang, Brian Coventry, Tian Mi, Inna Goreshnik, Mariana Garcia Sanchez, Reid Martin, Macey Smith, Dionne Vafeados, Rahul S Kathayat, Yu Kaiwen, Zuo-Fei Yuan, Long Wu, Anthony High, Andrew Nemecek, Elizabeth Wickmann, Adeleye Adeshakin, Francesca Ferrara, Robert E Throm, Taosheng Chen, Benjamin Youngblood, David Baker, Stephen Gottschalk

Gene editing has been used to enhance CAR T-cell function by disrupting negative regulators, but it has limitations. Here we show that de novo designed targeted degraders (bioPROTACs) provide an alternative approach. Expressing bioPROTACs that target DNMT3A, a key regulator of T-cell exhaustion, in CAR T-cells phenocopied gene knockout. Our reversible, non-gene-editing approach provides a tunable strategy to reprogram T-cell fate, which should be broadly applicable to next-generation cell therapies.


Citations: 0

The Untangle Challenge for accurate ensemble models.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.21.706873

Mehagan S Hopkins, Thomas C Terwilliger, Pavel V Afonine, Helen M Ginn, James M Holton

We report the discovery of a new class of local minima that has severely limited the accuracy of macromolecular models. Termed density misfit barrier traps, these minima explain much of the poor fit between macromolecular models and experimental data relative to that of smaller molecules: not just high R factors, but distorted chemical geometry. We postulated that proteins exist as an ensemble of conformations that each have good geometry, but refinement algorithms have been unable to converge to them due to a tangling phenomenon arising from these traps. To demonstrate, a synthetic ground truth data set was generated, consisting of a 2-member ensemble with excellent geometry. A series of starting models, each trapped in increasingly difficult local minima, were prepared, a unified validation score defined, and an open Challenge issued. This Challenge inspired algorithms for escaping such traps, and new programs have been released that are expected to substantially improve the accuracy of macromolecular ensemble models.

Synopsis: A synthetic 2-member conformational ensemble of a small protein and corresponding electron density data was generated to demonstrate how topological local minima hinder simultaneous agreement with density data and chemical geometry restraints in conventional structure refinement.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934704/pdf/

Citations: 0

SMCHD1 loss re-wires MYOD1 enhancer nexuses and chromatin accessibility landscapes in muscle cells.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-22 DOI: 10.64898/2026.02.21.707202

Zhijun Huang, Wei Cui, Adam Klaiss, Gerd P Pfeifer

Human SMCHD1 (Structural Maintenance of Chromosomes Flexible Hinge Domain Containing 1) is a chromatin architectural protein linked to heterochromatin repression. Loss-of-function mutations of SMCHD1 cause facioscapulohumeral muscular dystrophy type 2 (FSHD2) through activation of the DUX4 homeobox transcription factor gene. However, it is unknown how SMCHD1 may regulate myogenic transcription independently of DUX4. Here, we show that SMCHD1 safeguards enhancer organization within the three-dimensional (3D) genome in human myoblasts. Loss of SMCHD1 leads to widespread gains in chromatin accessibility, aberrant transcription, and a global redistribution of the myogenic transcription factor MYOD1. Integrative analyses of histone modifications, chromatin accessibility, Hi-C looping, and activity-by-contact enhancer-gene modeling reveal that SMCHD1 loss rewires the landscape of clustered enhancers and promotes the emergence of a new MYOD1-related network of enhancer elements, termed MYOD1 enhancer nexuses. These structures are marked by increased enhancer-enhancer connectivity, increased local 3D chromatin interactions, and coordinated activation of genes likely relevant for FSHD pathology. Together, our findings identify SMCHD1 as a key architectural constraint that suppresses hyperactive enhancer networks, thereby preserving transcriptional homeostasis in myoblasts.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934784/pdf/

Citations: 0

Linking biochemical and cellular efficacy of MERS coronavirus main protease inhibitors.

bioRxiv : the preprint server for biology

Pub Date : 2026-02-21 DOI: 10.64898/2026.02.20.707097

Van N T La, Noa Lahav, Mario Rodriguez, Randy Diaz-Tapia, Briana McGovern, Jared Benjamin, Haim Barr, Kris M White, Lulu Kang, John D Chodera, David D L Minh

Compounds that bind to the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) main protease (MPro) often produce biphasic concentration-response curves (CRCs) in biochemical assays: low concentrations activate the enzyme and high concentrations inhibit it. This biphasic behavior complicates data analysis. Here, we compare three approaches to data analysis: fitting the Hill equation to the activation phase, fitting it to the inhibition phase, and fitting an enzyme kinetics model that incorporates dimerization and ligand binding to the complete CRC. In the third approach, cellular efficacy is predicted by extrapolating the model to high enzyme concentrations. For compounds in our drug lead series, all three procedures yield inhibitory concentrations that are correlated with live-virus antiviral assays. The third procedure provides the most accurate forecast of cellular efficacy rank. These data analysis procedures may be valuable for antiviral drug discovery against MERS-CoV MPro and other enzymes with similar kinetics.
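As a rough illustration of what "fitting the Hill equation to the inhibition phase" can look like, the sketch below fits a two-parameter Hill curve through its log-linear (Hill plot) form. The abstract does not state which parametrization or fitting routine the authors used, so the function names, the two-parameter form, and the synthetic data are all assumptions made for illustration. The sketch does not model the biphasic activation or the dimerization-based kinetics model.

```python
import math


def hill_inhibition(conc, ic50, n):
    """Fractional enzyme activity remaining at inhibitor concentration `conc`.

    Assumed two-parameter form: y = 1 / (1 + (conc / ic50) ** n).
    """
    return 1.0 / (1.0 + (conc / ic50) ** n)


def fit_hill_loglinear(concs, activities):
    """Estimate (ic50, n) from the linearized Hill plot.

    Rearranging the assumed form gives
        log((1 - y) / y) = n * log(c) - n * log(ic50),
    which is a straight line in log(c). Ordinary least squares on that
    line yields the slope n and the intercept -n * log(ic50).
    Only valid for activities strictly between 0 and 1.
    """
    xs = [math.log(c) for c in concs]
    ys = [math.log((1.0 - y) / y) for y in activities]
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(-intercept / slope), slope


# Synthetic, noise-free data (hypothetical values, not taken from the preprint).
concs = [0.1, 0.3, 1.0, 3.0, 10.0, 30.0]
activities = [hill_inhibition(c, ic50=2.0, n=1.5) for c in concs]
ic50_fit, n_fit = fit_hill_loglinear(concs, activities)
```

On noise-free data the linearized fit recovers the generating parameters. With real assay data, where measured activities can fall outside the open interval (0, 1), a nonlinear least-squares fit on the untransformed curve is usually the more robust choice.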

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12934682/pdf/

Citations: 0