BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets
Yifan Gao, Zakariyya Mughal, Jose A. Jaramillo-Villegas, Marie Corradi, Alexandre Borrel, Ben Lieberman, Suliman Sharif, John Shaffer, Karamarie Fecho, Ajay Chatrath, Alexandra Maertens, Marc A. T. Teunis, Nicole Kleinstreuer, Thomas Hartung, Thomas Luechtefeld
Researchers in biomedical research, public health, and the life sciences often spend weeks or months discovering, accessing, curating, and integrating data from disparate sources, significantly delaying the onset of actual analysis and innovation. Instead of countless developers creating redundant and inconsistent data pipelines, BioBricks.ai offers a centralized data repository and a suite of developer-friendly tools to simplify access to scientific data. Currently, BioBricks.ai delivers over ninety biological and chemical datasets. It provides a package manager-like system for installing and managing dependencies on data sources. Each 'brick' is a Data Version Control git repository that supports an updateable pipeline for extraction, transformation, and loading data into the BioBricks.ai backend at https://biobricks.ai. Use cases include accelerating data science workflows and facilitating the creation of novel data assets by integrating multiple datasets into unified, harmonized resources. In conclusion, BioBricks.ai offers an opportunity to accelerate access and use of public data through a single open platform.
{"title":"BioBricks.ai: A Versioned Data Registry for Life Sciences Data Assets","authors":"Yifan Gao, Zakariyya Mughal, Jose A. Jaramillo-Villegas, Marie Corradi, Alexandre Borrel, Ben Lieberman, Suliman Sharif, John Shaffer, Karamarie Fecho, Ajay Chatrath, Alexandra Maertens, Marc A. T. Teunis, Nicole Kleinstreuer, Thomas Hartung, Thomas Luechtefeld","doi":"arxiv-2408.17320","DOIUrl":"https://doi.org/arxiv-2408.17320","url":null,"abstract":"Researchers in biomedical research, public health, and the life sciences\u0000often spend weeks or months discovering, accessing, curating, and integrating\u0000data from disparate sources, significantly delaying the onset of actual\u0000analysis and innovation. Instead of countless developers creating redundant and\u0000inconsistent data pipelines, BioBricks.ai offers a centralized data repository\u0000and a suite of developer-friendly tools to simplify access to scientific data.\u0000Currently, BioBricks.ai delivers over ninety biological and chemical datasets.\u0000It provides a package manager-like system for installing and managing\u0000dependencies on data sources. Each 'brick' is a Data Version Control git\u0000repository that supports an updateable pipeline for extraction, transformation,\u0000and loading data into the BioBricks.ai backend at https://biobricks.ai. Use\u0000cases include accelerating data science workflows and facilitating the creation\u0000of novel data assets by integrating multiple datasets into unified, harmonized\u0000resources. In conclusion, BioBricks.ai offers an opportunity to accelerate\u0000access and use of public data through a single open platform.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"66 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A note on promotion time cure models with a new biological consideration
Zhi Zhao, Fatih Kızılaslan
We introduce a generalized promotion time cure model motivated by a new biological consideration. The new approach is flexible enough to model heterogeneous survival data, in particular to address intra-sample heterogeneity.
{"title":"A note on promotion time cure models with a new biological consideration","authors":"Zhi Zhao, Fatih Kızılaslan","doi":"arxiv-2408.17188","DOIUrl":"https://doi.org/arxiv-2408.17188","url":null,"abstract":"We introduce a generalized promotion time cure model motivated by a new\u0000biological consideration. The new approach is flexible to model heterogeneous\u0000survival data, in particular for addressing intra-sample heterogeneity.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Uncertainty Quantification of Antibody Measurements: Physical Principles and Implications for Standardization
Paul N. Patrone, Lili Wang, Sheng Lin-Gibson, Anthony J. Kearsley
Harmonizing serology measurements is critical for identifying reference materials that permit standardization and comparison of results across different diagnostic platforms. However, the theoretical foundations of such tasks have yet to be fully explored in the context of antibody thermodynamics and uncertainty quantification (UQ). This has restricted the usefulness of currently deployed standards and limited the scope of materials considered viable reference materials. To address these problems, we develop rigorous theories of antibody normalization and harmonization, and formulate a probabilistic framework for defining correlates of protection. We begin by proposing a mathematical definition of harmonization equipped with the structure needed to quantify uncertainty associated with the choice of standard, assay, etc. We then show how a thermodynamic description of serology measurements (i) relates this structure to the Gibbs free energy of antibody binding and thereby (ii) induces a regression analysis that directly harmonizes measurements. We supplement this with a novel, optimization-based normalization (not harmonization!) method that checks for consistency between reference and sample dilution curves. Lastly, we relate these analyses to uncertainty propagation techniques to estimate correlates of protection. A key result of these analyses is that, under physically reasonable conditions, the choice of reference material does not increase the uncertainty associated with harmonization or correlates of protection. We provide examples and validate the main ideas in the context of an interlaboratory study that lays the foundation for using monoclonal antibodies as a reference for SARS-CoV-2 serology measurements.
{"title":"Uncertainty Quantification of Antibody Measurements: Physical Principles and Implications for Standardization","authors":"Paul N. Patrone, Lili Wang, Sheng Lin-Gibson, Anthony J. Kearsley","doi":"arxiv-2409.00191","DOIUrl":"https://doi.org/arxiv-2409.00191","url":null,"abstract":"Harmonizing serology measurements is critical for identifying reference\u0000materials that permit standardization and comparison of results across\u0000different diagnostic platforms. However, the theoretical foundations of such\u0000tasks have yet to be fully explored in the context of antibody thermodynamics\u0000and uncertainty quantification (UQ). This has restricted the usefulness of\u0000standards currently deployed and limited the scope of materials considered as\u0000viable reference material. To address these problems, we develop rigorous\u0000theories of antibody normalization and harmonization, as well as formulate a\u0000probabilistic framework for defining correlates of protection. We begin by\u0000proposing a mathematical definition of harmonization equipped with structure\u0000needed to quantify uncertainty associated with the choice of standard, assay,\u0000etc. We then show how a thermodynamic description of serology measurements (i)\u0000relates this structure to the Gibbs free-energy of antibody binding, and\u0000thereby (ii) induces a regression analysis that directly harmonizes\u0000measurements. We supplement this with a novel, optimization-based normalization\u0000(not harmonization!) method that checks for consistency between reference and\u0000sample dilution curves. Last, we relate these analyses to uncertainty\u0000propagation techniques to estimate correlates of protection. A key result of\u0000these analyses is that under physically reasonable conditions, the choice of\u0000reference material does not increase uncertainty associated with harmonization\u0000or correlates of protection. We provide examples and validate main ideas in the\u0000context of an interlab study that lays the foundation for using monoclonal\u0000antibodies as a reference for SARS-CoV-2 serology measurements.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9
Xia Jiang, Yijun Zhou, Alan Wells, Adam Brufsky
Breast cancer is one of the two cancers responsible for the most deaths in women, with about 42,000 deaths each year in the US. That over 300,000 breast cancers are newly diagnosed each year suggests that only a fraction of these cancers result in mortality. Thus, most women undergo seemingly curative treatment for localized cancers, but a significant fraction later succumb to metastatic disease, for which current treatments are only temporizing for the vast majority. Current prognostic metrics are of little actionable value for 4 of the 5 women seemingly cured after local treatment, and many women are exposed to morbid and even mortal adjuvant therapies unnecessarily, with these adjuvant therapies reducing metastatic recurrence by only a third. Thus, there is a need for better prognostics to target aggressive treatment at those who are likely to relapse and to spare those who were actually cured. While there is a plethora of molecular and tumor-marker assays in use and under development to detect recurrence early, these are time-consuming, expensive, and still often unvalidated as to actionable prognostic utility. A different approach would use large-data techniques to determine clinical and histopathological parameters that provide accurate prognostics using existing data. Herein, we report on machine learning, together with grid search and Bayesian networks, to develop algorithms that achieve an AUC of up to 0.9 in ROC analyses using only extant data. Such algorithms could be rapidly translated to clinical management, as they do not require testing beyond routine tumor evaluations.
arxiv-2408.16256 (2024-08-29). https://doi.org/arxiv-2408.16256
Rapid and accurate mosquito abundance forecasting with Aedes-AI neural networks
Adrienne C. Kinney, Roberto Barrera, Joceline Lega
We present a method to convert weather data into probabilistic forecasts of Aedes aegypti abundance. The approach, which relies on the Aedes-AI suite of neural networks, produces weekly point predictions with corresponding uncertainty estimates. Once calibrated on past trap and weather data, the model is designed to use weather forecasts to estimate future trap catches. We demonstrate that when reliable input data are used, the resulting predictions have high skill. This technique may therefore be used to supplement vector surveillance efforts or identify periods of elevated risk for vector-borne disease outbreaks.
{"title":"Rapid and accurate mosquito abundance forecasting with Aedes-AI neural networks","authors":"Adrienne C. Kinney, Roberto Barrera, Joceline Lega","doi":"arxiv-2408.16152","DOIUrl":"https://doi.org/arxiv-2408.16152","url":null,"abstract":"We present a method to convert weather data into probabilistic forecasts of\u0000Aedes aegypti abundance. The approach, which relies on the Aedes-AI suite of\u0000neural networks, produces weekly point predictions with corresponding\u0000uncertainty estimates. Once calibrated on past trap and weather data, the model\u0000is designed to use weather forecasts to estimate future trap catches. We\u0000demonstrate that when reliable input data are used, the resulting predictions\u0000have high skill. This technique may therefore be used to supplement vector\u0000surveillance efforts or identify periods of elevated risk for vector-borne\u0000disease outbreaks.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142226811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Q-MRS: A Deep Learning Framework for Quantitative Magnetic Resonance Spectra Analysis
Christopher J. Wu, Lawrence S. Kegeles, Jia Guo
Magnetic resonance spectroscopy (MRS) is an established technique for studying tissue metabolism, particularly in central nervous system disorders. While powerful and versatile, MRS is often limited by challenges associated with data quality, processing, and quantification. Existing MRS quantification methods face difficulties in balancing model complexity and reproducibility during spectral modeling, often falling into the trap of either oversimplification or over-parameterization. To address these limitations, this study introduces a deep learning (DL) framework that employs transfer learning, in which the model is pre-trained on simulated datasets before it undergoes fine-tuning on in vivo data. The proposed framework showed promising performance when applied to the Philips dataset from the BIG GABA repository and represents an exciting advancement in MRS data analysis.
{"title":"Q-MRS: A Deep Learning Framework for Quantitative Magnetic Resonance Spectra Analysis","authors":"Christopher J. Wu, Lawrence S. Kegeles, Jia Guo","doi":"arxiv-2408.15999","DOIUrl":"https://doi.org/arxiv-2408.15999","url":null,"abstract":"Magnetic resonance spectroscopy (MRS) is an established technique for\u0000studying tissue metabolism, particularly in central nervous system disorders.\u0000While powerful and versatile, MRS is often limited by challenges associated\u0000with data quality, processing, and quantification. Existing MRS quantification\u0000methods face difficulties in balancing model complexity and reproducibility\u0000during spectral modeling, often falling into the trap of either\u0000oversimplification or over-parameterization. To address these limitations, this\u0000study introduces a deep learning (DL) framework that employs transfer learning,\u0000in which the model is pre-trained on simulated datasets before it undergoes\u0000fine-tuning on in vivo data. The proposed framework showed promising\u0000performance when applied to the Philips dataset from the BIG GABA repository\u0000and represents an exciting advancement in MRS data analysis.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"94 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating Binary Species Range Maps
Filip Dorm, Christian Lange, Scott Loarie, Oisin Mac Aodha
Accurately predicting the geographic ranges of species is crucial for assisting conservation efforts. Traditionally, range maps were manually created by experts. However, species distribution models (SDMs) and, more recently, deep learning-based variants offer a potential automated alternative. Deep learning-based SDMs generate a continuous probability representing the predicted presence of a species at a given location, which must be binarized by setting per-species thresholds to obtain binary range maps. However, selecting appropriate per-species thresholds to binarize these predictions is non-trivial as different species can require distinct thresholds. In this work, we evaluate different approaches for automatically identifying the best thresholds for binarizing range maps using presence-only data. This includes approaches that require the generation of additional pseudo-absence data, along with ones that only require presence data. We also propose an extension of an existing presence-only technique that is more robust to outliers. We perform a detailed evaluation of different thresholding techniques on the tasks of binary range estimation and large-scale fine-grained visual classification, and we demonstrate improved performance over existing pseudo-absence free approaches using our method.
{"title":"Generating Binary Species Range Maps","authors":"Filip Dorm, Christian Lange, Scott Loarie, Oisin Mac Aodha","doi":"arxiv-2408.15956","DOIUrl":"https://doi.org/arxiv-2408.15956","url":null,"abstract":"Accurately predicting the geographic ranges of species is crucial for\u0000assisting conservation efforts. Traditionally, range maps were manually created\u0000by experts. However, species distribution models (SDMs) and, more recently,\u0000deep learning-based variants offer a potential automated alternative. Deep\u0000learning-based SDMs generate a continuous probability representing the\u0000predicted presence of a species at a given location, which must be binarized by\u0000setting per-species thresholds to obtain binary range maps. However, selecting\u0000appropriate per-species thresholds to binarize these predictions is non-trivial\u0000as different species can require distinct thresholds. In this work, we evaluate\u0000different approaches for automatically identifying the best thresholds for\u0000binarizing range maps using presence-only data. This includes approaches that\u0000require the generation of additional pseudo-absence data, along with ones that\u0000only require presence data. We also propose an extension of an existing\u0000presence-only technique that is more robust to outliers. We perform a detailed\u0000evaluation of different thresholding techniques on the tasks of binary range\u0000estimation and large-scale fine-grained visual classification, and we\u0000demonstrate improved performance over existing pseudo-absence free approaches\u0000using our method.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"111 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Learning to Predict Late-Onset Breast Cancer Metastasis: the Single Hyperparameter Grid Search (SHGS) Strategy for Meta Tuning Concerning Deep Feed-forward Neural Network
Yijun Zhou, Om Arora-Jain, Xia Jiang
While machine learning has advanced in medicine, its widespread use in clinical applications, especially in predicting breast cancer metastasis, is still limited. We have been dedicated to constructing a DFNN model to predict breast cancer metastasis n years in advance. However, the challenge lies in efficiently identifying optimal hyperparameter values through grid search, given the constraints of time and resources. Issues such as the infinite possibilities for continuous hyperparameters like L1 and L2, as well as the time-consuming and costly process, further complicate the task. To address these challenges, we developed the Single Hyperparameter Grid Search (SHGS) strategy, which serves as a preselection method before grid search. Our experiments with SHGS applied to DFNN models for breast cancer metastasis prediction focus on analyzing eight target hyperparameters: epochs, batch size, dropout, L1, L2, learning rate, decay, and momentum. We created three figures, each depicting the experimental results obtained from the three LSM-I-10-Plus-year datasets. These figures illustrate the relationship between model performance and the target hyperparameter values. For each hyperparameter, we analyzed whether changes in this hyperparameter affect model performance, examined whether there were specific patterns, and explored how to choose values for that hyperparameter. Our experimental findings reveal that the optimal value of a hyperparameter not only depends on the dataset but is also significantly influenced by the settings of other hyperparameters. Additionally, our experiments suggest reduced ranges of values for each target hyperparameter, which may be helpful for low-budget grid search. This approach provides prior experience and a foundation for subsequent grid searches to enhance model performance.
arxiv-2408.15498 (2024-08-28). https://doi.org/arxiv-2408.15498
A reaction network model of microscale liquid-liquid phase separation reveals effects of spatial dimension
Jinyoung Kim, Sean D. Lawley, Jinsu Kim
Proteins can form droplets via liquid-liquid phase separation (LLPS) in cells. Recent experiments demonstrate that LLPS is qualitatively different on two-dimensional (2d) surfaces compared to three-dimensional (3d) solutions. In this paper, we use mathematical modeling to investigate the causes of the discrepancies between LLPS in 2d versus 3d. We model the number of proteins and droplets inducing LLPS by continuous-time Markov chains and use chemical reaction network theory to analyze the model. To reflect the influence of spatial dimension, droplet formation and dissociation rates are determined using the first hitting times of diffusing proteins. We first show that our stochastic model reproduces the appropriate phase diagram and is consistent with the relevant thermodynamic constraints. After further analyzing the model, we find that it predicts that the spatial dimension induces qualitatively different features of LLPS, consistent with recent experiments. While it has been claimed that the differences between 2d and 3d LLPS stem mainly from different diffusion coefficients, our analysis is independent of the proteins' diffusion coefficients since we use the stationary model behavior. Therefore, our results give new hypotheses about how spatial dimension affects LLPS.
{"title":"A reaction network model of microscale liquid-liquid phase separation reveals effects of spatial dimension","authors":"Jinyoung Kim, Sean D. Lawley, Jinsu Kim","doi":"arxiv-2408.15303","DOIUrl":"https://doi.org/arxiv-2408.15303","url":null,"abstract":"Proteins can form droplets via liquid-liquid phase separation (LLPS) in\u0000cells. Recent experiments demonstrate that LLPS is qualitatively different on\u0000two-dimensional (2d) surfaces compared to three-dimensional (3d) solutions. In\u0000this paper, we use mathematical modeling to investigate the causes of the\u0000discrepancies between LLPS in 2d versus 3d. We model the number of proteins and\u0000droplets inducing LLPS by continuous-time Markov chains and use chemical\u0000reaction network theory to analyze the model. To reflect the influence of space\u0000dimension, droplet formation and dissociation rates are determined using the\u0000first hitting times of diffusing proteins. We first show that our stochastic\u0000model reproduces the appropriate phase diagram and is consistent with the\u0000relevant thermodynamic constraints. After further analyzing the model, we find\u0000that it predicts that the space dimension induces qualitatively different\u0000features of LLPS which are consistent with recent experiments. While it has\u0000been claimed that the differences between 2d and 3d LLPS stems mainly from\u0000different diffusion coefficients, our analysis is independent of the diffusion\u0000coefficients of the proteins since we use the stationary model behavior.\u0000Therefore, our results give new hypotheses about how space dimension affects\u0000LLPS.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A novel method to separate circadian from non-circadian masking effects in order to enhance daily circadian timing and amplitude estimation from core body temperature
Phuc D Nguyen, Claire Dunbar, Hannah Scott, Bastien Lechat, Jack Manners, Gorica Micic, Nicole Lovato, Amy C Reynolds, Leon Lack, Robert Adams, Danny Eckert, Andrew Vakulin, Peter G Catcheside
Circadian disruption contributes to adverse effects on sleep, performance, and health. One accepted method to track continuous daily changes in circadian timing is to measure core body temperature (CBT) and establish the daily, circadian-related CBT minimum time (Tmin). This method typically applies cosine-model fits to measured CBT data, which may not adequately account for substantial wake metabolic activity and sleep effects on CBT that confound and mask circadian effects, and thus estimates of the circadian-related Tmin. This study introduced a novel physiology-grounded analytic approach to separate circadian from non-circadian effects on CBT, which we compared against traditional cosine-based methods. The dataset comprised 33 healthy participants attending a 39-hour in-laboratory study with an initial overnight sleep followed by an extended wake period. CBT data were collected at 30-second intervals via ingestible capsules. Our design captured CBT during both the baseline sleep period and the extended wake period (without sleep) and allowed us to model the influence of circadian and non-circadian effects of sleep, wake, and activity on CBT using physiology-guided generalized additive models. Model fits and estimated Tmin inferred from extended wake without sleep were compared with traditional cosine-based model fits. Compared to the traditional cosine model, the new model exhibited superior fits to CBT (Pearson R 0.90 [95% CI 0.83-0.96] versus 0.81 [0.55-0.93]). The difference between estimated and measured circadian Tmin, derived from the day without sleep, was smaller with our method (0.2 [-0.5, 0.3] hours) than with previous methods (1.4 [1.1, 1.7] hours). This new method provides superior demasking of non-circadian influences compared to traditional cosine methods, including removal of a sleep-related bias towards an earlier estimate of circadian Tmin.
{"title":"A novel method to separate circadian from non-circadian masking effects in order to enhance daily circadian timing and amplitude estimation from core body temperature","authors":"Phuc D Nguyen, Claire Dunbar, Hannah Scott, Bastien Lechat, Jack Manners, Gorica Micic, Nicole Lovato, Amy C Reynolds, Leon Lack, Robert Adams, Danny Eckert, Andrew Vakulin, Peter G Catcheside","doi":"arxiv-2408.15295","DOIUrl":"https://doi.org/arxiv-2408.15295","url":null,"abstract":"Circadian disruption contributes to adverse effects on sleep, performance,\u0000and health. One accepted method to track continuous daily changes in circadian\u0000timing is to measure core body temperature (CBT), and establish daily,\u0000circadian-related CBT minimum time (Tmin). This method typically applies\u0000cosine-model fits to measured CBT data, which may not adequately account for\u0000substantial wake metabolic activity and sleep effects on CBT that confound and\u0000mask circadian effects, and thus estimates of the circadian-related Tmin. This\u0000study introduced a novel physiology-grounded analytic approach to separate\u0000circadian from non-circadian effects on CBT, which we compared against\u0000traditional cosine-based methods. The dataset comprised 33 healthy participants\u0000attending a 39-hour in-laboratory study with an initial overnight sleep\u0000followed by an extended wake period. CBT data were collected at 30-second\u0000intervals via ingestible capsules. Our design captured CBT during both the\u0000baseline sleep period and during extended wake period (without sleep) and\u0000allowed us to model the influence of circadian and non-circadian effects of\u0000sleep, wake, and activity on CBT using physiology-guided generalized additive\u0000models. Model fits and estimated Tmin inferred from extended wake without sleep\u0000were compared with traditional cosine-based models fits. Compared to the\u0000traditional cosine model, the new model exhibited superior fits to CBT (Pearson\u0000R 0.90 [95%CI; [0.83 - 0.96] versus 0.81 [0.55-0.93]). The difference between\u0000estimated vs measured circadian Tmin, derived from the day without sleep, was\u0000better fit with our method (0.2 [-0.5,0.3] hours) versus previous methods (1.4\u0000[1.1 to 1.7] hours). This new method provides superior demasking of\u0000non-circadian influences compared to traditional cosine methods, including the\u0000removal of a sleep-related bias towards an earlier estimate of circadian Tmin.","PeriodicalId":501266,"journal":{"name":"arXiv - QuanBio - Quantitative Methods","volume":"275 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142213353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}