Latest ArXiv Literature

Impact of Stain Variation and Color Normalization for Prognostic Predictions in Pathology.
Pub Date : 2024-09-12
Siyu Steven Lin, Haowen Zhou, Richard J Cote, Mark Watson, Ramaswamy Govindan, Changhuei Yang

In recent years, deep neural networks (DNNs) have demonstrated remarkable performance in pathology applications, potentially even outperforming expert pathologists due to their ability to learn subtle features from large datasets. One complication in preparing digital pathology datasets for DNN tasks is variation in tinctorial qualities. A common way to address this is to perform stain normalization on the images. In this study, we show that a well-trained DNN model trained on one batch of histological slides failed to generalize to another batch prepared at a different time from the same tissue blocks, even when stain normalization methods were applied. This study used sample data from a previously reported DNN that was able to identify, with high accuracy, patients with early-stage non-small cell lung cancer (NSCLC) whose tumors did and did not metastasize, based on training and then testing on digital images from H&E-stained primary tumor tissue sections processed at the same time. In this study, we obtained a new series of histologic slides from adjacent recuts of the same tissue blocks, processed in the same lab but at a different time. We found that the DNN trained on either batch of slides/images was unable to generalize and failed to predict progression in the other batch of slides/images (AUC_cross-batch = 0.52-0.53 compared with AUC_same-batch = 0.74-0.81). The failure to generalize did not improve even when tinctorial differences were corrected through either traditional color-tuning or stain normalization with a Cycle Generative Adversarial Network (CycleGAN). This highlights the need to develop an entirely new way to process and collect consistent microscopy images from histologic slides that can be used both to train predictive DNN algorithms and to allow their general application.
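
The AUC values in this abstract can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so a value near 0.5 is close to chance. The following is a minimal sketch of that rank-based computation. The function name `roc_auc` and all labels and scores are hypothetical toy values, not data from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only (1 = progressed, 0 = did not).
labels = [1, 1, 1, 0, 0, 0]
same_batch_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]   # scores separate the classes well
cross_batch_scores = [0.6, 0.3, 0.5, 0.5, 0.4, 0.6]  # scores barely separate the classes

auc_same = roc_auc(labels, same_batch_scores)
auc_cross = roc_auc(labels, cross_batch_scores)
print(f"same-batch AUC = {auc_same:.3f}, cross-batch AUC = {auc_cross:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; a sorted-rank implementation would be the usual choice for large cohorts.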

Open Access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419173/pdf/
Citations: 0
Digital Volumetric Biopsy Cores Improve Gleason Grading of Prostate Cancer Using Deep Learning.
Pub Date : 2024-09-12
Ekaterina Redekop, Mara Pleasure, Zichen Wang, Anthony Sisk, Yang Zong, Kimberly Flores, William Speier, Corey W Arnold

Prostate cancer (PCa) was the most frequently diagnosed cancer among American men in 2023 [1]. The histological grading of biopsies is essential for diagnosis, and various deep learning-based solutions have been developed to assist with this task. Existing deep learning frameworks are typically applied to individual 2D cross-sections sliced from 3D biopsy tissue specimens. This process impedes the analysis of complex tissue structures such as glands, which can vary depending on the tissue slice examined. We propose a novel digital pathology data source called a "volumetric core," obtained via the extraction and co-alignment of serially sectioned tissue sections using a novel morphology-preserving alignment framework. We trained an attention-based multiple-instance learning (ABMIL) framework on deep features extracted from volumetric patches to automatically classify the Gleason Grade Group (GGG). To handle volumetric patches, we used a modified video transformer with a deep feature extractor pretrained using self-supervised learning. We ran our morphology-preserving alignment framework to construct 10,210 volumetric cores, leaving out 30% for pretraining. The rest of the dataset was used to train ABMIL, which resulted in a 0.958 macro-average AUC, 0.671 F1 score, 0.661 precision, and 0.695 recall averaged across all five GGGs, significantly outperforming the 2D baselines.
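
To make the ABMIL step more concrete, the sketch below shows a commonly used form of attention-based pooling: each patch feature vector h_k receives a weight a_k = softmax_k(w · tanh(V h_k)), and the bag representation is the weighted sum of the h_k. This is an assumption about the general technique, not the authors' exact architecture. The function name `attention_pool`, the parameters `V` and `w`, and the feature values are hypothetical.

```python
import math


def attention_pool(instances, V, w):
    """Attention-based MIL pooling over a bag of instance feature vectors.

    instances: list of feature vectors (one per patch)
    V: hidden_dim x feature_dim projection matrix (list of rows)
    w: hidden_dim attention vector
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ui for wi, ui in zip(w, hidden)))
    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights


# Hypothetical 2-D features for three patches; parameters are fixed, not learned.
instances = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 1.0]

bag, weights = attention_pool(instances, V, w)
print("attention weights:", [round(a, 3) for a in weights])
```

In a trained model, `V` and `w` are learned jointly with the classifier, so the weights indicate which patches drive the bag-level prediction.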

Open Access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419188/pdf/
Citations: 0
Detailed delineation of the fetal brain in diffusion MRI via multi-task learning.
Pub Date : 2024-09-12
Davood Karimi, Camilo Calixto, Haykel Snoussi, Maria Camila Cortes-Albornoz, Clemente Velasco-Annis, Caitlin Rollins, Camilo Jaimes, Ali Gholipour, Simon K Warfield

Diffusion-weighted MRI (dMRI) is increasingly used to study the normal and abnormal development of the fetal brain in utero. Recent studies have shown that dMRI can offer invaluable insights into the neurodevelopmental processes in the fetal stage. However, because of the low data quality and rapid brain development, reliable analysis of fetal dMRI data requires dedicated computational methods that are currently unavailable. The lack of automated methods for fast, accurate, and reproducible data analysis has seriously limited our ability to tap the potential of fetal brain dMRI for medical and scientific applications. In this work, we developed and validated a unified computational framework to (1) segment the brain tissue into white matter, cortical/subcortical gray matter, and cerebrospinal fluid, (2) segment 31 distinct white matter tracts, and (3) parcellate the brain's cortex and delineate the deep gray nuclei and white matter structures into 96 anatomically meaningful regions. We utilized a set of manual, semi-automatic, and automatic approaches to annotate 97 fetal brains. Using these labels, we developed and validated a multi-task deep learning method to perform the three computations. Our evaluations show that the new method can accurately carry out all three tasks, achieving a mean Dice similarity coefficient of 0.865 on tissue segmentation, 0.825 on white matter tract segmentation, and 0.819 on parcellation. The proposed method can greatly advance the field of fetal neuroimaging as it can lead to substantial improvements in fetal brain tractography, tract-specific analysis, and structural connectivity assessment.
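
The Dice similarity coefficient reported here measures overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|). Below is a minimal sketch for flattened binary masks. The function name `dice_coefficient` and the toy masks are illustrative and are not taken from the study.

```python
def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if size_sum == 0:
        # Both masks are empty; treated here as perfect agreement by convention.
        return 1.0
    return 2.0 * intersection / size_sum


# Hypothetical flattened masks for one label (1 = voxel belongs to the structure).
pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 1, 1, 0]

dice = dice_coefficient(pred, truth)
print(f"Dice = {dice:.3f}")
```

For multi-label tasks such as tract segmentation or parcellation, a mean Dice is typically obtained by computing this per label and averaging.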

Open Access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419175/pdf/
Citations: 0
PRIME: Phase Reversed Interleaved Multi-Echo acquisition enables highly accelerated distortion-free diffusion MRI.
Pub Date : 2024-09-11
Yohan Jun, Qiang Liu, Ting Gong, Jaejin Cho, Shohei Fujita, Xingwang Yong, Susie Y Huang, Lipeng Ning, Anastasia Yendiki, Yogesh Rathi, Berkin Bilgic

Purpose: To develop and evaluate a new pulse sequence for highly accelerated distortion-free diffusion MRI (dMRI) by inserting an additional echo without prolonging TR, when generalized slice dithered enhanced resolution (gSlider) radiofrequency encoding is used for volumetric acquisition.

Methods: A phase-reversed interleaved multi-echo acquisition (PRIME) was developed for rapid, high-resolution, and distortion-free dMRI. It includes two echoes: the first is the target high-resolution diffusion-weighted imaging (DWI) acquisition, and the second is acquired at either 1) lower resolution for high-fidelity field map estimation, or 2) matching resolution to enable efficient diffusion relaxometry acquisitions. The sequence was evaluated on in vivo data acquired from healthy volunteers on clinical and Connectome 2.0 scanners.

Results: In vivo experiments demonstrated that 1) high in-plane acceleration (R_in-plane of 5-fold with 2D partial Fourier) was achieved using the high-fidelity field maps estimated from the second echo, which was acquired at a lower resolution/acceleration to increase its SNR while matching the effective echo spacing of the first readout; 2) high-resolution diffusion relaxometry parameters were estimated from dual-echo PRIME data using a white matter model of the multi-TE spherical mean technique (MTE-SMT); and 3) high-fidelity mesoscale DWI at 550 µm isotropic resolution could be obtained in vivo by capitalizing on the high-performance gradients of the Connectome 2.0 scanner.

Conclusion: The proposed PRIME sequence enabled highly accelerated, high-resolution, and distortion-free dMRI using an additional echo without prolonging scan time when gSlider encoding is utilized.
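
As a rough illustration of why a second echo at matching resolution carries relaxometry information, the sketch below estimates T2 from two echo times under a mono-exponential decay assumption, S(TE) = S0 · exp(-TE / T2). This is a deliberate simplification and is not the MTE-SMT white matter model used in the study. The function name `two_point_t2`, the echo times, and the signal values are all hypothetical.

```python
import math


def two_point_t2(s1, s2, te1_ms, te2_ms):
    """Two-point T2 estimate assuming S(TE) = S0 * exp(-TE / T2).

    Solving the ratio of the two echoes gives T2 = (TE2 - TE1) / ln(S1 / S2).
    """
    if te2_ms <= te1_ms:
        raise ValueError("the second echo time must be later than the first")
    if not (s1 > s2 > 0):
        raise ValueError("signals must be positive and decay between echoes")
    return (te2_ms - te1_ms) / math.log(s1 / s2)


# Synthetic, noise-free signals generated from an assumed T2 of 80 ms.
te1, te2 = 60.0, 120.0   # hypothetical echo times in ms
true_t2, s0 = 80.0, 1000.0
s1 = s0 * math.exp(-te1 / true_t2)
s2 = s0 * math.exp(-te2 / true_t2)

t2_est = two_point_t2(s1, s2, te1, te2)
print(f"estimated T2 = {t2_est:.1f} ms")
```

With real data the two-point estimate is sensitive to noise, which is one reason model-based fitting across diffusion directions and echo times is generally preferred.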

Open Access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419176/pdf/
Citations: 0
Optimizing Sample Size for Supervised Machine Learning with Bulk Transcriptomic Sequencing: A Learning Curve Approach.
Pub Date : 2024-09-10
Yunhui Qi, Xinyi Wang, Li-Xuan Qin

Accurate sample classification using transcriptomics data is crucial for advancing personalized medicine. Achieving this goal necessitates determining a suitable sample size that ensures adequate statistical power without undue resource allocation. Current sample size calculation methods rely on assumptions and algorithms that may not align with supervised machine learning techniques for sample classification. Addressing this critical methodological gap, we present a novel computational approach that establishes the power-versus-sample-size relationship by employing a data augmentation strategy followed by fitting a learning curve. We comprehensively evaluated its performance for microRNA and RNA sequencing data, considering diverse data characteristics and algorithm configurations, based on a spectrum of evaluation metrics. To foster accessibility and reproducibility, the Python and R code for implementing our approach is available on GitHub. Its deployment will significantly facilitate the adoption of machine learning in transcriptomics studies and accelerate their translation into clinically useful classifiers for personalized treatment.
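
The learning-curve idea can be sketched as follows: measure classifier error at several training-set sizes, fit a curve, and read off the size needed for a target error. The example assumes an inverse power law, error(n) = b · n^(-c), fitted by least squares in log-log space. This functional form, the function names, and the synthetic data are assumptions for illustration and are not the authors' released code. The data augmentation step described in the abstract is not shown.

```python
import math


def fit_power_law(sizes, errors):
    """Fit log(error) = log(b) - c * log(n) by ordinary least squares; return (b, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def required_sample_size(b, c, target_error):
    """Smallest n with b * n**(-c) <= target_error under the fitted curve."""
    return math.ceil((b / target_error) ** (1.0 / c))


# Synthetic, noise-free error rates generated from b = 2.0, c = 0.5.
sizes = [20, 40, 80, 160]
errors = [2.0 * n ** -0.5 for n in sizes]

b, c = fit_power_law(sizes, errors)
n_needed = required_sample_size(b, c, target_error=0.1)
print(f"b = {b:.3f}, c = {c:.3f}, n for 10% error ~ {n_needed}")
```

A curve with a non-zero asymptote (irreducible error) would need a third parameter and a nonlinear fit; the two-parameter form is used here only to keep the sketch dependency-free.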

Open Access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419172/pdf/
Citations: 0
Initial Experience of Metabolic Imaging with Hyperpolarized [1-¹³C]pyruvate MRI in Kidney Transplant Patients.
Pub Date : 2024-09-10
Xiaoxi Liu, Ying-Chieh Lai, Di Cui, Shiang-Cheng Kung, Meyeon Park, Laszik Zoltan, Peder E Z Larson, Zhen J Wang

Background: Kidney transplant is the treatment of choice for patients with end-stage renal disease. Early detection of allograft injury is important to delay or prevent irreversible damage.

Purpose: To investigate the feasibility of hyperpolarized (HP) [1-¹³C]pyruvate MRI for assessing kidney allograft metabolism.

Study type: Prospective.

Subjects: 6 participants (mean age, 45.2 ± 12.4 years, 2 females) scheduled for kidney allograft biopsy and 5 patients (mean age, 59.6 ± 10.4 years, 2 females) with renal cell carcinoma (RCC).

Field strength/sequence: 3 Tesla, T2-weighted fast spin echo, multi-echo gradient echo, single-shot diffusion-weighted echo-planar imaging, and time-resolved HP ¹³C metabolite-selective imaging.

Assessment: Five of the six kidney allograft participants underwent biopsy after MRI. Estimated glomerular filtration rate (eGFR) and urine protein-to-creatinine ratio (uPCR) were collected within 4 weeks of MRI. Kidney metabolism was quantified from HP [1-¹³C]pyruvate MRI using the lactate-to-pyruvate ratio in allograft kidneys and in non-tumor-bearing kidneys from RCC patients.

Statistical tests: Descriptive statistics (mean ± standard deviation).

Results: Biopsy was performed a mean of 9 days (range 5-19 days) after HP [1-¹³C]pyruvate MRI. Three biopsies were normal, one showed low-grade fibrosis, and one showed moderate microvascular inflammation. All had stable functioning allografts with eGFR > 60 mL/min/1.73 m² and normal uPCR. One participant who did not undergo biopsy had reduced eGFR of 49 mL/min/1.73 m² and elevated uPCR. The mean lactate-to-pyruvate ratio was 0.373 in participants with normal findings (n = 3) and 0.552 in participants with abnormal findings (n = 2). The lactate-to-pyruvate ratio was highest (0.847) in the participant with reduced eGFR and elevated uPCR. Native non-tumor-bearing kidneys had a mean lactate-to-pyruvate ratio of 0.309.

Data conclusion: Stable allografts with normal findings at biopsy showed lactate-to-pyruvate ratios similar to native non-tumor bearing kidneys, whereas allografts with abnormal findings showed higher lactate-to-pyruvate ratios.
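
One simple way to form a lactate-to-pyruvate ratio from time-resolved data is to sum each metabolite's signal over the acquired time points within a region of interest and divide. The sketch below assumes that definition; the abstract does not state how the ratio was computed, so this is an illustration only. The function name and the signal values are hypothetical.

```python
def lactate_to_pyruvate_ratio(lactate_signal, pyruvate_signal):
    """Ratio of time-summed lactate signal to time-summed pyruvate signal."""
    if len(lactate_signal) != len(pyruvate_signal):
        raise ValueError("both time courses must have the same number of frames")
    total_pyruvate = sum(pyruvate_signal)
    if total_pyruvate <= 0:
        raise ValueError("pyruvate signal must be positive")
    return sum(lactate_signal) / total_pyruvate


# Hypothetical region-of-interest signals over five time frames (arbitrary units).
pyruvate = [10.0, 40.0, 30.0, 15.0, 5.0]
lactate = [1.0, 8.0, 12.0, 10.0, 6.0]

ratio = lactate_to_pyruvate_ratio(lactate, pyruvate)
print(f"lactate-to-pyruvate ratio = {ratio:.3f}")
```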

Open Access PDF : https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11419194/pdf/
Citations: 0
1D Thermoembolization Model Using CT Imaging Data for Porcine Liver.
Pub Date : 2024-09-10
Rohan Amare, Danielle Stolley, Steve Parrish, Megan Jacobsen, Rick Layman, Chimamanda Santos, Beatrice Riviere, Natalie Fowlkes, David Fuentes, Erik Cressman

Objective: Innovative therapies such as thermoembolization are expected to play an important role in improving care for patients with diseases such as hepatocellular carcinoma. Thermoembolization is a minimally invasive strategy that combines thermal ablation and embolization in a single procedure. This approach exploits an exothermic chemical reaction that occurs when an acid chloride is delivered via an endovascular route. However, comprehension of the complexities of the biophysics of thermoembolization is challenging. Mathematical models can aid in understanding such complex processes and assisting clinicians in making informed decisions. In this study, we used a Hagen-Poiseuille 1D blood flow model to predict the mass transport and possible embolization locations in a porcine hepatic artery.
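
For context, the Hagen-Poiseuille relation for steady laminar flow in a rigid cylindrical vessel is Q = π ΔP r⁴ / (8 μ L), so flow depends on the fourth power of the radius. The sketch below evaluates it for a single segment. The viscosity, pressure drop, and vessel dimensions are illustrative assumptions and are not parameters from the study, whose model also includes mass transport and reaction terms not shown here.

```python
import math


def poiseuille_resistance(radius_m, length_m, viscosity_pa_s):
    """Hydraulic resistance of a cylindrical segment: R = 8 * mu * L / (pi * r**4)."""
    return 8.0 * viscosity_pa_s * length_m / (math.pi * radius_m ** 4)


def flow_rate(pressure_drop_pa, radius_m, length_m, viscosity_pa_s):
    """Volumetric flow Q = delta_P / R, in m^3/s."""
    return pressure_drop_pa / poiseuille_resistance(radius_m, length_m, viscosity_pa_s)


# Illustrative values only.
mu = 3.5e-3          # assumed blood viscosity, Pa*s
delta_p = 100.0      # assumed pressure drop along the segment, Pa
length = 0.05        # assumed segment length, m

q_wide = flow_rate(delta_p, 1.0e-3, length, mu)     # 1.0 mm radius
q_narrow = flow_rate(delta_p, 0.5e-3, length, mu)   # 0.5 mm radius
print(f"flow ratio wide/narrow = {q_wide / q_narrow:.1f}")
```

Halving the radius cuts the flow by a factor of 16 at the same pressure drop, which is why vessel geometry extracted from imaging strongly shapes where material is transported in a 1D network model.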

Method: The 1D flow model was applied to in vivo embolization imaging data from three pigs. The hydrolysis time constant of the acid chloride reaction was optimized for each pig, and leave-one-out cross-validation (LOOCV) was used to test the model's predictive ability.

Conclusion: This basic model provided a balanced accuracy rate of 66.8% for identifying the possible locations of damage in the hepatic artery. Use of the model provides an initial understanding of the vascular transport phenomena that are predicted to occur as a result of thermoembolization.
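
The evaluation described above combines two standard pieces: leave-one-out splits, where each subject is held out once, and balanced accuracy, the mean of sensitivity and specificity. The following minimal sketch shows both. The subject identifiers and the label vectors are hypothetical and do not reproduce the study's data or its 66.8% result.

```python
def leave_one_out_splits(subjects):
    """Yield (training_subjects, held_out_subject) once per subject."""
    for i, held_out in enumerate(subjects):
        yield subjects[:i] + subjects[i + 1:], held_out


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = damage, 0 = none)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return 0.5 * (sensitivity + specificity)


# Hypothetical subject identifiers: fit on two subjects, test on the third.
splits = list(leave_one_out_splits(["pig_A", "pig_B", "pig_C"]))

# Hypothetical per-vessel labels and predictions for one held-out subject.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)
print(f"balanced accuracy = {score:.3f}")
```

Balanced accuracy is a reasonable choice when damaged and undamaged vessel segments are unequal in number, since plain accuracy would favor the majority class.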

Citations: 0
Limits on the computational expressivity of non-equilibrium biophysical processes.
Pub Date : 2024-09-09
Carlos Floyd, Aaron R Dinner, Arvind Murugan, Suriyanarayanan Vaikuntanathan

Many biological decision-making processes can be viewed as performing a classification task over a set of inputs, using various chemical and physical processes as "biological hardware." In this context, it is important to understand the inherent limitations on the computational expressivity of classification functions instantiated in biophysical media. Here, we model biochemical networks as Markov jump processes and train them to perform classification tasks, allowing us to investigate their computational expressivity. We reveal several unanticipated limitations on the input-output functions of these systems, which we further show can be lifted using biochemical mechanisms like promiscuous binding. We analyze the flexibility and sharpness of decision boundaries as well as the classification capacity of these networks. Additionally, we identify distinctive signatures of networks trained for classification, including the emergence of correlated subsets of spanning trees and a creased "energy landscape" with multiple basins. Our findings have implications for understanding and designing physical computing systems in both biological and synthetic chemical settings.
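
The link between Markov jump processes and spanning trees mentioned in this abstract can be made concrete with the Markov chain tree theorem: the steady-state probability of a state is proportional to the sum, over all spanning trees directed toward that state, of the product of the transition rates along the tree. The sketch below checks this by brute-force enumeration on a tiny network. The function names and the rate values are illustrative assumptions, and the enumeration is only practical for a handful of states; it is not the authors' training procedure.

```python
from itertools import product


def reaches_root(out_edge, root):
    """True if following the chosen outgoing edges from every state ends at root."""
    for start in out_edge:
        seen = set()
        node = start
        while node != root:
            if node in seen:  # walked into a cycle that avoids the root
                return False
            seen.add(node)
            node = out_edge[node]
    return True


def steady_state_from_trees(rates):
    """Steady state of a Markov jump process via the Markov chain tree theorem.

    rates[i][j] is the transition rate from state i to state j (i != j).
    """
    n = len(rates)
    weights = []
    for root in range(n):
        others = [i for i in range(n) if i != root]
        total = 0.0
        # Each non-root state picks exactly one outgoing edge.
        for targets in product(range(n), repeat=len(others)):
            out_edge = dict(zip(others, targets))
            if any(i == j or rates[i][j] == 0.0 for i, j in out_edge.items()):
                continue
            if not reaches_root(out_edge, root):
                continue
            weight = 1.0
            for i, j in out_edge.items():
                weight *= rates[i][j]
            total += weight
        weights.append(total)
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical three-state network with made-up rates.
example_rates = [
    [0.0, 2.0, 1.0],
    [3.0, 0.0, 0.5],
    [1.0, 4.0, 0.0],
]
print(steady_state_from_trees(example_rates))
```

Because every steady-state probability is a ratio of polynomials in the rates, this representation is a natural starting point for asking which input-output functions such a network can and cannot express.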

Citations: 0
Magnetization transfer explains most of the $T_1$ variability in the MRI literature.
Pub Date : 2024-09-09
Jakob Assländer

Purpose: To identify the predominant source of the $T_1$ variability described in the literature, which ranges from 0.6 to 1.1 s for brain white matter at 3 T.

Methods: Twenty-five $T_1$-mapping methods from the literature were simulated with a mono-exponential model and a magnetization-transfer (MT) model, each followed by mono-exponential fitting. A single set of model parameters was assumed for the simulation of all methods, and these parameters were estimated by fitting the simulated $T_1$ values to the corresponding literature $T_1$ values of white matter at 3 T.

Results: Mono-exponential simulations suggest good inter-method reproducibility and fail to explain the highly variable $T_1$ estimates in the literature. In contrast, MT simulations suggest that a mono-exponential fit results in a variable $T_1$ and explain up to 62% of the literature's variability.

Conclusion: The results suggest that a mono-exponential model does not adequately describe longitudinal relaxation in biological tissue. Therefore, $T_1$ in biological tissue should be considered only a semi-quantitative metric that is inherently contingent upon the imaging methodology. Comparisons between different $T_1$-mapping methods, and the use of simplistic spin systems (such as doped-water phantoms) for validation, should be viewed with caution.
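
The mechanism behind this conclusion can be seen in a toy calculation: when the true recovery is not mono-exponential, the $T_1$ returned by a mono-exponential fit depends on which time points are sampled. The sketch below uses a two-component (bi-exponential) recovery as a simplified stand-in for the MT model; it is not the two-pool model used in the paper. The function names, the component time constants, the fast-component fraction, and the sampling windows are all illustrative assumptions.

```python
import math


def recovery_deficit(t, t1_slow=1.0, t1_fast=0.05, fast_fraction=0.15):
    """Normalized deficit 1 - Mz(t)/M0 after saturation for a toy two-component recovery."""
    slow = (1.0 - fast_fraction) * math.exp(-t / t1_slow)
    fast = fast_fraction * math.exp(-t / t1_fast)
    return slow + fast


def apparent_t1(times, deficits):
    """Mono-exponential fit by log-linear least squares: ln(deficit) = c - t / T1."""
    n = len(times)
    logs = [math.log(d) for d in deficits]
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    variance = sum((t - t_mean) ** 2 for t in times)
    return -variance / covariance  # T1 = -1 / slope


# Two hypothetical sampling schemes for the same underlying tissue model.
early_times = [0.01 * i for i in range(1, 21)]   # 0.01 s to 0.20 s
late_times = [0.2 + 0.1 * i for i in range(20)]  # 0.2 s to 2.1 s

for label, times in (("early sampling", early_times), ("late sampling", late_times)):
    deficits = [recovery_deficit(t) for t in times]
    print(label, "-> apparent T1 (s):", round(apparent_t1(times, deficits), 3))
```

Setting `fast_fraction=0.0` makes the data truly mono-exponential, in which case both sampling schemes recover the same value; the disagreement appears only once the second component is present.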

Citations: 0
Improving Tree Probability Estimation with Stochastic Optimization and Variance Reduction.
Pub Date : 2024-09-09
Tianyu Xie, Musu Yuan, Minghua Deng, Cheng Zhang

Probability estimation of tree topologies is one of the fundamental tasks in phylogenetic inference. The recently proposed subsplit Bayesian networks (SBNs) provide a powerful probabilistic graphical model for tree topology probability estimation by properly leveraging the hierarchical structure of phylogenetic trees. However, the expectation maximization (EM) method currently used for learning SBN parameters does not scale up to large data sets. In this paper, we introduce several computationally efficient methods for training SBNs and show that variance reduction could be the key for better performance. Furthermore, we also introduce the variance reduction technique to improve the optimization of SBN parameters for variational Bayesian phylogenetic inference (VBPI). Extensive synthetic and real data experiments demonstrate that our methods outperform previous baseline methods on the tasks of tree topology probability estimation as well as Bayesian phylogenetic inference using SBNs.
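
To give a feel for why variance reduction matters in stochastic optimization, the sketch below compares plain stochastic gradient descent with SVRG (stochastic variance-reduced gradient) on a one-parameter least-squares problem. The abstract does not say which variance reduction scheme is used, so SVRG here is a stand-in, and the toy objective has nothing to do with SBN parameters. All function names, data, step sizes, and seeds are illustrative assumptions.

```python
import random


def make_data(n=50, seed=0):
    """Synthetic regression data b ~ 2 * a + noise (made up for the demo)."""
    rng = random.Random(seed)
    a = [rng.uniform(0.5, 1.5) for _ in range(n)]
    b = [2.0 * ai + rng.gauss(0.0, 0.5) for ai in a]
    return a, b


def grad_i(w, ai, bi):
    """Gradient of the per-sample loss 0.5 * (ai * w - bi) ** 2."""
    return ai * (ai * w - bi)


def full_grad(w, a, b):
    """Gradient of the average loss over the whole data set."""
    return sum(grad_i(w, ai, bi) for ai, bi in zip(a, b)) / len(a)


def sgd(a, b, steps, lr=0.1, seed=1):
    """Plain SGD with a constant step size."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        i = rng.randrange(len(a))
        w -= lr * grad_i(w, a[i], b[i])
    return w


def svrg(a, b, epochs, inner_steps, lr=0.1, seed=1):
    """SVRG: correct each stochastic gradient using a periodic full-gradient snapshot."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        snapshot = w
        mu = full_grad(snapshot, a, b)
        for _ in range(inner_steps):
            i = rng.randrange(len(a))
            # Unbiased gradient estimate whose variance shrinks as w nears the snapshot.
            w -= lr * (grad_i(w, a[i], b[i]) - grad_i(snapshot, a[i], b[i]) + mu)
    return w


a, b = make_data()
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
print("SGD error: ", abs(sgd(a, b, steps=1000) - w_star))
print("SVRG error:", abs(svrg(a, b, epochs=20, inner_steps=50) - w_star))
```

With a constant step size, plain SGD keeps fluctuating around the optimum because its gradient noise never vanishes, whereas the SVRG correction term removes that noise as the iterate approaches the snapshot.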

Citations: 0