ArXiv latest papers, page 2

Evaluation of radiomic feature harmonization techniques for benign and malignant pulmonary nodules.

ArXiv

Pub Date : 2025-01-15

Claire Huchthausen, Menglin Shi, Gabriel L A de Sousa, Jonathan Colen, Emery Shelley, James Larner, Einsley Janowski, Krishni Wijesooriya

Background: Conventional methods for detecting lung cancer early are often qualitative and subject to interpretation. Radiomics provides quantitative characteristics of pulmonary nodules (PNs) in medical images, but variability in medical image acquisition is an obstacle to consistent clinical application of these quantitative features. Correcting radiomic features' dependency on acquisition parameters is problematic when combining data from benign and malignant PNs, as is necessary when the goal is to diagnose lung cancer, because acquisition effects may differ between them due to their biological differences.

Purpose: We evaluated whether we must account for biological differences between benign and malignant PNs when correcting the dependency of radiomic features on acquisition parameters, and we compared methods of doing this using ComBat harmonization.

Methods: This study used a dataset of 567 clinical chest CT scans containing both malignant and benign PNs. Scans were grouped as benign, malignant, or lung cancer screening (mixed benign and malignant). Preprocessing and feature extraction from ROIs were performed using PyRadiomics. Optimized Permutation Nested ComBat harmonization was performed on extracted features to account for variability in four imaging protocols: contrast enhancement, scanner manufacturer, acquisition voltage, focal spot size. Three methods were compared: harmonizing all data collectively in the standard manner, harmonizing all data with a covariate to preserve distinctions between subgroups, and harmonizing subgroups separately. A significant (p ≤ 0.05) Kruskal-Wallis test determined whether harmonization removed a feature's dependency on an acquisition parameter. A LASSO-SVM pipeline was trained using acquisition-independent radiomic features to predict whether PNs were malignant or benign. To evaluate the predictive information made available by each harmonization method, the trained harmonization estimators and predictive model were applied to a corresponding unseen test set. Harmonization and predictive performance metrics were assessed over 10 trials of 5-fold cross validation.

Results: Kruskal-Wallis defined an average 2.1% of features (95% CI: 1.9-2.4%) as acquisition-independent when data were harmonized collectively, 27.3% of features (95% CI: 25.7-28.9%) as acquisition-independent when harmonized with a covariate, and 90.9% of features (95% CI: 90.4-91.5%) as acquisition-independent when harmonized separately. LASSO-SVM models trained on data harmonized separately or with a covariate had higher ROC-AUC for lung cancer screening scans than models trained on data harmonized without distinction between benign and malignant tissues (DeLong test, Holm-Bonferroni adjusted p ≤ 0.05). There was not a conclusive difference in ROC-AUC between models trained on data harmonized separately and models trained on data har
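
The acquisition-dependency check described in the Methods can be sketched in a few lines. The following is a minimal, standard-library sketch and not the authors' code: it assumes that a feature counts as acquisition-independent when a Kruskal-Wallis test across acquisition settings is not significant (p > 0.05), it uses the usual chi-square approximation with k − 1 degrees of freedom, and the names `kruskal_wallis` and `is_acquisition_independent` are hypothetical. In practice a library routine such as `scipy.stats.kruskal` would normally replace the hand-rolled test.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j get one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Chi-square survival function via the incomplete-gamma recurrence."""
    half = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-half)              # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(half))   # closed form for df = 1
    while a < df / 2.0 - 1e-9:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(groups):
    """Return (H, p) for a list of samples, one list per acquisition setting."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    # Standard tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction > 0:
        h /= correction
    h = max(h, 0.0)  # guard against tiny negative rounding errors
    return h, chi2_sf(h, len(groups) - 1)


def is_acquisition_independent(groups, alpha=0.05):
    """Assumed criterion: no significant difference across settings."""
    _, p = kruskal_wallis(groups)
    return p > alpha


# Toy feature values grouped by a hypothetical acquisition setting.
shifted = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]      # clear setting effect
interleaved = [[1, 4, 7], [2, 5, 8], [3, 6, 9]]  # no obvious setting effect
print(is_acquisition_independent(shifted))
print(is_acquisition_independent(interleaved))
```

Repeating this check per feature and per acquisition parameter, then taking the fraction of features that pass, would give a percentage comparable in spirit to the ones reported in the Results.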

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774441/pdf/

Citations: 0

A unified model for the origins of spongiform degeneration and other neuropathological features in prion diseases.

ArXiv

Pub Date : 2025-01-15

Gerold Schmitt-Ulms, Xinzhu Wang, Joel Watts, Stephanie Booth, Holger Wille, Wenda Zhao

Decades after their initial observation in prion-infected brain tissues, the identities of virus-like dense particles, varicose tubules, and oval bodies containing parallel bands and fibrils have remained elusive. Our recent work revealed that a phenotype of dilation of the endoplasmic reticulum (ER), most notable in the perinuclear space (PNS), contributes to spongiform degeneration. To assess the significance of this phenotype for the etiology of prion diseases, we explored whether it can be functionally linked to other neuropathological hallmarks observed in these diseases, as such links would indicate that it is a central event. Having surveyed the neuropathological record and other distant literature niches, we propose a model in which pathogenic forms of the prion protein poison raft domains, including essential Na+, K+-ATPases (NKAs) embedded within them, thereby triggering an ER-centered cellular rescue program coordinated by the unfolded protein response (UPR). The execution of this program stalls general protein synthesis, causing the deterioration of synaptic spines. As the disease progresses, cells selectively increase sterol biosynthesis, along with ribosome and ER biogenesis. These adaptive rescue attempts cause morphological changes to the ER which manifest as ER dilation or ER hypertrophy in a manner that is influenced by Ca2+ influx into the cell. The nuclear-to-cytoplasmic transport of mRNAs and tRNAs is interrupted in late-stage disease, thereby depriving ribosomes of supplies and inducing them to aggregate into a paracrystalline form. In support of this model, we present previously reported data whose features are consistent with the interpretation that 1) the phenotype of ER dilation is observed in major prion diseases, 2) varicose tubules and oval bodies represent ER hypertrophy, and 3) virus-like dense particles are paracrystalline aggregates of inactive ribosomes.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774453/pdf/

Citations: 0

FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction.

ArXiv

Pub Date : 2025-01-15

Alex Morehead, Jianlin Cheng

Powerful generative AI models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or have been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, the first deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. For the well-known PoseBusters Benchmark dataset, FlowDock outperforms single-sequence AlphaFold 3 with a 51% blind docking success rate using unbound (apo) protein input structures and without any information derived from multiple sequence alignments, and for the challenging new DockGen-E dataset, FlowDock outperforms single-sequence AlphaFold 3 and matches single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.
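
The idea of mapping unbound coordinates to bound coordinates with conditional flow matching can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not FlowDock's implementation: it assumes the common straight-line probability path, for which the regression target for the velocity field is simply `x1 - x0`, it treats a structure as a flat list of coordinates, and the function names are hypothetical. A real model would replace the oracle velocity below with a trained geometric network.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Velocity a flow-matching model regresses onto for the linear path."""
    return [b - a for a, b in zip(x0, x1)]


def euler_integrate(velocity_fn, x0, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy "apo" and "holo" coordinates (illustrative numbers only).
apo = [0.0, 0.0, 0.0]
holo = [1.0, 2.0, -0.5]


def oracle(x, t):
    """Stand-in for a trained network: returns the exact target velocity."""
    return target_velocity(apo, holo)


print(interpolate(apo, holo, 0.5))   # midpoint of the path
print(euler_integrate(oracle, apo))  # ends (numerically) at the holo coordinates
```

With the exact velocity the integration recovers the bound coordinates; with a learned velocity field, sampling amounts to the same integration loop starting from the unbound structure.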

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774451/pdf/

Citations: 0

Unconditional stability of a recurrent neural circuit implementing divisive normalization.

ArXiv

Pub Date : 2025-01-15

Shivang Rawat, David J Heeger, Stefano Martiniani

Stability in recurrent neural models poses a significant challenge, particularly in developing biologically plausible neurodynamical models that can be seamlessly trained. Traditional cortical circuit models are notoriously difficult to train due to expansive nonlinearities in the dynamical system, leading to an optimization problem with nonlinear stability constraints that are difficult to impose. Conversely, recurrent neural networks (RNNs) excel in tasks involving sequential data but lack biological plausibility and interpretability. In this work, we address these challenges by linking dynamic divisive normalization (DN) to the stability of "oscillatory recurrent gated neural integrator circuits" (ORGaNICs), a biologically plausible recurrent cortical circuit model that dynamically achieves DN and that has been shown to simulate a wide range of neurophysiological phenomena. By using the indirect method of Lyapunov, we prove the remarkable property of unconditional local stability for an arbitrary-dimensional ORGaNICs circuit when the recurrent weight matrix is the identity. We thus connect ORGaNICs to a system of coupled damped harmonic oscillators, which enables us to derive the circuit's energy function, providing a normative principle of what the circuit, and individual neurons, aim to accomplish. Further, for a generic recurrent weight matrix, we prove the stability of the 2D model and demonstrate empirically that stability holds in higher dimensions. Finally, we show that ORGaNICs can be trained by backpropagation through time without gradient clipping/scaling, thanks to its intrinsic stability property and adaptive time constants, which address the problems of exploding, vanishing, and oscillating gradients. By evaluating the model's performance on RNN benchmarks, we find that ORGaNICs outperform alternative neurodynamical models on static image classification tasks and perform comparably to LSTMs on sequential tasks.
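
Lyapunov's indirect method, mentioned above, linearizes a system around a fixed point and declares the fixed point locally asymptotically stable when every eigenvalue of the Jacobian has a negative real part. The sketch below applies that test to a single damped harmonic oscillator, chosen only because the abstract relates ORGaNICs to coupled damped oscillators. It is a stand-in and does not implement the ORGaNICs equations; the parameters `omega` and `zeta` and the function names are illustrative assumptions.

```python
import cmath


def oscillator_eigenvalues(omega, zeta):
    """Eigenvalues of the Jacobian [[0, 1], [-omega**2, -2*zeta*omega]].

    This is the first-order form of x'' + 2*zeta*omega*x' + omega**2 * x = 0.
    """
    trace = -2.0 * zeta * omega
    det = omega ** 2
    disc = cmath.sqrt(trace ** 2 - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def is_locally_stable(eigenvalues):
    """Indirect-method criterion: all real parts strictly negative."""
    return all(ev.real < 0 for ev in eigenvalues)


print(is_locally_stable(oscillator_eigenvalues(omega=2.0, zeta=0.1)))   # damped
print(is_locally_stable(oscillator_eigenvalues(omega=2.0, zeta=-0.1)))  # anti-damped
```

For a higher-dimensional circuit the same criterion applies to the full Jacobian, whose eigenvalues would typically be computed numerically.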

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11469413/pdf/

Citations: 0

A mathematical language for linking fine-scale structure in spikes from hundreds to thousands of neurons with behaviour.

ArXiv

Pub Date : 2025-01-15

Alexandra N Busch, Roberto C Budzinski, Federico W Pasini, Ján Mináč, Jonathan A Michaels, Megan Roussy, Roberto A Gulli, Benjamin W Corrigan, J Andrew Pruszynski, Julio Martinez-Trujillo, Lyle E Muller

Recent advances in neural recording technology allow simultaneously recording action potentials from hundreds to thousands of neurons in awake, behaving animals. However, characterizing spike patterns in the resulting data, and linking these patterns to behaviour, remains a challenging task. The lack of a rigorous mathematical language for variable numbers of events (spikes) emitted by multiple agents (neurons) is an important limiting factor. We introduce a new mathematical operation to decompose complex spike patterns into a set of simple, structured elements. This creates a mathematical language that allows comparing spike patterns across trials, detecting sub-patterns, and making links to behaviour via a clear distance measure. We first demonstrate the method using Neuropixel recordings from macaque motor cortex. We then apply the method to dual Utah array recordings from macaque prefrontal cortex, where this technique reveals previously unseen structure that can predict both memory-guided decisions and errors in a virtual-reality working memory task. These results demonstrate that this technique provides a powerful new approach to understand structure in the spike times of neural populations, at a scale that will continue to grow more and more rapidly in upcoming years.

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11643227/pdf/

Citations: 0

A Vessel Bifurcation Landmark Pair Dataset for Abdominal CT Deformable Image Registration (DIR) Validation.

ArXiv

Pub Date : 2025-01-15

Edward R Criscuolo, Yao Hao, Zhendong Zhang, Trevor McKeown, Deshan Yang

Purpose: Deformable image registration (DIR) is an enabling technology in many diagnostic and therapeutic tasks. Despite this, DIR algorithms have limited clinical use, largely due to a lack of benchmark datasets for quality assurance during development. DIRs of intra-patient abdominal CTs are among the most challenging registration scenarios due to significant organ deformations and inconsistent image content. To support future algorithm development, here we introduce our first-of-its-kind abdominal CT DIR benchmark dataset, comprising large numbers of highly accurate landmark pairs on matching blood vessel bifurcations.

Acquisition and validation methods: Abdominal CT image pairs of 30 patients were acquired from several publicly available repositories as well as the authors' institution with IRB approval. The two CTs of each pair were originally acquired for the same patient but on different days. An image processing workflow was developed and applied to each CT image pair: 1) Abdominal organs were segmented with a deep learning model, and image intensity within organ masks was overwritten. 2) Matching image patches were manually identified between two CTs of each image pair. 3) Vessel bifurcation landmarks were labeled on one image of each image patch pair. 4) Image patches were deformably registered, and landmarks were projected onto the second image. 5) Landmark pair locations were refined manually or with an automated process. This workflow resulted in 1895 total landmark pairs, or 63 per case on average. Estimates of the landmark pair accuracy using digital phantoms were 0.7 mm ± 1.2 mm.

Data format and usage notes: The data is published in Zenodo at https://doi.org/10.5281/zenodo.14362785. Instructions for use can be found at https://github.com/deshanyang/Abdominal-DIR-QA.

Potential applications: This dataset is a first-of-its-kind for abdominal DIR validation. The number, accuracy, and distribution of landmark pairs will allow for robust validation of DIR algorithms with precision beyond what is currently available.
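
A landmark-pair dataset of this kind is typically used to compute a target registration error (TRE): the distance between where a registration sends each landmark and where its matched landmark actually lies. The sketch below is a minimal illustration under assumptions, not the dataset's official tooling: it assumes landmarks are already expressed in millimetres in a common frame, and the function name and the toy coordinates are hypothetical.

```python
import math
import statistics


def target_registration_errors(reference_landmarks, warped_landmarks):
    """Euclidean distance (mm) for each matched landmark pair."""
    if len(reference_landmarks) != len(warped_landmarks):
        raise ValueError("landmark lists must be the same length")
    return [math.dist(p, q) for p, q in zip(reference_landmarks, warped_landmarks)]


# Toy coordinates in mm: landmarks on the second CT and the positions a
# hypothetical DIR algorithm predicted for them.
reference = [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)]
predicted = [(3.0, 4.0, 0.0), (10.0, 10.0, 12.0)]

errors = target_registration_errors(reference, predicted)
print(errors)                     # per-landmark TRE in mm
print(statistics.mean(errors))    # mean TRE
print(statistics.stdev(errors))   # sample standard deviation of TRE
```

Reporting the mean and spread of these distances per case is one straightforward way to compare DIR algorithms against the landmark pairs.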

Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11774459/pdf/

Citations: 0

AI Foundation Models for Wearable Movement Data in Mental Health Research. Is Attention All You Need for Actigraphy? Foundation Models of Wearable Accelerometer Data for Mental Health Research.

ArXiv

Pub Date : 2025-01-14

Franklin Y Ruan, Aiwei Zhang, Jenny Y Oh, SouYoung Jin, Nicholas C Jacobson

Pretrained foundation models and transformer architectures have driven the success of large language models (LLMs) and other modern AI breakthroughs. However, similar advancements in health data modeling remain limited due to the need for innovative adaptations. Wearable movement data offers a valuable avenue for exploration, as it's a core feature in nearly all commercial smartwatches, well established in clinical and mental health research, and the sequential nature of the data shares similarities to language. We introduce the Pretrained Actigraphy Transformer (PAT), the first open source foundation model designed for time-series wearable movement data. Leveraging transformer-based architectures and novel techniques, such as patch embeddings, and pretraining on data from 29,307 participants in a national U.S. sample, PAT achieves state-of-the-art performance in several mental health prediction tasks. PAT is also lightweight and easily interpretable, making it a robust tool for mental health research. GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/.

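Since the 1970s, wearable accelerometers (actigraphy) have provided valuable data for clinical insight, and they have become increasingly important as wearable devices continue to spread. The usefulness of actigraphy in research and clinical settings depends heavily on the modeling architecture used. To address this, we developed the Pretrained Actigraphy Transformer (PAT), the first pretrained, fully attention-based model designed specifically for actigraphy. Pretraining on actigraphy from 29,307 NHANES participants allows PAT to be fine-tuned for a range of actigraphy prediction tasks in mental health and to deliver state-of-the-art performance even when data are limited. For example, when trained to predict benzodiazepine use with actigraphy from only 500 labeled participants, PAT improved AUC by 8.8 percentage points over the best baseline. With fewer than 2 million parameters and built-in model explainability, PAT is powerful yet easy to deploy in health research settings. GitHub: https://github.com/njacobsonlab/Pretrained-Actigraphy-Transformer/.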
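
The patch embeddings mentioned above can be illustrated with a short sketch. It is not PAT's code: the patch length, the embedding size, the synthetic input, and the random projection weights are all assumptions chosen for illustration. It shows only the general idea that a long movement time series is cut into fixed-length patches and each patch is mapped to one token vector, which shortens the sequence a transformer has to attend over.

```python
import random

def patchify(series, patch_len):
    """Split a 1-D series into consecutive non-overlapping patches."""
    if len(series) % patch_len != 0:
        raise ValueError("series length must be a multiple of patch_len")
    return [series[i:i + patch_len] for i in range(0, len(series), patch_len)]

def embed_patches(patches, weights):
    """Linearly project each patch to an embedding (weights: patch_len x dim)."""
    dim = len(weights[0])
    return [
        [sum(v * weights[j][d] for j, v in enumerate(patch)) for d in range(dim)]
        for patch in patches
    ]

rng = random.Random(0)
# Hypothetical input: one day of minute-level activity counts (1440 values).
day = [rng.randint(0, 100) for _ in range(1440)]
patch_len, dim = 18, 8  # illustrative sizes, not taken from the paper
# Random weights stand in for a projection that a real model would learn.
weights = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(patch_len)]

patches = patchify(day, patch_len)
tokens = embed_patches(patches, weights)
print(len(patches), len(tokens[0]))  # 1440 / 18 = 80 patches, each of size 8
```

With these illustrative sizes, a 1440-step day becomes 80 tokens, so attention operates over 80 positions instead of 1440.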

Citations: 0

Avoiding subtraction and division of stochastic signals using normalizing flows: NFdeconvolve.

ArXiv

Pub Date : 2025-01-14

Pedro Pessoa, Max Schweiger, Lance W Q Xu, Tristan Manha, Ayush Saurabh, Julian Antolin Camarena, Steve Pressé

Across the scientific realm, we find ourselves subtracting or dividing stochastic signals. For instance, consider a stochastic realization, $x$, generated from the addition or multiplication of two stochastic signals $a$ and $b$, namely $x=a+b$ or $x = ab$. For the $x=a+b$ example, $a$ can be fluorescence background and $b$ the signal of interest whose statistics are to be learned from the measured $x$. Similarly, when writing $x=ab$, $a$ can be thought of as the illumination intensity and $b$ the density of fluorescent molecules of interest. Yet dividing or subtracting stochastic signals amplifies noise, and we ask instead whether, using the statistics of $a$ and the measurement of $x$ as input, we can recover the statistics of $b$. Here, we show how normalizing flows can generate an approximation of the probability distribution over $b$, thereby avoiding subtraction or division altogether. This method is implemented in our software package, NFdeconvolve, available on GitHub with a tutorial linked in the main text.
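
The noise amplification that motivates this approach can be seen in a small simulation. The sketch below is not NFdeconvolve and uses no normalizing flows. It assumes independent Gaussian $a$ and $b$ with made-up parameters, and it compares the spread of the true $b$ with the spread of naive estimates $x - a'$, where $a'$ is an independent draw from the background distribution.

```python
import random
import statistics

def naive_subtraction_demo(n=20000, seed=0):
    """Return (Var(b), Var(x - a')) for x = a + b with independent Gaussians."""
    rng = random.Random(seed)
    # Hypothetical parameters: background a ~ N(10, 2), signal b ~ N(3, 1).
    a = [rng.gauss(10.0, 2.0) for _ in range(n)]
    b = [rng.gauss(3.0, 1.0) for _ in range(n)]
    x = [ai + bi for ai, bi in zip(a, b)]
    # Independent background-only draws, as from a separate calibration run.
    a_ref = [rng.gauss(10.0, 2.0) for _ in range(n)]
    naive = [xi - ri for xi, ri in zip(x, a_ref)]
    return statistics.variance(b), statistics.variance(naive)

var_b, var_naive = naive_subtraction_demo()
print(f"Var(b) = {var_b:.2f}, Var(x - a') = {var_naive:.2f}")
```

Under these assumptions the naive estimate has variance $\mathrm{Var}(b) + 2\,\mathrm{Var}(a)$, here about 9 against a true $\mathrm{Var}(b)$ of 1, so the subtracted samples badly overstate the spread of $b$ even though their mean is correct.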

Citations: 0

MassSpecGym: A benchmark for the discovery and identification of molecules.

ArXiv

Pub Date : 2025-01-14

Roman Bushuiev, Anton Bushuiev, Niek F de Jonge, Adamo Young, Fleming Kretschmer, Raman Samusevich, Janne Heirman, Fei Wang, Luke Zhang, Kai Dührkop, Marcus Ludwig, Nils A Haupt, Apurva Kalia, Corinna Brungs, Robin Schmid, Russell Greiner, Bo Wang, David S Wishart, Li-Ping Liu, Juho Rousu, Wout Bittremieux, Hannes Rost, Tytus D Mak, Soha Hassoun, Florian Huber, Justin J J van der Hooft, Michael A Stravs, Sebastian Böcker, Josef Sivic, Tomáš Pluskal

The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality labeled MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, thereby standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.

Citations: 0

Divergences between Language Models and Human Brains.

ArXiv

Pub Date : 2025-01-13

Yuchen Zhou, Emmy Liu, Graham Neubig, Michael J Tarr, Leila Wehbe

Do machines and humans process language in similar ways? Recent research has hinted at the affirmative, showing that human neural activity can be effectively predicted using the internal representations of language models (LMs). Although such results are thought to reflect shared computational principles between LMs and human brains, there are also clear differences in how LMs and humans represent and use language. In this work, we systematically explore the divergences between human and machine language processing by examining the differences between LM representations and human brain responses to language, as measured by magnetoencephalography (MEG), across two datasets in which subjects read and listened to narrative stories. Using an LLM-based data-driven approach, we identify two domains that LMs do not capture well: social/emotional intelligence and physical commonsense. We validate these findings with human behavioral experiments and hypothesize that the gap is due to insufficient representations of social/emotional and physical knowledge in LMs. Our results show that fine-tuning LMs on these domains can improve their alignment with human brain responses.

Citations: 0