Generative diffusion model surrogates for mechanistic agent-based biological models
Pub Date: 2025-12-30 | Epub Date: 2025-10-28 | DOI: 10.1088/2632-2153/ae11f8
Tien Comlekoglu, J Quetzalcoatl Toledo-Marín, Douglas W DeSimone, Shayn M Peirce, Geoffrey Fox, James A Glazier
Mechanistic, multicellular, agent-based models are commonly used to investigate tissue-, organ-, and organism-scale biology at single-cell resolution. The Cellular-Potts Model (CPM) is a powerful and popular framework for developing and interrogating these models. CPMs become computationally expensive at large spatial and temporal scales, making application and investigation of developed models difficult. Surrogate models may allow for accelerated evaluation of CPMs of complex biological systems. However, the stochastic nature of these models means that each set of parameters may give rise to different model configurations, complicating surrogate model development. In this work, we leverage denoising diffusion probabilistic models (DDPMs) to train a generative AI surrogate of a CPM used to investigate in vitro vasculogenesis. We describe the use of an image classifier to learn the characteristics that define unique areas of a 2-dimensional parameter space. We then apply this classifier to aid in surrogate model selection and verification. Our CPM surrogate generates model configurations 20,000 timesteps ahead of a reference configuration and demonstrates an approximately 22x reduction in computational time compared to native code execution. Our work represents a step towards the implementation of DDPMs to develop digital twins of stochastic biological systems.
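The surrogate's core mechanism is DDPM ancestral sampling conditioned on an earlier model configuration. Below is a minimal sketch of that reverse loop, assuming a hypothetical trained noise-prediction network eps_model(x_t, t, reference) and a linear noise schedule; the paper's actual architecture, schedule, and conditioning scheme may differ.

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, reference, timesteps=1000, shape=(1, 1, 64, 64)):
    """Reverse-diffusion sampling conditioned on a reference configuration.

    eps_model(x_t, t, reference) -> predicted noise. `reference` stands for the
    CPM configuration 20,000 timesteps earlier, used as conditioning input.
    """
    betas = torch.linspace(1e-4, 0.02, timesteps)   # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(shape)                          # start from pure noise
    for t in reversed(range(timesteps)):
        eps = eps_model(x, torch.full((shape[0],), t), reference)
        coef = (1 - alphas[t]) / torch.sqrt(1 - alpha_bars[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise     # ancestral sampling step
    return x

# Stand-in network just to show the call pattern; a real surrogate would be a
# conditioned U-Net trained on CPM trajectories.
dummy = lambda x_t, t, ref: torch.zeros_like(x_t)
sample = ddpm_sample(dummy, reference=torch.zeros(1, 1, 64, 64), timesteps=50)
```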
{"title":"Generative diffusion model surrogates for mechanistic agent-based biological models.","authors":"Tien Comlekoglu, J Quetzalcoatl Toledo-Marín, Douglas W DeSimone, Shayn M Peirce, Geoffrey Fox, James A Glazier","doi":"10.1088/2632-2153/ae11f8","DOIUrl":"10.1088/2632-2153/ae11f8","url":null,"abstract":"<p><p>Mechanistic, multicellular, agent-based models are commonly used to investigate tissue, organ, and organism-scale biology at single-cell resolution. The Cellular-Potts Model (CPM) is a powerful and popular framework for developing and interrogating these models. CPMs become computationally expensive at large space- and time- scales making application and investigation of developed models difficult. Surrogate models may allow for the accelerated evaluation of CPMs of complex biological systems. However, the stochastic nature of these models means each set of parameters may give rise to different model configurations, complicating surrogate model development. In this work, we leverage denoising diffusion probabilistic models (DDPMs) to train a generative AI surrogate of a CPM used to investigate <i>in vitro</i> vasculogenesis. We describe the use of an image classifier to learn the characteristics that define unique areas of a 2-dimensional parameter space. We then apply this classifier to aid in surrogate model selection and verification. Our CPM model surrogate generates model configurations 20,000 timesteps ahead of a reference configuration and demonstrates approximately a 22x reduction in computational time as compared to native code execution. Our work represents a step towards the implementation of DDPMs to develop digital twins of stochastic biological systems.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 4","pages":"045024"},"PeriodicalIF":4.6,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12570967/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145408961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2
Pub Date: 2025-12-01 | Epub Date: 2025-10-29 | DOI: 10.1088/2632-2153/ae13d1
Guoping Xu, Christopher Kabat, You Zhang
Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The Segment Anything Model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction with minimal parameter overhead. This design enables effective fine-tuning of SAM2 on medical videos with limited training data. Unlike existing adapter-based methods focused solely on static images, DD-SAM2 fully exploits SAM2's streaming memory for medical video object tracking and segmentation. Comprehensive evaluations on the TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking) datasets demonstrate superior performance, achieving Dice scores of 0.93±0.04 and 0.97±0.01, respectively. To the best of our knowledge, this work provides an initial attempt at systematically exploring adapter-based fine-tuning strategies for applying SAM2 to medical video segmentation and tracking. Code, datasets, and models will be made publicly available at https://github.com/apple1986/DD-SAM2.
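The DD-Adapter is described as depthwise convolutions with dilation that add multi-scale capacity at minimal parameter cost; its exact design is in the paper. A minimal sketch of one plausible such module: parallel depthwise branches at increasing dilation rates, fused by a pointwise convolution and added residually around the frozen backbone features.

```python
import torch
import torch.nn as nn

class DepthwiseDilatedAdapter(nn.Module):
    """Parallel depthwise 3x3 convolutions with growing dilation rates capture
    multi-scale context; a 1x1 conv fuses them and a residual connection keeps
    the frozen backbone's features intact."""

    def __init__(self, channels: int, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d,
                      groups=channels, bias=False)   # depthwise: groups == channels
            for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.act(self.fuse(multi_scale))

feats = torch.randn(2, 256, 32, 32)                  # e.g. a frozen encoder feature map
print(DepthwiseDilatedAdapter(256)(feats).shape)     # torch.Size([2, 256, 32, 32])
```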
{"title":"Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2.","authors":"Guoping Xu, Christopher Kabat, You Zhang","doi":"10.1088/2632-2153/ae13d1","DOIUrl":"10.1088/2632-2153/ae13d1","url":null,"abstract":"<p><p>Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The Segment Anything Model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction with minimal parameter overhead. This design enables effective fine-tuning of SAM2 on medical videos with limited training data. Unlike existing adapter-based methods focused solely on static images, DD-SAM2 fully exploits SAM2's streaming memory for medical video objects tracking and segmentation. Comprehensive evaluations on TrackRad2025 (tumor segmentation) and EchoNet-Dynamic (left ventricle tracking) datasets demonstrate superior performance, achieving Dice scores of 0.93±0.04 and 0.97±0.01, respectively. To the best of our knowledge, this work provides an initial attempt at systematically exploring adapter-based fine-tuning strategies for SAM2 applied medical video segmentation and tracking. Code, datasets, and models will be made publicly available at https://github.com/apple1986/DD-SAM2.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 4","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12806169/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145999235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mamba time series forecasting with uncertainty quantification
Pub Date: 2025-09-30 | Epub Date: 2025-07-22 | DOI: 10.1088/2632-2153/adec3b
Pedro Pessoa, Paul Campitelli, Douglas P Shepherd, S Banu Ozkan, Steve Pressé
State space models, such as Mamba, have recently garnered attention in time series forecasting (TSF) due to their ability to capture sequence patterns. However, in electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. Similarly, in traffic occupancy benchmarks, the mean error reaches 18%. This discrepancy leaves us to wonder whether the prediction is simply inaccurate or falls within the error expected given the spread in historical data. To address this limitation, we propose a method to quantify the predictive uncertainty of Mamba forecasts. To achieve this, we propose a dual-network framework based on the Mamba architecture for probabilistic forecasting, where one network generates point forecasts while the other estimates predictive uncertainty by modeling variance. We abbreviate our tool, Mamba with probabilistic TSF, as Mamba-ProbTSF; the code for its implementation is available on GitHub at https://github.com/PessoaP/Mamba-ProbTSF. Evaluating this approach on synthetic and real-world benchmark datasets, we find that the Kullback-Leibler divergence between the learned distributions and the data (which, in the limit of infinite data, should converge to zero if the model correctly captures the underlying probability distribution) is reduced to the order of 10⁻³ for synthetic data and 10⁻¹ for the real-world benchmarks. We find that in both the electricity consumption and traffic occupancy benchmarks, the true trajectory stays within the predicted uncertainty interval at the two-sigma level about 95% of the time. We further compare Mamba-ProbTSF against leading probabilistic forecasting methods, DeepAR and ARIMA, and show that our method consistently achieves lower forecast errors while offering more reliable uncertainty quantification. We end with a consideration of potential limitations, adjustments to improve performance, and considerations for applying this framework to processes with purely or largely stochastic dynamics, where the stochastic changes accumulate as observed, for example, in pure Brownian motion or molecular dynamics trajectories.
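A minimal sketch of the dual-network idea, with simple MLP heads standing in for the paper's Mamba blocks (an assumption made to keep the example self-contained): one head predicts the point forecast, the other the log-variance, trained jointly with the Gaussian negative log-likelihood; the final lines compute the two-sigma coverage statistic quoted above.

```python
import torch
import torch.nn as nn

class Head(nn.Module):
    """Stand-in forecaster (the paper uses Mamba state-space blocks here)."""
    def __init__(self, context: int, horizon: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(context, 128), nn.ReLU(),
                                 nn.Linear(128, horizon))
    def forward(self, x):
        return self.net(x)

context, horizon = 96, 24
mean_net, var_net = Head(context, horizon), Head(context, horizon)
opt = torch.optim.Adam(list(mean_net.parameters()) + list(var_net.parameters()), lr=1e-3)

x = torch.randn(32, context)        # past window of the series (toy data)
y = torch.randn(32, horizon)        # future values (toy data)

mu = mean_net(x)
log_var = var_net(x)                # predicting log-variance keeps variance positive
nll = 0.5 * (log_var + (y - mu).pow(2) / log_var.exp()).mean()   # Gaussian NLL
opt.zero_grad(); nll.backward(); opt.step()

# Two-sigma coverage: fraction of true values inside mu ± 2*sigma (~95% if calibrated)
sigma = (0.5 * log_var).exp()
coverage = ((y > mu - 2 * sigma) & (y < mu + 2 * sigma)).float().mean()
```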
{"title":"Mamba time series forecasting with uncertainty quantification.","authors":"Pedro Pessoa, Paul Campitelli, Douglas P Shepherd, S Banu Ozkan, Steve Pressé","doi":"10.1088/2632-2153/adec3b","DOIUrl":"10.1088/2632-2153/adec3b","url":null,"abstract":"<p><p>State space models, such as Mamba, have recently garnered attention in time series forecasting (TSF) due to their ability to capture sequence patterns. However, in electricity consumption benchmarks, Mamba forecasts exhibit a mean error of approximately 8%. Similarly, in traffic occupancy benchmarks, the mean error reaches 18%. This discrepancy leaves us to wonder whether the prediction is simply inaccurate or falls within error given spread in historical data. To address this limitation, we propose a method to quantify the predictive uncertainty of Mamba forecasts. To achieve this, we propose a dual-network framework based on the Mamba architecture for probabilistic forecasting, where one network generates point forecasts while the other estimates predictive uncertainty by modeling variance. We abbreviate our tool, Mamba with probabilistic TSF, as Mamba-ProbTSF and the code for its implementation is available on GitHub https://github.com/PessoaP/Mamba-ProbTSF. Evaluating this approach on synthetic and real-world benchmark datasets, we find Kullback-Leibler divergence between the learned distributions and the data-which, in the limit of infinite data, should converge to zero if the model correctly captures the underlying probability distribution-reduced to the order of 10<sup>-3</sup> for synthetic data and 10<sup>-1</sup> for real-world benchmark. We find that in both the electricity consumption and traffic occupancy benchmark, the true trajectory stays within the predicted uncertainty interval at the two-sigma level about 95% of the time. We further compare Mamba-ProbTSF against leading probabilistic forecast methods, DeepAR and ARIMA, and show that our method consistently achieves lower forecast errors while offering more reliable uncertainty quantification. We end with a consideration of potential limitations, adjustments to improve performance, and considerations for applying this framework to processes for purely or largely stochastic dynamics where the stochastic changes accumulate as observed, for example, in pure Brownian motion or molecular dynamics trajectories.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 3","pages":"035012"},"PeriodicalIF":6.3,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12281171/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144699735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
32 examples of LLM applications in materials science and chemistry: towards automation, assistants, agents, and accelerated scientific discovery
Pub Date: 2025-09-30 | Epub Date: 2025-09-29 | DOI: 10.1088/2632-2153/ae011a
Yoel Zimmermann, Adib Bazgir, Alexander Al-Feghali, Mehrad Ansari, Joshua Bocarsly, L Catherine Brinson, Yuan Chiang, Defne Circi, Min-Hsueh Chiu, Nathan Daelman, Matthew L Evans, Abhijeet S Gangan, Janine George, Hassan Harb, Ghazal Khalighinejad, Sartaaj Takrim Khan, Sascha Klawohn, Magdalena Lederbauer, Soroush Mahjoubi, Bernadette Mohr, Seyed Mohamad Moosavi, Aakash Naik, Aleyna Beste Ozhan, Dieter Plessers, Aritra Roy, Fabian Schöppach, Philippe Schwaller, Carla Terboven, Katharina Ueltzen, Yue Wu, Shang Zhu, Jan Janssen, Calvin Li, Ian Foster, Ben Blaiszik
Large language models (LLMs) are reshaping many aspects of materials science and chemistry research, enabling advances in molecular property prediction, materials design, scientific automation, knowledge extraction, and more. Recent developments demonstrate that the latest class of models is able to integrate structured and unstructured data, assist in hypothesis generation, and streamline research workflows. To explore the frontier of LLM capabilities across the research lifecycle, we review applications of LLMs through 32 projects developed during the second annual LLM hackathon for applications in materials science and chemistry, a global hybrid event. These projects spanned seven key research areas: (1) molecular and material property prediction, (2) molecular and material design, (3) automation and novel interfaces, (4) scientific communication and education, (5) research data management and automation, (6) hypothesis generation and evaluation, and (7) knowledge extraction and reasoning from the scientific literature. Collectively, these applications illustrate how LLMs serve as versatile predictive models, platforms for rapid prototyping of domain-specific tools, and much more. In particular, improvements in both open-source and proprietary LLM performance through the addition of reasoning, additional training data, and new techniques have expanded their effectiveness, particularly in low-data environments and interdisciplinary research. As LLMs continue to improve, their integration into scientific workflows presents both new opportunities and new challenges, requiring ongoing exploration, continued refinement, and further research to address reliability, interpretability, and reproducibility.
{"title":"32 examples of LLM applications in materials science and chemistry: towards automation, assistants, agents, and accelerated scientific discovery.","authors":"Yoel Zimmermann, Adib Bazgir, Alexander Al-Feghali, Mehrad Ansari, Joshua Bocarsly, L Catherine Brinson, Yuan Chiang, Defne Circi, Min-Hsueh Chiu, Nathan Daelman, Matthew L Evans, Abhijeet S Gangan, Janine George, Hassan Harb, Ghazal Khalighinejad, Sartaaj Takrim Khan, Sascha Klawohn, Magdalena Lederbauer, Soroush Mahjoubi, Bernadette Mohr, Seyed Mohamad Moosavi, Aakash Naik, Aleyna Beste Ozhan, Dieter Plessers, Aritra Roy, Fabian Schöppach, Philippe Schwaller, Carla Terboven, Katharina Ueltzen, Yue Wu, Shang Zhu, Jan Janssen, Calvin Li, Ian Foster, Ben Blaiszik","doi":"10.1088/2632-2153/ae011a","DOIUrl":"10.1088/2632-2153/ae011a","url":null,"abstract":"<p><p>Large language models (LLMs) are reshaping many aspects of materials science and chemistry research, enabling advances in molecular property prediction, materials design, scientific automation, knowledge extraction, and more. Recent developments demonstrate that the latest class of models are able to integrate structured and unstructured data, assist in hypothesis generation, and streamline research workflows. To explore the frontier of LLM capabilities across the research lifecycle, we review applications of LLMs through 32 total projects developed during the second annual LLM hackathon for applications in materials science and chemistry, a global hybrid event. These projects spanned seven key research areas: (1) molecular and material property prediction, (2) molecular and material design, (3) automation and novel interfaces, (4) scientific communication and education, (5) research data management and automation, (6) hypothesis generation and evaluation, and (7) knowledge extraction and reasoning from the scientific literature. Collectively, these applications illustrate how LLMs serve as versatile predictive models, platforms for rapid prototyping of domain-specific tools, and much more. In particular, improvements in both open source and proprietary LLM performance through the addition of reasoning, additional training data, and new techniques have expanded effectiveness, particularly in low-data environments and interdisciplinary research. As LLMs continue to improve, their integration into scientific workflows presents both new opportunities and new challenges, requiring ongoing exploration, continued refinement, and further research to address reliability, interpretability, and reproducibility.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 3","pages":"030701"},"PeriodicalIF":4.6,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12492978/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145233532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Beyond Euclid: an illustrated guide to modern machine learning with geometric, topological, and algebraic structures
Pub Date: 2025-09-30 | Epub Date: 2025-08-01 | DOI: 10.1088/2632-2153/adf375
Mathilde Papillon, Sophia Sanborn, Johan Mathe, Louisa Cornelis, Abby Bertics, Domas Buracas, Hansen J Lillemark, Christian Shewmake, Fatih Dinc, Xavier Pennec, Nina Miolane
The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.
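As a toy illustration of the review's premise (our example, not one from the paper): the Euclidean distance measured through the ambient space and the geodesic distance measured along the manifold disagree even on the simplest curved space, the sphere.

```python
import numpy as np

# Two points on the unit sphere S^2 (e.g. directions or normalized features)
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.0, 1.0, 0.0])

chord = np.linalg.norm(u - v)                        # straight line through ambient R^3
geodesic = np.arccos(np.clip(u @ v, -1.0, 1.0))      # arc length along the sphere

print(chord, geodesic)   # 1.414... vs 1.570... (pi/2): the two metrics differ
```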
{"title":"Beyond Euclid: an illustrated guide to modern machine learning with geometric, topological, and algebraic structures.","authors":"Mathilde Papillon, Sophia Sanborn, Johan Mathe, Louisa Cornelis, Abby Bertics, Domas Buracas, Hansen J Lillemark, Christian Shewmake, Fatih Dinc, Xavier Pennec, Nina Miolane","doi":"10.1088/2632-2153/adf375","DOIUrl":"10.1088/2632-2153/adf375","url":null,"abstract":"<p><p>The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently non-Euclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the algebraic transformations describing symmetries of physical systems. Extracting knowledge from such non-Euclidean data necessitates a broader mathematical perspective. Echoing the 19th-century revolutions that gave rise to non-Euclidean geometry, an emerging line of research is redefining modern machine learning with non-Euclidean structures. Its goal: generalizing classical methods to unconventional data types with geometry, topology, and algebra. In this review, we provide an accessible gateway to this fast-growing field and propose a graphical taxonomy that integrates recent advances into an intuitive unified framework. We subsequently extract insights into current challenges and highlight exciting opportunities for future development in this field.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 3","pages":"031002"},"PeriodicalIF":4.6,"publicationDate":"2025-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12315666/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144776367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prior guided deep difference meta-learner for fast adaptation to stylized segmentation
Pub Date: 2025-06-30 | Epub Date: 2025-04-16 | DOI: 10.1088/2632-2153/adc970
Dan Nguyen, Anjali Balagopal, Ti Bai, Michael Dohopolski, Mu-Han Lin, Steve Jiang
Radiotherapy treatment planning requires segmenting anatomical structures in various styles, influenced by guidelines, protocols, preferences, or dose-planning needs. Deep learning-based auto-segmentation models, trained on anatomical definitions, may not match local clinicians' styles at new institutions, and adapting these models can be challenging without sufficient resources. We hypothesize that consistent differences between segmentation styles and anatomical definitions can be learned from a few initial patients and applied to pre-trained models for more precise segmentation. We propose a Prior-guided deep difference meta-learner (DDL) to learn and adapt these differences. We collected data from 440 patients for model development and 30 for testing. The dataset includes contours of the prostate clinical target volume (CTV), parotid, and rectum. We developed a deep learning framework that segments new images in a matching style, using example styles as a prior, without model retraining. The pre-trained segmentation models were adapted to three different clinician styles for the post-operative prostate CTV, as well as to styles for parotid gland and rectum segmentation. We tested the model's ability to learn unseen styles and compared its performance with transfer learning, using varying amounts of prior patient style data (0-10 patients). Performance was quantitatively evaluated using the Dice similarity coefficient (DSC) and Hausdorff distance. With exposure to only three patients, the average DSC (%) improved from 78.6, 71.9, 63.0, 69.6, 52.2, and 46.3 to 84.4, 77.8, 73.0, 77.8, 70.5, and 68.1 for CTV style 1, CTV style 2, CTV style 3, the superficial parotid, the superior rectum, and the posterior rectum, respectively. The proposed Prior-guided DDL is a fast and effortless network for adapting a structure to new styles. The improved segmentation accuracy may reduce contour editing time, providing a more efficient and streamlined clinical workflow.
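A minimal sketch of the prior-guided difference idea with hypothetical toy networks: a frozen base model's logits are corrected by a residual predicted from a few (image, styled-contour) prior examples. The paper's actual DDL architecture is more elaborate; this only shows the data flow.

```python
import torch
import torch.nn as nn

class DifferenceLearner(nn.Module):
    """Toy difference learner: encode a few style examples, pool them into a
    style code, and predict a residual correction to the base segmentation."""

    def __init__(self):
        super().__init__()
        self.encode_prior = nn.Conv2d(2, 16, 3, padding=1)   # image + styled contour
        self.refine = nn.Conv2d(17, 1, 3, padding=1)         # base logits + style code

    def forward(self, base_logits, prior_imgs, prior_contours):
        style = self.encode_prior(torch.cat([prior_imgs, prior_contours], dim=1))
        style = style.mean(dim=0, keepdim=True)              # pool over prior patients
        style = style.expand(base_logits.size(0), -1, -1, -1)
        delta = self.refine(torch.cat([base_logits, style], dim=1))
        return torch.sigmoid(base_logits + delta)            # styled segmentation

base = torch.randn(4, 1, 64, 64)                             # frozen model's logits
imgs, contours = torch.randn(3, 1, 64, 64), torch.rand(3, 1, 64, 64)  # 3 style examples
styled = DifferenceLearner()(base, imgs, contours)
```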
{"title":"Prior guided deep difference meta-learner for fast adaptation to stylized segmentation.","authors":"Dan Nguyen, Anjali Balagopal, Ti Bai, Michael Dohopolski, Mu-Han Lin, Steve Jiang","doi":"10.1088/2632-2153/adc970","DOIUrl":"https://doi.org/10.1088/2632-2153/adc970","url":null,"abstract":"<p><p>Radiotherapy treatment planning requires segmenting anatomical structures in various styles, influenced by guidelines, protocols, preferences, or dose planning needs. Deep learning-based auto-segmentation models, trained on anatomical definitions, may not match local clinicians' styles at new institutions. Adapting these models can be challenging without sufficient resources. We hypothesize that consistent differences between segmentation styles and anatomical definitions can be learned from initial patients and applied to pre-trained models for more precise segmentation. We propose a Prior-guided deep difference meta-learner (DDL) to learn and adapt these differences. We collected data from 440 patients for model development and 30 for testing. The dataset includes contours of the prostate clinical target volume (CTV), parotid, and rectum. We developed a deep learning framework that segments new images with a matching style using example styles as a prior, without model retraining. The pre-trained segmentation models were adapted to three different clinician styles for post-operative CTV for prostate, parotid gland, and rectum segmentation. We tested the model's ability to learn unseen styles and compared its performance with transfer learning, using varying amounts of prior patient style data (0-10 patients). Performance was quantitatively evaluated using dice similarity coefficient (DSC) and Hausdorff distance. With exposure to only three patients for the model, the average DSC (%) improved from 78.6, 71.9, 63.0, 69.6, 52.2 and 46.3-84.4, 77.8, 73.0, 77.8, 70.5, 68.1, for CTV<sub>style1</sub>, CTV<sub>style2</sub>, CTV<sub>style3</sub>, Parotid<sub>superficial</sub>, Rectum<sub>superior</sub>, and Rectum<sub>posterior</sub>, respectively. The proposed Prior-guided DDL is a fast and effortless network for adapting a structure to new styles. The improved segmentation accuracy may result in reduced contour editing time, providing a more efficient and streamlined clinical workflow.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 2","pages":"025016"},"PeriodicalIF":6.3,"publicationDate":"2025-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12001319/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144002018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model
Pub Date: 2025-06-01 | Epub Date: 2025-04-07 | DOI: 10.1088/2632-2153/adc656
Yunxiang Li, Hua-Chieh Shao, Xiaoxue Qian, You Zhang
Diffusion models have demonstrated significant potential in producing high-quality images in medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models often fall short when it comes to faithfully translating medical images. They struggle to accurately preserve anatomical structures, especially when working with unpaired datasets. In this study, we introduce the Frequency-Decoupled Diffusion Model (FDDM) for MR-to-CT conversion. The differences between MR and CT images lie in both anatomical structures (e.g., the outlines of organs or bones) and the data distribution (e.g., intensity values and contrast within). Therefore, FDDM first converts anatomical information using an initial conversion module. Then, the converted anatomical information guides a subsequent diffusion model to generate high-quality CT images. Our diffusion model uses a dual-path reverse diffusion process for low-frequency and high-frequency information, achieving a better balance between image quality and anatomical accuracy. We extensively evaluated FDDM using public datasets for brain MR-to-CT and pelvis MR-to-CT translations. The results show that FDDM outperforms other generative adversarial network (GAN)-based, variational autoencoder (VAE)-based, and diffusion-based models. The evaluation metrics included the Fréchet Inception Distance (FID), mean absolute error (MAE), mean squared error (MSE), Structural Similarity Index Measure (SSIM), and Dice similarity coefficient (DICE). FDDM achieved the best scores on all metrics for both datasets, particularly excelling in FID, with scores of 25.9 for brain data and 29.2 for pelvis data, significantly outperforming other methods. These results demonstrate that FDDM can generate high-quality target-domain images while maintaining the accuracy of translated anatomical structures, thereby facilitating more precise and accurate downstream tasks, including anatomy segmentation and radiotherapy planning.
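The frequency decoupling itself can be illustrated with a simple Fourier-domain split. Whether FDDM uses exactly this radial hard low-pass is not stated in the abstract, so the cutoff and mask below are assumptions; the point is that the low band carries anatomy-scale structure and the high band carries fine detail.

```python
import torch

def frequency_decouple(img: torch.Tensor, cutoff: float = 0.1):
    """Split a (B, C, H, W) batch into low- and high-frequency components with a
    radial low-pass mask in Fourier space."""
    _, _, H, W = img.shape
    fy = torch.fft.fftfreq(H).view(H, 1)
    fx = torch.fft.fftfreq(W).view(1, W)
    low_pass = ((fy ** 2 + fx ** 2).sqrt() <= cutoff).to(img.dtype)  # (H, W) mask

    spectrum = torch.fft.fft2(img)
    low = torch.fft.ifft2(spectrum * low_pass).real   # coarse anatomy-scale content
    high = img - low                                  # edges, texture, fine contrast
    return low, high

low, high = frequency_decouple(torch.randn(1, 1, 128, 128))
```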
{"title":"FDDM: Unsupervised Medical Image Translation with a Frequency-Decoupled Diffusion Model.","authors":"Yunxiang Li, Hua-Chieh Shao, Xiaoxue Qian, You Zhang","doi":"10.1088/2632-2153/adc656","DOIUrl":"10.1088/2632-2153/adc656","url":null,"abstract":"<p><p>Diffusion models have demonstrated significant potential in producing high-quality images in medical image translation to aid disease diagnosis, localization, and treatment. Nevertheless, current diffusion models often fall short when it comes to faithfully translating medical images. They struggle to accurately preserve anatomical structures, especially when working with unpaired datasets. In this study, we introduce the Frequency Decoupled Diffusion Model (FDDM) for MR-to-CT conversion. The differences between MR and CT images lie in both anatomical structures (e.g., the outlines of organs or bones) and the data distribution (e.g., intensity values and contrast within). Therefore, FDDM first converts anatomical information using an initial conversion module. Then, the converted anatomical information guides a subsequent diffusion model to generate high-quality CT images. Our diffusion model uses a dual-path reverse diffusion process for low-frequency and high-frequency information, achieving a better balance between image quality and anatomical accuracy. We extensively evaluated FDDM using public datasets for brain MR-to-CT and pelvis MR-to-CT translations. The results show that FDDM outperforms other generative adversarial network (GAN)-based, variational autoencoder (VAE)-based, and diffusion-based models. The evaluation metrics included Fréchet Inception Distance (FID), mean absolute error (MAE), mean squared error (MSE), Structural Similarity Index Measure (SSIM), and Dice similarity coefficient (DICE). FDDM achieved the best scores on all metrics for both datasets, particularly excelling in FID, with scores of 25.9 for brain data and 29.2 for pelvis data, significantly outperforming other methods. These results demonstrate that FDDM can generate high-quality target domain images while maintaining the accuracy of translated anatomical structures, thereby facilitating more precise/accurate downstream tasks including anatomy segmentation and radiotherapy planning.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 2","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12826588/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146053995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Medical Image Segmentation Assisted with Clinical Inputs via Language Encoder in A Deep Learning Framework
Pub Date: 2025-03-01 | Epub Date: 2025-02-14 | DOI: 10.1088/2632-2153/adb371
Hengrui Zhao, Biling Wang, Deepkumar Mistry, Jing Wang, Michael Dohopolski, Daniel Yang, Weiguo Lu, Steve Jiang, Dan Nguyen
Introduction: Auto-segmentation of tumor volumes and organs at risk (OARs) is a critical step in cancer radiotherapy treatment planning, where rapid, precise adjustments to treatment plans are required to match the patient anatomy. Although auto-segmentation has been clinically accepted for most OARs, auto-segmentation of tumor volumes, particularly clinical target volumes (CTVs), remains a challenge. This difficulty arises because images alone are often insufficient to capture the necessary information for accurate delineation of microscopic tumor invasion invisible on the image itself.
Methods: We propose a deep learning-based medical image segmentation framework designed to mimic the clinical process of delineating CTVs and OARs. At its core, the model performs precise segmentation of medical images while enhancing accuracy by integrating clinical information in text format. A transformer-based text encoder converts textual clinical data into vectors, which are incorporated into the segmentation process with image features. This integration bridges the gap between traditional automated segmentation methods and clinician-guided, context-rich delineations. The framework's effectiveness is demonstrated through a prostate segmentation example in the context of radiation therapy for localized prostate cancer, where incorporating clinical context significantly impacts the delineation process.
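A minimal sketch of one plausible fusion operator for the step described above, assuming FiLM-style per-channel modulation; the paper specifies a transformer text encoder whose vectors are combined with image features, but the exact combination mechanism used here is our assumption.

```python
import torch
import torch.nn as nn

class TextConditionedBottleneck(nn.Module):
    """Project a text embedding to per-channel scale and shift, then modulate
    the segmentation network's image features (FiLM-style conditioning)."""

    def __init__(self, text_dim: int = 768, channels: int = 256):
        super().__init__()
        self.to_scale_shift = nn.Linear(text_dim, 2 * channels)

    def forward(self, feats, text_emb):
        # feats: (B, C, H, W) image features; text_emb: (B, text_dim) from a
        # transformer text encoder run on the clinical note.
        scale, shift = self.to_scale_shift(text_emb).chunk(2, dim=1)
        return feats * (1 + scale[..., None, None]) + shift[..., None, None]

feats = torch.randn(2, 256, 16, 16)      # bottleneck features of a segmentation UNet
text = torch.randn(2, 768)               # encoded clinical context (toy values)
print(TextConditionedBottleneck()(feats, text).shape)   # torch.Size([2, 256, 16, 16])
```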
Results: In our experiments, we included additional clinical information potentially influencing clinicians' prostate segmentation. The results show that our proposed method not only outperforms the baseline model, but also surpasses current state-of-the-art methods, with or without clinical contexts. Furthermore, our method demonstrates high performance even with limited data.
Conclusion: The proposed segmentation framework has been shown to significantly improve auto-segmentation, particularly for CTVs, in cancer radiotherapy.
{"title":"Medical Image Segmentation Assisted with Clinical Inputs via Language Encoder in A Deep Learning Framework.","authors":"Hengrui Zhao, Biling Wang, Deepkumar Mistry, Jing Wang, Michael Dohopolski, Daniel Yang, Weiguo Lu, Steve Jiang, Dan Nguyen","doi":"10.1088/2632-2153/adb371","DOIUrl":"10.1088/2632-2153/adb371","url":null,"abstract":"<p><strong>Introduction: </strong>Auto-segmentation of tumor volumes and organs at risk (OARs) is a critical step in cancer radiotherapy treatment planning, where rapid, precise adjustments to treatment plans are required to match the patient anatomy. Although auto-segmentation has been clinically accepted for most OARs, auto-segmentation of tumor volumes, particularly clinical target volumes (CTVs), remains a challenge. This difficulty arises because images alone are often insufficient to capture the necessary information for accurate delineation of microscopic tumor invasion invisible on the image itself.</p><p><strong>Methods: </strong>We propose a deep learning-based medical image segmentation framework designed to mimic the clinical process of delineating CTVs and OARs. At its core, the model performs precise segmentation of medical images while enhancing accuracy by integrating clinical information in text format. A transformer-based text encoder converts textual clinical data into vectors, which are incorporated into the segmentation process with image features. This integration bridges the gap between traditional automated segmentation methods and clinician-guided, context-rich delineations. The framework's effectiveness is demonstrated through a prostate segmentation example in the context of radiation therapy for localized prostate cancer, where incorporating clinical context significantly impacts the delineation process.</p><p><strong>Results: </strong>In our experiments, we included additional clinical information potentially influencing clinicians' prostate segmentation. The results show that our proposed method not only outperforms the baseline model, but also surpasses current state-of-the-art methods, with or without clinical contexts. Furthermore, our method demonstrates high performance even with limited data.</p><p><strong>Conclusion: </strong>This proposed segmentation framework has shown to significantly improve auto-segmentation, particularly for CTVs, in cancer radiotherapy.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12509794/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145281319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Unsupervised Clustering for Prostate Auto-segmentation With and Without Hydrogel Spacer
Pub Date: 2025-03-01 | Epub Date: 2025-01-23 | DOI: 10.1088/2632-2153/ada8f3
Hengrui Zhao, Biling Wang, Michael Dohopolski, Ti Bai, Steve Jiang, Dan Nguyen
Introduction: Clinical datasets for training deep learning (DL) models often exhibit high levels of heterogeneity due to differences such as patient characteristics, new medical techniques, and physician preferences. In recent years, hydrogel spacers have been used in some prostate cancer patients receiving radiotherapy to separate the prostate and the rectum, better sparing the rectum while achieving adequate dose coverage on the prostate. However, the spacer substantially alters the CT image appearance, which in turn reduces the contouring accuracy of auto-segmentation algorithms and leads to a highly heterogeneous dataset.
Methods: To address this issue, we propose to identify underlying clusters within the dataset and use the cluster labels for segmentation. We collected a clinical dataset of 909 patients, including those with two types of hydrogel spacers and those without. First, we trained a DL model to locate the prostate and limit our field of view to the local area surrounding the prostate and rectum. We then used Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction and employed k-means clustering to assign each patient to a cluster. To leverage this clustered data, we propose a text-guided segmentation model, CLIP-UNet, which encodes the cluster information using a text encoder and combines the encoded text information with image features for segmentation.
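A minimal sketch of the UMAP-plus-k-means step with the umap-learn and scikit-learn APIs; the per-patient feature vectors here are placeholders for whatever representation is extracted from the cropped prostate/rectum region.

```python
import numpy as np
from sklearn.cluster import KMeans
import umap  # pip install umap-learn

# Placeholder per-patient features from the cropped region around the prostate
# and rectum (909 patients in the study; random values here).
features = np.random.rand(909, 4096)

embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)

# The cluster labels (up to three clusters in the paper) are then phrased as
# text prompts for the CLIP-UNet text encoder during segmentation.
```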
Results: The UMAP results indicated up to three clusters within the dataset. CLIP-UNet with cluster information achieved a Dice score of 86.2% compared to 84.4% from the baseline UNet. Additionally, CLIP-UNet outperforms other state-of-the-art models with or without cluster information.
Conclusion: Automatic clustering assisted by deep learning can reveal hidden data clusters in clinical datasets, and CLIP-UNet effectively utilizes clustered labels and achieves higher performance.
{"title":"Deep Unsupervised Clustering for Prostate Auto-segmentation With and Without Hydrogel Spacer.","authors":"Hengrui Zhao, Biling Wang, Michael Dohopolski, Ti Bai, Steve Jiang, Dan Nguyen","doi":"10.1088/2632-2153/ada8f3","DOIUrl":"10.1088/2632-2153/ada8f3","url":null,"abstract":"<p><strong>Introduction: </strong>Clinical datasets for training deep learning (DL) models often exhibit high levels of heterogeneity due to differences such as patient characteristics, new medical techniques, and physician preferences. In recent years, hydrogel spacers have been used in some prostate cancer patients receiving radiotherapy to separate the prostate and the rectum to better spare the rectum while achieving adequate dose coverage on the prostate. However, this substantially affects the CT image appearance, which downstream reduced the contouring accuracy of auto-segmentation algorithms. This leads to highly heterogeneous dataset.</p><p><strong>Methods: </strong>To address this issue, we propose to identify underlying clusters within the dataset and use the cluster labels for segmentation. We collected a clinical dataset of 909 patients, including those with two types of hydrogel spacers and those without. First, we trained a DL model to locate the prostate and limit our field of view to the local area surrounding the prostate and rectum. We then used Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction and employed k-means clustering to assign each patient to a cluster. To leverage this clustered data, we propose a text-guided segmentation model, CLIP-UNet, which encodes the cluster information using a text encoder and combines the encoded text information with image features for segmentation.</p><p><strong>Results: </strong>The UMAP results indicated up to three clusters within the dataset. CLIP-UNet with cluster information achieved a Dice score of 86.2% compared to 84.4% from the baseline UNet. Additionally, CLIP-UNet outperforms other state-of-the-art models with or without cluster information.</p><p><strong>Conclusion: </strong>Automatic clustering assisted by deep learning can reveal hidden data clusters in clinical datasets, and CLIP-UNet effectively utilizes clustered labels and achieves higher performance.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"6 1","pages":""},"PeriodicalIF":4.6,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12509332/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145281229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quality assurance for online adaptive radiotherapy: a secondary dose verification model with geometry-encoded U-Net
Pub Date: 2024-12-01 | Epub Date: 2024-10-11 | DOI: 10.1088/2632-2153/ad829e
Shunyu Yan, Austen Maniscalco, Biling Wang, Dan Nguyen, Steve Jiang, Chenyang Shen
In online adaptive radiotherapy (ART), quick computation-based secondary dose verification is crucial for ensuring the quality of ART plans while the patient is positioned on the treatment couch. However, traditional dose verification algorithms are generally time-consuming, reducing the efficiency of the ART workflow. This study aims to develop an ultra-fast deep-learning (DL) based secondary dose verification algorithm to accurately estimate dose distributions using computed tomography (CT) and fluence maps (FMs). We integrated FMs into the CT image domain by explicitly resolving the geometry of treatment delivery. For each gantry angle, an FM was constructed based on the optimized multi-leaf collimator apertures and corresponding monitor units. To effectively encode the treatment beam configuration, the constructed FMs were back-projected to 30 cm away from the isocenter with respect to the exact geometry of the treatment machines. Then, a 3D U-Net was utilized to take the integrated CT and FM volume as input to estimate dose. Training and validation were performed on 381 prostate cancer cases, with an additional 40 testing cases for independent evaluation of model performance. The proposed model can estimate dose in ∼15 ms for each patient. The average γ passing rate (3%/2 mm, 10% threshold) for the estimated dose was 99.9% ± 0.15% on testing patients. The mean dose differences for the planning target volume and organs at risk were 0.07% ± 0.34% and 0.48% ± 0.72%, respectively. We have developed a geometry-resolved DL framework for accurate dose estimation and demonstrated its potential in real-time online ART dose verification.
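A minimal sketch of the geometry-resolved input construction: the CT volume and the back-projected fluence-map volume are stacked as channels and passed to a 3D network. The small convolutional stack below stands in for the paper's full 3D U-Net.

```python
import torch
import torch.nn as nn

# Toy volumes on a shared grid: (batch, channel, depth, height, width)
ct = torch.randn(1, 1, 64, 96, 96)     # planning CT
fm = torch.randn(1, 1, 64, 96, 96)     # fluence maps back-projected into CT space

x = torch.cat([ct, fm], dim=1)         # two-channel, geometry-resolved input

# Stand-in for the 3D U-Net used in the paper.
net = nn.Sequential(
    nn.Conv3d(2, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 1, kernel_size=3, padding=1),   # voxel-wise dose estimate
)
dose = net(x)                          # (1, 1, 64, 96, 96)
```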
{"title":"Quality assurance for online adaptive radiotherapy: a secondary dose verification model with geometry-encoded U-Net.","authors":"Shunyu Yan, Austen Maniscalco, Biling Wang, Dan Nguyen, Steve Jiang, Chenyang Shen","doi":"10.1088/2632-2153/ad829e","DOIUrl":"10.1088/2632-2153/ad829e","url":null,"abstract":"<p><p>In online adaptive radiotherapy (ART), quick computation-based secondary dose verification is crucial for ensuring the quality of ART plans while the patient is positioned on the treatment couch. However, traditional dose verification algorithms are generally time-consuming, reducing the efficiency of ART workflow. This study aims to develop an ultra-fast deep-learning (DL) based secondary dose verification algorithm to accurately estimate dose distributions using computed tomography (CT) and fluence maps (FMs). We integrated FMs into the CT image domain by explicitly resolving the geometry of treatment delivery. For each gantry angle, an FM was constructed based on the optimized multi-leaf collimator apertures and corresponding monitoring units. To effectively encode treatment beam configuration, the constructed FMs were back-projected to <math><mrow><mn>30</mn></mrow> </math> cm away from the isocenter with respect to the exact geometry of the treatment machines. Then, a 3D U-Net was utilized to take the integrated CT and FM volume as input to estimate dose. Training and validation were performed on <math><mrow><mn>381</mn></mrow> </math> prostate cancer cases, with an additional <math><mrow><mn>40</mn></mrow> </math> testing cases for independent evaluation of model performance. The proposed model can estimate dose in ∼ <math><mrow><mn>15</mn></mrow> </math> ms for each patient. The average <i>γ</i> passing rate ( <math><mrow><mn>3</mn> <mi>%</mi> <mrow><mo>/</mo></mrow> <mn>2</mn> <mstyle></mstyle> <mrow><mtext>mm</mtext></mrow> </mrow> </math> , <math><mrow><mn>10</mn> <mi>%</mi></mrow> </math> threshold) for the estimated dose was 99.9% ± 0.15% on testing patients. The mean dose differences for the planning target volume and organs at risk were <math><mrow><mn>0.07</mn> <mi>%</mi> <mo>±</mo> <mn>0.34</mn> <mi>%</mi></mrow> </math> and <math><mrow><mn>0.48</mn> <mi>%</mi> <mo>±</mo> <mn>0.72</mn> <mi>%</mi></mrow> </math> , respectively. We have developed a geometry-resolved DL framework for accurate dose estimation and demonstrated its potential in real-time online ART doses verification.</p>","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"5 4","pages":"045013"},"PeriodicalIF":6.3,"publicationDate":"2024-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467776/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142476443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}