
IS&T International Symposium on Electronic Imaging: Latest Publications

Scale-up Unlearnable Examples Learning with High-Performance Computing.
Pub Date : 2025-01-01 DOI: 10.2352/ei.2025.37.12.hpci-184
Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo

Recent AI models, such as ChatGPT, are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In the healthcare field, particularly when radiologists use AI-driven diagnostic tools hosted on online platforms, there is a risk that medical imaging data may be repurposed for future AI training without explicit consent, spotlighting critical privacy and intellectual property concerns around healthcare data usage. Addressing these privacy challenges, a novel approach known as Unlearnable Examples (UEs) has been introduced, aiming to make data unlearnable to deep learning models. A prominent method within this area, called Unlearnable Clustering (UC), has shown improved UE performance with larger batch sizes but was previously limited by computational resources (e.g., a single workstation). To push the boundaries of UE performance with theoretically unlimited resources, we scaled up UC learning across various datasets using Distributed Data Parallel (DDP) training on the Summit supercomputer. Our goal was to examine UE efficacy at high-performance computing (HPC) scale to prevent unauthorized learning and enhance data security, particularly exploring the impact of batch size on UE's unlearnability. Utilizing the robust computational capabilities of Summit, extensive experiments were conducted on diverse datasets such as Pets, MedMNIST, Flowers, and Flowers102. Our findings reveal that both overly large and overly small batch sizes can lead to performance instability and affect accuracy. However, the relationship between batch size and unlearnability varied across datasets, highlighting the necessity of tailored batch size strategies to achieve optimal data protection. The use of Summit's high-performance GPUs, along with the efficiency of the DDP framework, facilitated rapid updates of model parameters and consistent training across nodes. Our results underscore the critical role of selecting appropriate batch sizes based on the specific characteristics of each dataset to prevent learning and ensure data security in deep learning applications. The source code is publicly available at https://github.com/hrlblab/UE_HPC.
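As a point of reference for the training setup described above, the following is a minimal PyTorch Distributed Data Parallel sketch showing how the effective batch size scales with the number of workers; the dataset, model, and cross-entropy objective are placeholders standing in for the UC objective, and none of this reproduces the released UE_HPC code.

```python
# Minimal PyTorch DistributedDataParallel (DDP) sketch of batch-size scaling across
# workers. Model, dataset, and loss are placeholders, not the released UE_HPC code.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler
from torchvision import datasets, models, transforms

def main():
    dist.init_process_group(backend="nccl")            # env vars set by torchrun/jsrun
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    per_gpu_batch = 256                                 # global batch = per_gpu_batch * world_size
    tfm = transforms.Compose([transforms.Resize((224, 224)), transforms.ToTensor()])
    dataset = datasets.OxfordIIITPet("data", download=True, transform=tfm)  # the "Pets" dataset
    sampler = DistributedSampler(dataset, shuffle=True)
    loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler,
                        num_workers=8, pin_memory=True)

    model = models.resnet18(num_classes=37).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()             # stand-in for the UC objective

    for epoch in range(10):
        sampler.set_epoch(epoch)                        # consistent reshuffling across ranks
        for images, labels in loader:
            images = images.cuda(local_rank, non_blocking=True)
            labels = labels.cuda(local_rank, non_blocking=True)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()                             # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with torchrun (or the equivalent launcher on Summit-class systems), each rank processes per_gpu_batch samples, so the global batch size grows linearly with the number of GPUs, which is the quantity whose effect on unlearnability the study examines.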

Citations: 0
PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit.
Pub Date : 2025-01-01 DOI: 10.2352/EI.2025.37.12.HPCI-177
Yuechen Yang, Yu Wang, Tianyuan Yao, Ruining Deng, Mengmeng Yin, Shilin Zhao, Haichun Yang, Yuankai Huo

Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples [1]. However, traditional feature extraction pipelines based on tools like CellProfiler [2] often involve lengthy workflows, requiring WSI segmentation into patches, feature extraction at the patch level, and subsequent mapping back to the original WSI [4]. To address these challenges, we present PySpatial, a high-speed pathomics toolkit specifically designed for WSI-level analysis. PySpatial streamlines the conventional pipeline by operating directly on computational regions of interest, reducing redundant processing steps. Utilizing rtree-based spatial indexing and matrix-based computation, PySpatial efficiently maps and processes computational regions, significantly accelerating feature extraction while maintaining high accuracy. Our experiments on two datasets, Perivascular Epithelioid Cell (PEC) and the Kidney Precision Medicine Project (KPMP) [13], demonstrate substantial performance improvements. For smaller and sparse objects in the PEC dataset, PySpatial achieves nearly a 10-fold speedup compared to standard CellProfiler pipelines. For larger objects, such as glomeruli and arteries in the KPMP dataset, PySpatial achieves a 2-fold speedup. These results highlight PySpatial's potential to handle large-scale WSI analysis with enhanced efficiency and accuracy, paving the way for broader applications in digital pathology.
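PySpatial's actual interface lives in its repository; the snippet below only illustrates the general R-tree idea with the Python rtree package: object bounding boxes are indexed once, and each computational region retrieves just the overlapping objects instead of scanning every patch. The coordinates and object identifiers are illustrative.

```python
# Illustrative R-tree indexing of WSI object bounding boxes (not PySpatial's API).
# Boxes are indexed once; each computational region retrieves only overlapping objects.
from rtree import index

object_boxes = {                       # id -> (min_x, min_y, max_x, max_y) in WSI coordinates
    0: (100, 200, 140, 260),
    1: (5000, 7200, 5060, 7280),
    2: (5020, 7250, 5100, 7330),
}

idx = index.Index()
for obj_id, box in object_boxes.items():
    idx.insert(obj_id, box)

region = (4900, 7100, 5200, 7400)      # one computational region of interest, e.g. a glomerulus
hits = list(idx.intersection(region))  # only objects whose boxes overlap the region
print("objects overlapping region:", hits)   # -> [1, 2]
```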

Citations: 0
Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging.
Pub Date : 2025-01-01 Epub Date: 2025-02-01 DOI: 10.2352/EI.2025.37.14.COIMG-132
Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W Remedios, Shunxing Bao, Bennett A Landman, Lee E Wheless, Lori A Coburn, Keith T Wilson, Yaohong Wang, Shilin Zhao, Agnes B Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo

The Segment Anything Model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained with over 1 billion masks on 11 million licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). This makes SAM attractive for medical image analysis, especially for digital pathology, where training data are scarce. In this study, we evaluate the zero-shot segmentation performance of the SAM model on representative segmentation tasks in whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, and (3) cell nuclei segmentation.

Core results: The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfactory performance for dense instance object segmentation, even with 20 prompts (clicks/boxes) per image. We also summarize the identified limitations for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model achieve better performance in dense object segmentation.
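For readers unfamiliar with SAM's prompt interface, a minimal zero-shot usage sketch with the public segment-anything package is shown below; the checkpoint file, image path, and prompt coordinates are placeholders, and this is not the evaluation pipeline used in the study.

```python
# Minimal zero-shot SAM prompting sketch with the public segment-anything package.
# Checkpoint file, image path, and prompt coordinates are placeholders.
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
sam.to("cuda")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("wsi_patch.png"), cv2.COLOR_BGR2RGB)   # RGB uint8 patch
predictor.set_image(image)

point_coords = np.array([[512, 512]])   # one positive point prompt on an object of interest
point_labels = np.array([1])            # 1 = foreground, 0 = background
masks, scores, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
    multimask_output=True,              # SAM returns up to three candidate masks per prompt
)
best_mask = masks[np.argmax(scores)]    # pick the highest-scoring candidate
print(best_mask.shape, scores)
```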

Citations: 0
Write Sentence with Images: Revisit the Large Vision Model with Visual Sentence.
Pub Date : 2025-01-01 DOI: 10.2352/ei.2025.37.12.hpci-172
Quan Liu, Can Cui, Ruining Deng, Tianyuan Yao, Yuechen Yang, Yucheng Tang, Yuankai Huo

This paper introduces a novel framework for generating high-quality images from "visual sentences" extracted from video sequences. By combining a lightweight autoregressive model with a Vector Quantized Generative Adversarial Network (VQGAN), our approach achieves a favorable trade-off between computational efficiency and image fidelity. Unlike conventional methods that require substantial resources, the proposed framework efficiently captures sequential patterns in partially annotated frames and synthesizes coherent, contextually accurate images. Empirical results demonstrate that our method not only attains state-of-the-art performance on various benchmarks but also reduces inference overhead, making it well-suited for real-time and resource-constrained environments. Furthermore, we explore its applicability to medical image analysis, showcasing robust denoising, brightness adjustment, and segmentation capabilities. Overall, our contributions highlight an effective balance between performance and efficiency, paving the way for scalable and adaptive image generation across diverse multimedia domains.
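The paper's implementation details are not reproduced here; the toy sketch below only illustrates the general "visual sentence" idea: frames are mapped to discrete code indices, the indices of consecutive frames are concatenated into one token sequence, and a lightweight autoregressive transformer predicts each next token. The encoder is a stand-in (a convolution plus nearest-codebook lookup) rather than a real VQGAN.

```python
# Toy "visual sentence" sketch: frames -> discrete code indices -> one concatenated token
# sequence -> next-token prediction. The encoder is a stand-in for a VQGAN, and the
# transformer is deliberately tiny; nothing here reproduces the paper's implementation.
import torch
import torch.nn as nn

VOCAB, FRAMES = 1024, 4                               # codebook size, context frames

class ToyVQEncoder(nn.Module):
    """Stand-in VQGAN encoder: 256x256 image -> 16x16 grid of codebook indices."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=16, stride=16)
        self.codebook = nn.Embedding(VOCAB, 64)
    def forward(self, x):                             # x: (B, 3, 256, 256)
        z = self.conv(x).flatten(2).transpose(1, 2)   # (B, 256, 64) patch features
        cb = self.codebook.weight.unsqueeze(0).expand(z.size(0), -1, -1)
        return torch.cdist(z, cb).argmin(dim=-1)      # (B, 256) nearest-code indices

class ToyVisualLM(nn.Module):
    """Lightweight autoregressive model over the concatenated visual sentence."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, 256)
        layer = nn.TransformerEncoderLayer(256, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(256, VOCAB)
    def forward(self, seq):                           # seq: (B, T) token indices
        T = seq.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.backbone(self.embed(seq), mask=causal)
        return self.head(h)                           # (B, T, VOCAB) next-token logits

encoder, lm = ToyVQEncoder(), ToyVisualLM()
video = torch.rand(2, FRAMES + 1, 3, 256, 256)        # context frames plus the target frame
tokens = torch.stack([encoder(video[:, t]) for t in range(FRAMES + 1)], dim=1)
sentence = tokens.flatten(1)                          # (B, (FRAMES + 1) * 256) visual sentence
logits = lm(sentence[:, :-1])                         # predict every next token
loss = nn.functional.cross_entropy(logits.reshape(-1, VOCAB), sentence[:, 1:].reshape(-1))
print(loss.item())
```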

Citations: 0
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis.
Pub Date : 2025-01-01 DOI: 10.2352/ei.2025.37.12.hpci-183
Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Yuechen Yang, Vishwesh Nath, Bingshan Li, You Chen, Yucheng Tang, Yuankai Huo

Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing information from the accompanying textual pathology reports. mTREE innovatively combines the localization of key areas ("global-to-local") and the development of a WSI-level image-text representation ("local-to-global") into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: first, functioning as an attention map to accurately identify key areas, and second, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks: classification and survival prediction, showcasing its remarkable superiority over baselines. Code and trained models are made available at https://github.com/hrlblab/mTREE.
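A generic text-guided attention pooling layer, sketched below, conveys the two roles the abstract assigns to text; it is an illustration of the idea, not the mTREE architecture: the text embedding scores WSI patch embeddings (the attention map localizes key areas), and the same weights pool the patches into a slide-level image-text representation.

```python
# Generic text-guided attention pooling (an illustration of the idea, not mTREE itself).
import torch
import torch.nn as nn

class TextGuidedPooling(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.q = nn.Linear(dim, dim)    # projects the text (report) embedding -> query
        self.k = nn.Linear(dim, dim)    # projects patch embeddings -> keys
        self.v = nn.Linear(dim, dim)    # projects patch embeddings -> values

    def forward(self, text_emb, patch_embs):
        # text_emb: (B, dim); patch_embs: (B, N, dim) for N patches of one WSI
        q = self.q(text_emb).unsqueeze(1)                                  # (B, 1, dim)
        scores = q @ self.k(patch_embs).transpose(1, 2) / patch_embs.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)                               # (B, 1, N) key areas
        slide_repr = (attn @ self.v(patch_embs)).squeeze(1)                # (B, dim)
        return slide_repr, attn.squeeze(1)

pool = TextGuidedPooling()
text_emb = torch.randn(2, 512)           # e.g. from a frozen text encoder over the report
patch_embs = torch.randn(2, 1000, 512)   # e.g. patch features extracted from the WSI
slide_repr, attn = pool(text_emb, patch_embs)
print(slide_repr.shape, attn.shape)      # torch.Size([2, 512]) torch.Size([2, 1000])
```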

Citations: 0
Egocentric Boundaries on Distinguishing Colliding and Non-Colliding Pedestrians while Walking in a Virtual Environment.
Pub Date : 2024-01-01 DOI: 10.2352/EI.2024.36.11.HVEI-214
Alex D Hwang, Jaehyun Jung, Alex Bowers, Eli Peli

Avoiding person-to-person collisions is critical for patients with visual field loss. Any intervention claiming to improve the safety of such patients should empirically demonstrate its efficacy. To design a VR mobility testing platform presenting multiple pedestrians, the distinction between colliding and non-colliding pedestrians must be clearly defined. We measured nine normally sighted subjects' collision envelopes (CE; an egocentric boundary distinguishing collision from non-collision) and found that the CE changes with the approaching pedestrian's bearing angle and speed. For person-to-person collision events on the VR mobility testing platform, non-colliding pedestrians should not evade the CE.
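For concreteness, the two geometric quantities the collision envelope is reported to depend on can be computed as follows; this is generic geometry for illustration, not the study's analysis code.

```python
# Generic geometry: bearing angle (relative to the observer's heading) and closing speed
# of an approaching pedestrian. Illustration only, not the study's analysis code.
import numpy as np

def bearing_and_closing_speed(obs_pos, obs_heading_deg, obs_vel, ped_pos, ped_vel):
    """Positions in meters (x, y), velocities in m/s, heading in degrees with 0 = +y."""
    rel = np.asarray(ped_pos, float) - np.asarray(obs_pos, float)
    los_deg = np.degrees(np.arctan2(rel[0], rel[1]))              # line-of-sight direction
    bearing = (los_deg - obs_heading_deg + 180.0) % 360.0 - 180.0 # wrap to (-180, 180]
    rel_vel = np.asarray(ped_vel, float) - np.asarray(obs_vel, float)
    closing_speed = -np.dot(rel, rel_vel) / np.linalg.norm(rel)   # > 0 means approaching
    return bearing, closing_speed

print(bearing_and_closing_speed(obs_pos=(0, 0), obs_heading_deg=0, obs_vel=(0, 1.4),
                                ped_pos=(2, 10), ped_vel=(-0.5, -1.2)))
```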

Citations: 0
34th Annual Stereoscopic Displays and Applications Conference - Introduction
Pub Date : 2023-01-16 DOI: 10.2352/ei.2023.35.2.sda-b02
Andrew J. Woods, Nicolas S. Holliman, Takashi Kawai, Bjorn Sommer
This manuscript serves as an introduction to the conference proceedings for the 34th annual Stereoscopic Displays and Applications conference and also provides an overview of the conference.
Citations: 0
Wearable multispectral imaging and telemetry at edge
Pub Date : 2023-01-16 DOI: 10.2352/ei.2023.35.7.image-278
Yang Cai, Mel Siegel
We present a head-mounted holographic display system for thermographic image overlay, biometric sensing, and wireless telemetry. The system is lightweight and reconfigurable for multiple field applications, including object contour detection and enhancement, breathing rate detection, and telemetry over a mobile phone for peer-to-peer communication and an incident command dashboard. Due to the limited computing power of the embedded system, we developed a lightweight image processing algorithm for edge detection and breathing rate detection, as well as an image compression codec. The system can be integrated into a helmet or personal protective equipment such as a face shield or goggles. It can be applied to firefighting, medical emergency response, and other first-response operations. Finally, we present a case study of "Cold Trailing" for forest fire containment.
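As a rough illustration of the kind of per-frame processing such an embedded pipeline might run, here is a plain Sobel edge detector in NumPy; it is a generic sketch, not the authors' algorithm, and the frame is a random stand-in for a normalized thermographic image.

```python
# Plain Sobel edge detector in NumPy: a generic stand-in for the lightweight per-frame
# processing such an embedded pipeline might run (not the authors' algorithm).
import numpy as np

def sobel_edges(gray, thresh=0.25):
    """gray: 2-D float array in [0, 1], e.g. a normalized thermographic frame."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    pad = np.pad(gray, 1, mode="edge")
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    for i in range(3):                      # 3x3 correlation, unrolled to avoid extra dependencies
        for j in range(3):
            window = pad[i:i + gray.shape[0], j:j + gray.shape[1]]
            gx += kx[i, j] * window
            gy += ky[i, j] * window
    mag = np.hypot(gx, gy)
    return (mag / mag.max()) > thresh       # binary edge map for the holographic overlay

frame = np.random.rand(120, 160).astype(np.float32)   # stand-in 160x120 thermal frame
edges = sobel_edges(frame)
print(edges.shape, float(edges.mean()))
```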
Citations: 0
Stereoscopic Displays and Applications XXXIV Conference Overview and Papers Program
Pub Date : 2023-01-16 DOI: 10.2352/ei.2023.35.2.sda-a02
The Stereoscopic Displays and Applications Conference (SD&A) focuses on developments covering the entire stereoscopic 3D imaging pipeline from capture, processing, and display to perception. The conference brings together practitioners and researchers from industry and academia to facilitate an exchange of current information on stereoscopic imaging topics. The highly popular conference demonstration session provides authors with a perfect additional opportunity to showcase their work. The long-running SD&A 3D Theater Session provides conference attendees with a wonderful opportunity to see how 3D content is being created and exhibited around the world. Publishing your work at SD&A offers excellent exposure: across all publication outlets, SD&A has the highest proportion of papers among the top 100 cited papers in the stereoscopic imaging field (Google Scholar, May 2013).
Citations: 0
Improving the performance of web-streaming by super-resolution upscaling techniques
Pub Date : 2023-01-16 DOI: 10.2352/ei.2023.35.3.mobmu-351
Yuriy Reznik, Nabajeet Barman
In recent years, we have seen significant progress in advanced image upscaling techniques, sometimes called super-resolution, ML-based, or AI-based upscaling. Such algorithms are now available not only in the form of specialized software but also in drivers and SDKs supplied with modern graphics cards. The upscaling functions in the NVIDIA Maxine SDK are one recent example. However, to take advantage of this functionality in video streaming applications, one needs to (a) quantify the impacts of super-resolution techniques on the perceived visual quality, (b) implement video rendering incorporating super-resolution upscaling techniques, and (c) implement new bitrate+resolution adaptation algorithms in streaming players, enabling such players to deliver better quality of experience, better efficiency (e.g., reduced bandwidth usage), or both. Towards this end, in this paper, we propose several techniques that may be helpful to the implementation community. First, we offer a model quantifying the impacts of super-resolution upscaling on perceived quality. Our model is based on the Westerink-Roufs model connecting the true resolution of images/videos to perceived quality, with several additional parameters that allow it to be tuned to specific implementations of super-resolution techniques. We verify this model using several recent datasets, including MOS scores measured for several conventional upscaling and super-resolution algorithms. Then, we propose an improved adaptation logic for video streaming players, considering video bitrates, encoded video resolutions, player size, and the upscaling method. This improved logic relies on our modified Westerink-Roufs model to predict perceived quality and suggests choices of renditions that would deliver the best quality for given display and upscaling method characteristics. Finally, we study the impacts of the proposed techniques and show that they can deliver practically appreciable results in terms of the expected QoE improvements and bandwidth savings.
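The shape of such an adaptation rule can be sketched as follows: among the renditions that fit the available bandwidth, pick the one with the highest predicted perceived quality after upscaling to the player size. The predicted_quality function here is a deliberately simple placeholder, not the paper's modified Westerink-Roufs model.

```python
# Rendition-selection sketch: among renditions that fit the bandwidth, choose the one with
# the highest predicted quality after upscaling to the player size. predicted_quality() is
# a deliberately simple placeholder, not the paper's modified Westerink-Roufs model.
from dataclasses import dataclass

@dataclass
class Rendition:
    width: int
    height: int
    bitrate_kbps: int

def predicted_quality(r, player_height, upscaler_gain=1.0):
    """Placeholder: effective resolution boosted by the upscaler, capped by the display."""
    effective = min(r.height * upscaler_gain, player_height)
    return effective / player_height                      # 0..1, higher is better

def select_rendition(ladder, bandwidth_kbps, player_height, upscaler_gain=1.0):
    feasible = [r for r in ladder if r.bitrate_kbps <= bandwidth_kbps]
    if not feasible:                                      # nothing fits: take the cheapest
        return min(ladder, key=lambda r: r.bitrate_kbps)
    return max(feasible, key=lambda r: predicted_quality(r, player_height, upscaler_gain))

ladder = [Rendition(640, 360, 800), Rendition(1280, 720, 2500), Rendition(1920, 1080, 5000)]
print(select_rendition(ladder, bandwidth_kbps=3000, player_height=1080, upscaler_gain=1.5))
```

With an effective super-resolution upscaler (upscaler_gain greater than 1 in this toy model), a lower-bitrate rendition can reach the same predicted quality as a higher-bitrate one, which is the mechanism behind the bandwidth savings discussed above.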
Citations: 0