Scale-up Unlearnable Examples Learning with High-Performance Computing
Pub Date: 2025-01-01 | DOI: 10.2352/ei.2025.37.12.hpci-184
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662734/pdf/
Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo
Recent AI models, such as ChatGPT, are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In healthcare, particularly when radiologists use AI-driven diagnostic tools hosted on online platforms, there is a risk that medical imaging data may be repurposed for future AI training without explicit consent, spotlighting critical privacy and intellectual property concerns around healthcare data usage. To address these privacy challenges, an approach known as Unlearnable Examples (UEs) has been introduced, aiming to make data unlearnable to deep learning models. A prominent method in this area, Unlearnable Clustering (UC), has shown improved UE performance with larger batch sizes but was previously limited by computational resources (e.g., a single workstation). To push the boundaries of UE performance with effectively unlimited resources, we scaled up UC learning across various datasets using Distributed Data Parallel (DDP) training on the Summit supercomputer. Our goal was to examine UE efficacy at high-performance computing (HPC) scale to prevent unauthorized learning and enhance data security, particularly exploring the impact of batch size on UE unlearnability. Utilizing the robust computational capabilities of Summit, extensive experiments were conducted on diverse datasets such as Pets, MedMNIST, Flowers, and Flowers102. Our findings reveal that both overly large and overly small batch sizes can lead to performance instability and affect accuracy. However, the relationship between batch size and unlearnability varied across datasets, highlighting the need for tailored batch-size strategies to achieve optimal data protection. Summit's high-performance GPUs, together with the efficiency of the DDP framework, enabled rapid model parameter updates and consistent training across nodes. Our results underscore the importance of selecting batch sizes appropriate to the characteristics of each dataset to prevent learning and ensure data security in deep learning applications. The source code is publicly available at https://github.com/hrlblab/UE_HPC.
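As a rough illustration of the distributed setup described above, the following is a minimal PyTorch DistributedDataParallel sketch (not the authors' code; the model, dataset, and per-rank batch size are placeholder assumptions) showing how the effective batch size scales with the number of ranks:

```python
# Minimal PyTorch DDP sketch (illustrative only, not the paper's implementation).
# Placeholder model/dataset; launch with torchrun, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=6 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")          # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy dataset/model standing in for the UC perturbation training.
    data = TensorDataset(torch.randn(4096, 3 * 32 * 32), torch.randint(0, 10, (4096,)))
    model = DDP(torch.nn.Linear(3 * 32 * 32, 10).cuda(), device_ids=[local_rank])

    per_rank_batch = 128                              # effective batch = 128 * world_size
    sampler = DistributedSampler(data, shuffle=True)
    loader = DataLoader(data, batch_size=per_rank_batch, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                      # reshuffle consistently across ranks
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.cuda()), y.cuda())
            loss.backward()                           # gradients all-reduced across ranks
            opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

In this pattern, adding nodes multiplies the effective batch size without changing the per-GPU memory footprint, which is what makes the batch-size study feasible on Summit.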
{"title":"Scale-up Unlearnable Examples Learning with High-Performance Computing.","authors":"Yanfan Zhu, Issac Lyngaas, Murali Gopalakrishnan Meena, Mary Ellen I Koran, Bradley Malin, Daniel Moyer, Shunxing Bao, Anuj Kapadia, Xiao Wang, Bennett Landman, Yuankai Huo","doi":"10.2352/ei.2025.37.12.hpci-184","DOIUrl":"10.2352/ei.2025.37.12.hpci-184","url":null,"abstract":"<p><p>Recent advancements in AI models, like ChatGPT, are structured to retain user interactions, which could inadvertently include sensitive healthcare data. In the healthcare field, particularly when radiologists use AI-driven diagnostic tools hosted on online platforms, there is a risk that medical imaging data may be repurposed for future AI training without explicit consent, spotlighting critical privacy and intellectual property concerns around healthcare data usage. Addressing these privacy challenges, a novel approach known as Unlearnable Examples (UEs) has been introduced, aiming to make data unlearnable to deep learning models. A prominent method within this area, called Unlearnable Clustering (UC), has shown improved UE performance with larger batch sizes but was previously limited by computational resources (e.g., a single workstation). To push the boundaries of UE performance with theoretically unlimited resources, we scaled up UC learning across various datasets using Distributed Data Parallel (DDP) training on the Summit supercomputer. Our goal was to examine UE efficacy at high-performance computing (HPC) levels to prevent unauthorized learning and enhance data security, particularly exploring the impact of batch size on UE's unlearnability. Utilizing the robust computational capabilities of the Summit, extensive experiments were conducted on diverse datasets such as Pets, MedMNist, Flowers, and Flowers102. Our findings reveal that both overly large and overly small batch sizes can lead to performance instability and affect accuracy. However, the relationship between batch size and unlearnability varied across datasets, highlighting the necessity for tailored batch size strategies to achieve optimal data protection. The use of Summit's high-performance GPUs, along with the efficiency of the DDP framework, facilitated rapid updates of model parameters and consistent training across nodes. Our results underscore the critical role of selecting appropriate batch sizes based on the specific characteristics of each dataset to prevent learning and ensure data security in deep learning applications. The source code is publicly available at https://github.com/hrlblab/UE_HPC.</p>","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"37 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662734/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145650034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit
Pub Date: 2025-01-01 | DOI: 10.2352/EI.2025.37.12.HPCI-177
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662731/pdf/
Yuechen Yang, Yu Wang, Tianyuan Yao, Ruining Deng, Mengmeng Yin, Shilin Zhao, Haichun Yang, Yuankai Huo
Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples [1]. However, traditional feature extraction pipelines based on tools like CellProfiler [2] often involve lengthy workflows, requiring WSI segmentation into patches, feature extraction at the patch level, and subsequent mapping back to the original WSI [4]. To address these challenges, we present PySpatial, a high-speed pathomics toolkit specifically designed for WSI-level analysis. PySpatial streamlines the conventional pipeline by operating directly on computational regions of interest, reducing redundant processing steps. Using rtree-based spatial indexing and matrix-based computation, PySpatial efficiently maps and processes computational regions, significantly accelerating feature extraction while maintaining high accuracy. Our experiments on two datasets, the Perivascular Epithelioid Cell (PEC) dataset and data from the Kidney Precision Medicine Project (KPMP) [13], demonstrate substantial performance improvements. For the smaller and sparser objects in the PEC dataset, PySpatial achieves nearly a 10-fold speedup compared to standard CellProfiler pipelines. For larger objects, such as glomeruli and arteries in the KPMP dataset, PySpatial achieves a 2-fold speedup. These results highlight PySpatial's potential for large-scale WSI analysis with enhanced efficiency and accuracy, paving the way for broader applications in digital pathology.
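To make the spatial-indexing idea concrete, here is a minimal sketch using the Python rtree package (not PySpatial's actual API; the object bounding boxes are made up) showing how an R-tree lets a region-of-interest query return only the objects it overlaps:

```python
# Minimal sketch of R-tree-based region lookup (illustrative; not PySpatial's API).
# pip install rtree
from rtree import index

# Hypothetical object bounding boxes (minx, miny, maxx, maxy) in WSI coordinates.
objects = {
    0: (100, 100, 180, 160),    # e.g., a nucleus
    1: (2000, 450, 2600, 980),  # e.g., a glomerulus
    2: (150, 120, 210, 170),
}

idx = index.Index()
for obj_id, bbox in objects.items():
    idx.insert(obj_id, bbox)

# Find all objects intersecting a computational region of interest,
# so features are computed only where objects actually are.
region = (0, 0, 1024, 1024)
hits = sorted(idx.intersection(region))
print(hits)  # -> [0, 2]
```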
{"title":"PySpatial: A High-Speed Whole Slide Image Pathomics Toolkit.","authors":"Yuechen Yang, Yu Wang, Tianyuan Yao, Ruining Deng, Mengmeng Yin, Shilin Zhao, Haichun Yang, Yuankai Huo","doi":"10.2352/EI.2025.37.12.HPCI-177","DOIUrl":"10.2352/EI.2025.37.12.HPCI-177","url":null,"abstract":"<p><p>Whole Slide Image (WSI) analysis plays a crucial role in modern digital pathology, enabling large-scale feature extraction from tissue samples[1]. However, traditional feature extraction pipelines based on tools like CellProfiler[2] often involve lengthy workflows, requiring WSI segmentation into patches, feature extraction at the patch level, and subsequent mapping back to the original WSI[4]. To address these challenges, we present PySpatial, a high-speed pathomics toolkit specifically designed for WSI-level analysis. PySpatial streamlines the conventional pipeline by directly operating on computational regions of interest, reducing redundant processing steps. Utilizing rtree-based spatial indexing and matrix-based computation, PySpatial efficiently maps and processes computational regions, significantly accelerating feature extraction while maintaining high accuracy. Our experiments on two datasets-Perivascular Epithelioid Cell (PEC) and data from the Kidney Precision Medicine Project (KPMP) [13]-demonstrate substantial performance improvements. For smaller and sparse objects in PEC datasets, PySpatial achieves nearly a 10-fold speedup compared to standard CellProfiler pipelines. For larger objects, such as glomeruli and arteries in KPMP datasets, PySpatial achieves a 2-fold speedup. These results highlight PySpatial's potential to handle large-scale WSI analysis with enhanced efficiency and accuracy, paving the way for broader applications in digital pathology.</p>","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"37 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662731/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145650078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging
Pub Date: 2025-01-01 | Epub Date: 2025-02-01 | DOI: 10.2352/EI.2025.37.14.COIMG-132
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11971729/pdf/
Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W Remedios, Shunxing Bao, Bennett A Landman, Lee E Wheless, Lori A Coburn, Keith T Wilson, Yaohong Wang, Shilin Zhao, Agnes B Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo
The Segment Anything Model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained with over 1 billion masks on 11 million licensed, privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). This makes SAM attractive for medical image analysis, especially for digital pathology, where training data are scarce. In this study, we evaluate the zero-shot segmentation performance of SAM on representative segmentation tasks in whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, and (3) cell nuclei segmentation.
Core results: The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfactory performance for dense instance segmentation, even with 20 prompts (clicks/boxes) per image. We also summarize the identified limitations for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model achieve better performance in dense object segmentation.
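For readers unfamiliar with SAM's prompt interface, the sketch below shows the kind of point- and box-prompted zero-shot inference evaluated here, using the public segment-anything package; the checkpoint path, image file, and prompt coordinates are placeholders, not the paper's evaluation setup.

```python
# Zero-shot SAM inference with point/box prompts (illustrative sketch only).
# pip install segment-anything opencv-python
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")  # placeholder path
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("wsi_patch.png"), cv2.COLOR_BGR2RGB)  # placeholder patch
predictor.set_image(image)

# A single positive point prompt (label 1 = foreground) on a tissue region.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[256, 256]]),
    point_labels=np.array([1]),
    multimask_output=True,          # SAM returns multiple candidate masks
)
best_mask = masks[np.argmax(scores)]

# Alternatively, a box prompt (XYXY) around a large connected structure.
box_masks, _, _ = predictor.predict(box=np.array([100, 100, 400, 380]))
```

Dense nuclei segmentation would require one such prompt per instance, which is exactly where the zero-shot setting struggles in this study.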
{"title":"Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging.","authors":"Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W Remedios, Shunxing Bao, Bennett A Landman, Lee E Wheless, Lori A Coburn, Keith T Wilson, Yaohong Wang, Shilin Zhao, Agnes B Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo","doi":"10.2352/EI.2025.37.14.COIMG-132","DOIUrl":"10.2352/EI.2025.37.14.COIMG-132","url":null,"abstract":"<p><p>The segment anything model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained by over 1 billion masks on 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). It makes the SAM attractive for medical image analysis, especially for digital pathology where the training data are rare. In this study, we evaluate the zero-shot segmentation performance of SAM model on representative segmentation tasks on whole slide imaging (WSI), including (1) tumor segmentation, (2) non-tumor tissue segmentation, (3) cell nuclei segmentation.</p><p><strong>Core results: </strong>The results suggest that the zero-shot SAM model achieves remarkable segmentation performance for large connected objects. However, it does not consistently achieve satisfying performance for dense instance object segmentation, even with 20 prompts (clicks/boxes) on each image. We also summarized the identified limitations for digital pathology: (1) image resolution, (2) multiple scales, (3) prompt selection, and (4) model fine-tuning. In the future, the few-shot fine-tuning with images from downstream pathological segmentation tasks might help the model to achieve better performance in dense object segmentation.</p>","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"37 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11971729/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143796640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Write Sentence with Images: Revisit the Large Vision Model with Visual Sentence
Pub Date: 2025-01-01 | DOI: 10.2352/ei.2025.37.12.hpci-172
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662733/pdf/
Quan Liu, Can Cui, Ruining Deng, Tianyuan Yao, Yuechen Yang, Yucheng Tang, Yuankai Huo
This paper introduces a novel framework for generating high-quality images from "visual sentences" extracted from video sequences. By combining a lightweight autoregressive model with a Vector Quantized Generative Adversarial Network (VQGAN), our approach achieves a favorable trade-off between computational efficiency and image fidelity. Unlike conventional methods that require substantial resources, the proposed framework efficiently captures sequential patterns in partially annotated frames and synthesizes coherent, contextually accurate images. Empirical results demonstrate that our method not only attains state-of-the-art performance on various benchmarks but also reduces inference overhead, making it well-suited for real-time and resource-constrained environments. Furthermore, we explore its applicability to medical image analysis, showcasing robust denoising, brightness adjustment, and segmentation capabilities. Overall, our contributions highlight an effective balance between performance and efficiency, paving the way for scalable and adaptive image generation across diverse multimedia domains.
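As a conceptual sketch of the pipeline described above (not the authors' architecture; the vocabulary size, dimensions, and the VQGAN tokenizer interface are assumptions), frames are first quantized into discrete token grids, and a lightweight autoregressive model then learns to continue the resulting "visual sentence":

```python
# Conceptual sketch: lightweight autoregressive model over VQGAN tokens
# (illustrative only; vocab size, dims, and the tokenizer are placeholder assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenAR(nn.Module):
    """Causal transformer that predicts the next visual token in a visual sentence."""
    def __init__(self, vocab=1024, dim=256, heads=4, layers=4, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab, dim)
        self.pos = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, idx):                                   # idx: (B, T) token indices
        T = idx.shape[1]
        x = self.tok(idx) + self.pos(torch.arange(T, device=idx.device))
        causal = torch.triu(torch.full((T, T), float("-inf"), device=idx.device), diagonal=1)
        return self.head(self.encoder(x, mask=causal))        # (B, T, vocab) logits

# Training step: tokens from a (hypothetical) VQGAN encoder for consecutive frames,
# flattened into one sequence; the model learns to continue the visual sentence.
tokens = torch.randint(0, 1024, (2, 2 * 16 * 16))             # 2 frames of 16x16 tokens each
model = TokenAR()
logits = model(tokens[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 1024), tokens[:, 1:].reshape(-1))
# Generated token grids would then be decoded back to pixels with the VQGAN decoder.
```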
{"title":"Write Sentence with Images: Revisit the Large Vision Model with Visual Sentence.","authors":"Quan Liu, Can Cui, Ruining Deng, Tianyuan Yao, Yuechen Yang, Yucheng Tang, Yuankai Huo","doi":"10.2352/ei.2025.37.12.hpci-172","DOIUrl":"10.2352/ei.2025.37.12.hpci-172","url":null,"abstract":"<p><p>This paper introduces a novel framework for generating high-quality images from \"visual sentences\" extracted from video sequences. By combining a lightweight autoregressive model with a Vector Quantized Generative Adversarial Network (VQGAN), our approach achieves a favorable trade-off between computational efficiency and image fidelity. Unlike conventional methods that require substantial resources, the proposed framework efficiently captures sequential patterns in partially annotated frames and synthesizes coherent, contextually accurate images. Empirical results demonstrate that our method not only attains state-of-the-art performance on various benchmarks but also reduces inference overhead, making it well-suited for real-time and resource-constrained environments. Furthermore, we explore its applicability to medical image analysis, showcasing robust denoising, brightness adjustment, and segmentation capabilities. Overall, our contributions highlight an effective balance between performance and efficiency, paving the way for scalable and adaptive image generation across diverse multimedia domains.</p>","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"37 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662733/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145650109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis
Pub Date: 2025-01-01 | DOI: 10.2352/ei.2025.37.12.hpci-183
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662735/pdf/
Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Yuechen Yang, Vishwesh Nath, Bingshan Li, You Chen, Yucheng Tang, Yuankai Huo
Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images such as gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This text-guided approach effectively captures multi-scale WSI representations by utilizing accompanying textual pathology information. mTREE combines the localization of key areas ("global-to-local") and the development of a WSI-level image-text representation ("local-to-global") into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: first, it functions as an attention map to accurately identify key areas; second, it acts as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses of two image-related tasks, classification and survival prediction, showcasing its superiority over baselines. Code and trained models are available at https://github.com/hrlblab/mTREE.
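A minimal sketch of the text-as-attention idea (not the mTREE implementation; the encoders and feature dimensions are placeholders): a text embedding scores the patch embeddings, and the resulting attention weights both localize key regions and pool them into a slide-level representation.

```python
# Minimal sketch of text-guided attention pooling over WSI patch features
# (illustrative only; not the mTREE implementation -- encoders and dims are placeholders).
import torch
import torch.nn.functional as F

def text_guided_pool(patch_feats, text_feat):
    """patch_feats: (N, D) embeddings of N patches; text_feat: (D,) report embedding."""
    scores = patch_feats @ text_feat                    # (N,) text-to-patch relevance
    attn = F.softmax(scores / patch_feats.shape[1] ** 0.5, dim=0)
    slide_feat = attn @ patch_feats                     # (D,) attention-weighted slide feature
    return slide_feat, attn                             # attn doubles as a localization map

patch_feats = torch.randn(500, 512)                     # e.g., 500 patches from one WSI
text_feat = torch.randn(512)                            # embedding of the pathology text
slide_feat, attn = text_guided_pool(patch_feats, text_feat)
fused = torch.cat([slide_feat, text_feat])              # joint image-text representation
```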
{"title":"mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis.","authors":"Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Yuechen Yang, Vishwesh Nath, Bingshan Li, You Chen, Yucheng Tang, Yuankai Huo","doi":"10.2352/ei.2025.37.12.hpci-183","DOIUrl":"10.2352/ei.2025.37.12.hpci-183","url":null,"abstract":"<p><p>Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing information from accompanying textual pathology information. mTREE innovatively combines - the localization of key areas (<b>\"global-to-local\"</b>) and the development of a WSI-level image-text representation (<b>\"local-to-global\"</b>) - into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: firstly, functioning as an attention map to accurately identify key areas, and secondly, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks: classification and survival prediction, showcasing its remarkable superiority over baselines. Code and trained models are made available at https://github.com/hrlblab/mTREE.</p>","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"37 ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12662735/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145650103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Egocentric Boundaries on Distinguishing Colliding and Non-Colliding Pedestrians while Walking in a Virtual Environment
Pub Date: 2024-01-01 | DOI: 10.2352/EI.2024.36.11.HVEI-214 | Pages: 2141-2148
Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883473/pdf/
Alex D Hwang, Jaehyun Jung, Alex Bowers, Eli Peli
Avoiding person-to-person collisions is critical for patients with visual field loss, and any intervention claiming to improve the safety of such patients should empirically demonstrate its efficacy. To design a VR mobility testing platform that presents multiple pedestrians, the distinction between colliding and non-colliding pedestrians must be clearly defined. We measured nine normally sighted subjects' collision envelopes (CE; an egocentric boundary distinguishing collision from non-collision) and found that the CE changes with the approaching pedestrian's bearing angle and speed. When scripting person-to-person collision events for the VR mobility testing platform, non-colliding pedestrians should therefore not invade the CE.
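As a rough illustration of how such an egocentric boundary could be applied when scripting pedestrians (a sketch under assumed geometry; the envelope function and trajectories below are hypothetical, not the measured envelopes from this study), one can predict the approaching pedestrian's closest passing distance and compare it with a bearing- and speed-dependent CE radius:

```python
# Sketch: classify a scripted pedestrian as colliding vs. non-colliding using a
# collision envelope (CE). The CE lookup and the trajectory are hypothetical.
import numpy as np

def ce_radius(bearing_deg, speed_mps):
    # Placeholder envelope: wider for head-on approaches, grows with approach speed.
    return 0.6 + 0.3 * np.cos(np.radians(bearing_deg)) + 0.1 * speed_mps

def min_passing_distance(rel_pos, rel_vel):
    """Closest approach of a constant-velocity pedestrian relative to the walker."""
    t_star = max(0.0, -np.dot(rel_pos, rel_vel) / np.dot(rel_vel, rel_vel))
    return float(np.linalg.norm(rel_pos + t_star * rel_vel))

rel_pos = np.array([4.0, 1.2])        # meters, pedestrian position relative to walker
rel_vel = np.array([-1.3, 0.0])       # m/s, relative velocity
bearing = np.degrees(np.arctan2(rel_pos[1], rel_pos[0]))
colliding = min_passing_distance(rel_pos, rel_vel) < ce_radius(bearing, np.linalg.norm(rel_vel))
print("colliding" if colliding else "non-colliding")
```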
{"title":"Egocentric Boundaries on Distinguishing Colliding and Non-Colliding Pedestrians while Walking in a Virtual Environment.","authors":"Alex D Hwang, Jaehyun Jung, Alex Bowers, Eli Peli","doi":"10.2352/EI.2024.36.11.HVEI-214","DOIUrl":"10.2352/EI.2024.36.11.HVEI-214","url":null,"abstract":"<p><p>Avoiding person-to-person collisions is critical for visual field loss patients. Any intervention claiming to improve the safety of such patients should empirically demonstrate its efficacy. To design a VR mobility testing platform presenting multiple pedestrians, a distinction between colliding and non-colliding pedestrians must be clearly defined. We measured nine normally sighted subjects' collision envelopes (CE; an egocentric boundary distinguishing collision and non-collision) and found it changes based on the approaching pedestrian's bearing angle and speed. For person-to-person collision events for the VR mobility testing platform, non-colliding pedestrians should not evade the CE.</p>","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"36 ","pages":"2141-2148"},"PeriodicalIF":0.0,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10883473/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139934514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
34th Annual Stereoscopic Displays and Applications Conference - Introduction
Pub Date: 2023-01-16 | DOI: 10.2352/ei.2023.35.2.sda-b02
Andrew J. Woods, Nicolas S. Holliman, Takashi Kawai, Bjorn Sommer
This manuscript serves as an introduction to the conference proceedings for the 34th annual Stereoscopic Displays and Applications conference and also provides an overview of the conference.
{"title":"34th Annual Stereoscopic Displays and Applications Conference - Introduction","authors":"Andrew J. Woods, Nicolas S. Holliman, Takashi Kawai, Bjorn Sommer","doi":"10.2352/ei.2023.35.2.sda-b02","DOIUrl":"https://doi.org/10.2352/ei.2023.35.2.sda-b02","url":null,"abstract":"This manuscript serves as an introduction to the conference proceedings for the 34th annual Stereoscopic Displays and Applications conference and also provides an overview of the conference.","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135693993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wearable multispectral imaging and telemetry at edge
Pub Date: 2023-01-16 | DOI: 10.2352/ei.2023.35.7.image-278
Yang Cai, Mel Siegel
We present a head-mounted holographic display system for thermographic image overlay, biometric sensing, and wireless telemetry. The system is lightweight and reconfigurable for multiple field applications, including object contour detection and enhancement, breathing rate detection, and telemetry over a mobile phone for peer-to-peer communication and an incident command dashboard. Given the limited computing power of the embedded system, we developed a lightweight image processing algorithm for edge detection and breathing rate detection, as well as an image compression codec. The system can be integrated into a helmet or personal protective equipment such as a face shield or goggles. It can be applied to firefighting, medical emergency response, and other first-response operations. Finally, we present a case study of "Cold Trailing" for forest fire containment.
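To illustrate the kind of lightweight on-device processing described (a sketch with a synthetic signal and an assumed frame rate; the device's actual algorithm is not reproduced here), breathing rate can be estimated from the dominant low-frequency component of a thermal region-of-interest intensity trace:

```python
# Sketch: breathing-rate estimate from a thermal ROI intensity trace via FFT
# (synthetic signal; illustrative only, not the device's actual algorithm).
import numpy as np

fs = 10.0                                   # thermal camera frame rate, Hz (assumed)
t = np.arange(0, 60, 1 / fs)                # 60 s of mean ROI intensity near nose/mouth
signal = 0.5 * np.sin(2 * np.pi * 0.25 * t) + 0.05 * np.random.randn(t.size)  # ~15 breaths/min

signal = signal - signal.mean()             # remove DC component before the FFT
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

# Restrict to a plausible breathing band (0.1-0.7 Hz, i.e., 6-42 breaths/min).
band = (freqs >= 0.1) & (freqs <= 0.7)
breath_hz = freqs[band][np.argmax(spectrum[band])]
print(f"Estimated breathing rate: {breath_hz * 60:.1f} breaths/min")
```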
{"title":"Wearable multispectral imaging and telemetry at edge","authors":"Yang Cai, Mel Siegel","doi":"10.2352/ei.2023.35.7.image-278","DOIUrl":"https://doi.org/10.2352/ei.2023.35.7.image-278","url":null,"abstract":"We present a head-mounted holographic display system for thermographic image overlay, biometric sensing, and wireless telemetry. The system is lightweight and reconfigurable for multiple field applications, including object contour detection and enhancement, breathing rate detection, and telemetry over a mobile phone for peer-to-peer communication and incident command dashboard. Due to the constraints of the limited computing power of an embedded system, we developed a lightweight image processing algorithm for edge detection and breath rate detection, as well as an image compression codec. The system can be integrated into a helmet or personal protection equipment such as a face shield or goggles. It can be applied to firefighting, medical emergency response, and other first-response operations. Finally, we present a case study of \"Cold Trailing\" for forest fire containment.","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135694716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stereoscopic Displays and Applications XXXIV Conference Overview and Papers Program
Pub Date: 2023-01-16 | DOI: 10.2352/ei.2023.35.2.sda-a02
The Stereoscopic Displays and Applications Conference (SD&A) focuses on developments covering the entire stereoscopic 3D imaging pipeline from capture, processing, and display to perception. The conference brings together practitioners and researchers from industry and academia to facilitate an exchange of current information on stereoscopic imaging topics. The highly popular conference demonstration session provides authors with a perfect additional opportunity to showcase their work. The long-running SD&A 3D Theater Session provides conference attendees with a wonderful opportunity to see how 3D content is being created and exhibited around the world. Publishing your work at SD&A offers excellent exposure: across all publication outlets, SD&A has the highest proportion of papers in the top 100 cited papers in the stereoscopic imaging field (Google Scholar, May 2013).
{"title":"Stereoscopic Displays and Applications XXXIV Conference Overview and Papers Program","authors":"","doi":"10.2352/ei.2023.35.2.sda-a02","DOIUrl":"https://doi.org/10.2352/ei.2023.35.2.sda-a02","url":null,"abstract":"Abstract The Stereoscopic Displays and Applications Conference (SD&A) focuses on developments covering the entire stereoscopic 3D imaging pipeline from capture, processing, and display to perception. The conference brings together practitioners and researchers from industry and academia to facilitate an exchange of current information on stereoscopic imaging topics. The highly popular conference demonstration session provides authors with a perfect additional opportunity to showcase their work. The long-running SD&A 3D Theater Session provides conference attendees with a wonderful opportunity to see how 3D content is being created and exhibited around the world. Publishing your work at SD&A offers excellent exposure—across all publication outlets, SD&A has the highest proportion of papers in the top 100 cited papers in the stereoscopic imaging field (Google Scholar, May 2013).","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135695210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving the performance of web-streaming by super-resolution upscaling techniques
Pub Date: 2023-01-16 | DOI: 10.2352/ei.2023.35.3.mobmu-351
Yuriy Reznik, Nabajeet Barman
In recent years, we have seen significant progress in advanced image upscaling techniques, sometimes called super-resolution, ML-based, or AI-based upscaling. Such algorithms are now available not only as specialized software but also in drivers and SDKs supplied with modern graphics cards; the upscaling functions in the NVIDIA Maxine SDK are one recent example. However, to take advantage of this functionality in video streaming applications, one needs to (a) quantify the impact of super-resolution techniques on perceived visual quality, (b) implement video rendering that incorporates super-resolution upscaling, and (c) implement new bitrate+resolution adaptation algorithms in streaming players, enabling such players to deliver better quality of experience, better efficiency (e.g., reduced bandwidth usage), or both. Towards this end, in this paper we propose several techniques that may be helpful to the implementation community. First, we offer a model quantifying the impact of super-resolution upscaling on perceived quality. Our model is based on the Westerink-Roufs model connecting the true resolution of images/videos to perceived quality, with several additional parameters that allow tuning to specific implementations of super-resolution techniques. We verify this model using several recent datasets, including MOS scores measured for several conventional upscaling and super-resolution algorithms. Then, we propose an improved adaptation logic for video streaming players that considers video bitrates, encoded video resolutions, player size, and the upscaling method. This improved logic relies on our modified Westerink-Roufs model to predict perceived quality and suggests renditions that deliver the best quality for the given display and upscaling-method characteristics. Finally, we study the impact of the proposed techniques and show that they can deliver practically appreciable results in terms of expected QoE improvements and bandwidth savings.
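The following is a minimal sketch of the upscaling-aware rendition-selection logic described above; the quality function is a generic saturating placeholder standing in for the modified Westerink-Roufs model (not the authors' formula), and the rendition ladder, bandwidth, and upscaling gain are made-up numbers.

```python
# Sketch: bitrate+resolution adaptation that accounts for upscaling quality.
# The quality model is a placeholder, NOT the paper's modified Westerink-Roufs model.
import math

def predicted_quality(encoded_height, display_height, upscaling_gain):
    # Effective resolution: encoded height boosted by the upscaler, capped by the display.
    effective = min(encoded_height * upscaling_gain, display_height)
    # Placeholder saturating quality curve in effective resolution.
    return 1.0 - math.exp(-3.0 * effective / display_height)

renditions = [  # (bitrate_kbps, encoded_height) -- hypothetical ladder
    (1200, 432), (2400, 720), (4500, 1080),
]
bandwidth_kbps = 3000
display_height = 1080
upscaling_gain = 1.5            # assumed benefit of super-resolution vs. bicubic (1.0)

feasible = [r for r in renditions if r[0] <= bandwidth_kbps]
best = max(feasible, key=lambda r: predicted_quality(r[1], display_height, upscaling_gain))
print(best)   # with SR upscaling, a 720p rendition may already saturate perceived quality
```

The point of such logic is that, once the player knows the client can super-resolve, a lower-bitrate rendition can be selected without a predicted loss in perceived quality, saving bandwidth.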
{"title":"Improving the performance of web-streaming by super-resolution upscaling techniques","authors":"Yuriy Reznik, Nabajeet Barman","doi":"10.2352/ei.2023.35.3.mobmu-351","DOIUrl":"https://doi.org/10.2352/ei.2023.35.3.mobmu-351","url":null,"abstract":"In recent years, we have seen significant progress in advanced image upscaling techniques, sometimes called super-resolution, ML-based, or AI-based upscaling. Such algorithms are now available not only in form of specialized software but also in drivers and SDKs supplied with modern graphics cards. Upscaling functions in NVIDIA Maxine SDK is one of the recent examples. However, to take advantage of this functionality in video streaming applications, one needs to (a) quantify the impacts of super-resolution techniques on the perceived visual quality, (b) implement video rendering incorporating super-resolution upscaling techniques, and (c) implement new bitrate+resolution adaptation algorithms in streaming players, enabling such players to deliver better quality of experience or better efficiency (e.g. reduce bandwidth usage) or both. Towards this end, in this paper, we propose several techniques that may be helpful to the implementation community. First, we offer a model quantifying the impacts of super resolution upscaling on the perceived quality. Our model is based on the Westerink-Roufs model connecting the true resolution of images/videos to perceived quality, with several additional parameters added, allowing its tuning to specific implementations of super-resolution techniques. We verify this model by using several recent datasets including MOS scores measured for several conventional up-scaling and super-resolution algorithms. Then, we propose an improved adaptation logic for video streaming players, considering video bitrates, encoded video resolutions, player size, and the upscaling method. This improved logic relies on our modified Westerink-Roufs model to predict perceived quality and suggests choices of renditions that would deliver the best quality for given display and upscaling method characteristics. Finally, we study the impacts of the proposed techniques and show that they can deliver practically appreciable results in terms of the expected QoE improvements and bandwidth savings.","PeriodicalId":73514,"journal":{"name":"IS&T International Symposium on Electronic Imaging","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135693966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}