Fourier feature network for 3D vessel reconstruction from biplane angiograms
Pub Date: 2024-08-01 | DOI: 10.1007/s00138-024-01585-5
Sean Wu, Naoki Kaneko, David S. Liebeskind, Fabien Scalzo
3D reconstruction of biplane cerebral angiograms remains a challenging, unsolved research problem due to the loss of depth information and the unknown pixelwise correlation between input images. The occlusions that arise from having only two views make it difficult to reconstruct fine vessel details while simultaneously addressing the inherently missing information. In this paper, we take an incremental step toward solving this problem by reconstructing the corresponding 2D slice of the cerebral angiogram from biplane 1D image data. We developed a coordinate-based neural network that encodes the 1D image data along with a deterministic Fourier feature mapping of a given input point, resulting in a more spatially accurate slice reconstruction. Using only one 1D row of biplane image data, our Fourier feature network reconstructed the corresponding volume slices with a peak signal-to-noise ratio (PSNR) of 26.32 ± 0.36, a structural similarity index measure (SSIM) of 61.38 ± 1.79, a mean squared error (MSE) of 0.0023 ± 0.0002, and a mean absolute error (MAE) of 0.0364 ± 0.0029. Our research has implications for future work aimed at improving backprojection-based reconstruction by first examining individual slices reconstructed from 1D information as a prerequisite.
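For intuition, here is a minimal sketch of a coordinate-based network with a deterministic Fourier feature mapping in the spirit of this abstract; the fixed-seed frequency matrix, layer sizes, row length, and the way the two 1D biplane rows are concatenated with the mapped coordinate are illustrative assumptions, not the authors' exact design.

```python
import math
import torch
import torch.nn as nn

class FourierFeatureSliceNet(nn.Module):
    """Predicts the intensity at a 2D slice coordinate from two 1D biplane rows."""
    def __init__(self, coord_dim=2, row_len=512, num_freqs=64, hidden=256):
        super().__init__()
        # Deterministic Fourier feature frequencies: fixed seed, never trained.
        g = torch.Generator().manual_seed(0)
        self.register_buffer("B", torch.randn(num_freqs, coord_dim, generator=g) * 10.0)
        in_dim = 2 * num_freqs + 2 * row_len  # sin/cos features + both biplane rows
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # reconstructed intensity at the queried point
        )

    def forward(self, coords, rows):
        # coords: (N, 2) slice coordinates; rows: (N, 2 * row_len) stacked 1D rows.
        proj = 2.0 * math.pi * coords @ self.B.t()
        feats = torch.cat([torch.sin(proj), torch.cos(proj), rows], dim=-1)
        return self.mlp(feats)
```

Querying such a network at every coordinate of a slice then yields the reconstructed 2D slice.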
{"title":"Fourier feature network for 3D vessel reconstruction from biplane angiograms","authors":"Sean Wu, Naoki Kaneko, David S. Liebeskind, Fabien Scalzo","doi":"10.1007/s00138-024-01585-5","DOIUrl":"https://doi.org/10.1007/s00138-024-01585-5","url":null,"abstract":"<p>3D reconstruction of biplane cerebral angiograms remains a challenging, unsolved research problem due to the loss of depth information and the unknown pixelwise correlation between input images. The occlusions arising from only two views complicate the reconstruction of fine vessel details and the simultaneous addressing of inherent missing information. In this paper, we take an incremental step toward solving this problem by reconstructing the corresponding 2D slice of the cerebral angiogram using biplane 1D image data. We developed a coordinate-based neural network that encodes the 1D image data along with a deterministic Fourier feature mapping from a given input point, resulting in a slice reconstruction that is more spatially accurate. Using only one 1D row of biplane image data, our Fourier feature network reconstructed the corresponding volume slices with a peak signal-to-noise ratio (PSNR) of 26.32 ± 0.36, a structural similarity index measure (SSIM) of 61.38 ± 1.79, a mean squared error (MSE) of 0.0023 ± 0.0002, and a mean absolute error (MAE) of 0.0364 ± 0.0029. Our research has implications for future work aimed at improving backprojection-based reconstruction by first examining individual slices from 1D information as a prerequisite.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"46 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141867620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Semi-supervised metric learning incorporating weighted triplet constraint and Riemannian manifold optimization for classification
Pub Date: 2024-07-26 | DOI: 10.1007/s00138-024-01581-9
Yizhe Xia, Hongjuan Zhang
Metric learning focuses on finding similarities between data and aims to enlarge the distance between samples with different labels. This work proposes a semi-supervised metric learning method based on the point-to-class structure of the labeled data, which is computationally less expensive than using the point-to-point structure. Specifically, the point-to-class structure is formulated as a new triplet constraint that narrows the distance between intra-class data and enlarges the distance between inter-class data simultaneously. Moreover, to measure the dissimilarity between different classes, weights are introduced into the triplet constraint, forming the weighted triplet constraint. Two kinds of regularizers, such as a spatial regularizer, are then incorporated into the model to mitigate overfitting and preserve the topological structure of the data. Furthermore, a Riemannian gradient descent algorithm is adopted to solve the proposed model, since it fully exploits the geometric structure of Riemannian manifolds, and the proposed model can be regarded as a generalization of an unconstrained optimization problem in Euclidean space to a Riemannian manifold. With this solution strategy, the variables are constrained to a specific Riemannian manifold at each step of the iterative solution process, enabling efficient and accurate model resolution. Finally, we conduct classification experiments on various data sets and compare the classification performance to state-of-the-art methods. The experimental results demonstrate that our proposed method performs better in classification, especially for hyperspectral image data.
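As an illustration of the point-to-class idea, the sketch below measures distances between each sample and class centroids under a learned linear map and weights the resulting triplet terms per class pair; the Mahalanobis-style parameterisation, margin, and weighting scheme are assumptions for illustration rather than the paper's exact formulation.

```python
import torch

def weighted_point_to_class_triplet(X, y, L, class_weights, margin=1.0):
    """Weighted point-to-class triplet term under a learned linear map L."""
    Z = X @ L.t()                                   # transformed samples
    classes = y.unique()
    # Class centroids in the transformed space define the point-to-class structure.
    centroids = torch.stack([Z[y == c].mean(dim=0) for c in classes])
    loss = Z.new_zeros(())
    for i, c_pos in enumerate(classes):
        pos = ((Z[y == c_pos] - centroids[i]) ** 2).sum(dim=1)      # to own class
        for j, c_neg in enumerate(classes):
            if i == j:
                continue
            neg = ((Z[y == c_pos] - centroids[j]) ** 2).sum(dim=1)  # to other class
            w = class_weights.get((int(c_pos), int(c_neg)), 1.0)    # inter-class weight
            loss = loss + (w * torch.relu(pos - neg + margin)).mean()
    return loss
```

In the full model such a term would be minimized jointly with the regularizers, with the transform constrained to the appropriate Riemannian manifold.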
{"title":"Semi-supervised metric learning incorporating weighted triplet constraint and Riemannian manifold optimization for classification","authors":"Yizhe Xia, Hongjuan Zhang","doi":"10.1007/s00138-024-01581-9","DOIUrl":"https://doi.org/10.1007/s00138-024-01581-9","url":null,"abstract":"<p>Metric learning focuses on finding similarities between data and aims to enlarge the distance between the samples with different labels. This work proposes a semi-supervised metric learning method based on the point-to-class structure of the labeled data, which is computationally less expensive, especially than using point-to-point structure. Specifically, the point-to-class structure is formulated into a new triplet constraint, which could narrow the distance of inner-class data and enlarge the distance of inter-class data simultaneously. Moreover, for measuring dissimilarity between different classes, weights are introduced into the triplet constraint and forms the weighted triplet constraint. Then, two kinds of regularizers such as spatial regularizer are rationally incorporated respectively in this model to mitigate the overfitting phenomenon and preserve the topological structure of the data. Furthermore, Riemannian gradient descent algorithm is adopted to solve the proposed model, since it can fully exploit the geometric structure of Riemannian manifolds and the proposed model can be regarded as a generalization of the unconstrained optimization problem in Euclidean space on Riemannian manifold. By introducing such solution strategy, the variables are constrained to a specific Riemannian manifold in each step of the iterative solution process, thereby enabling efficient and accurate model resolution. Finally, we conduct classification experiments on various data sets and compare the classification performance to state-of-the-art methods. The experimental results demonstrate that our proposed method has better performance in classification, especially for hyperspectral image data.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"60 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141777912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weakly supervised collaborative localization learning method for sewer pipe defect detection
Pub Date: 2024-07-25 | DOI: 10.1007/s00138-024-01587-3
Yang Yang, Shangqin Yang, Qi Zhao, Honghui Cao, Xinjie Peng
Long-term corrosion and external disturbances can lead to defects in sewer pipes, which threaten important parts of urban infrastructure. Automatic defect detection based on closed-circuit television (CCTV) footage has gradually matured with supervised deep learning. However, sewer pipe defects vary in type and size, and relying on human inspection to detect them is time-consuming and subjective. Therefore, a few-shot, accurate, and automatic method for sewer pipe defect localization and fine-grained classification is needed. Thus, this study constructs a few-shot image-level dataset of 15 categories using the sewer dataset ML-Sewer and then presents a collaborative localization network based on weakly supervised learning to automatically classify and detect defects. Specifically, an attention refinement module (ARM) is designed to obtain classification results and high-level semantic features. Furthermore, considering the correlation between target regions and the extraction of target edge information, we designed a collaborative localization module (CLM) consisting of two branches. Then, to ensure that the network focuses on the complete target area, this study applies an image iteration module (IIM). Finally, the results of the two branches in the CLM are fused to acquire the target localization. The experimental results show that the proposed model performs favorably in detecting sewer pipe defects, reaching a classification accuracy of 69.76% and a localization accuracy of 65.32%, which is higher than the performance of other weakly supervised detection models in sewer pipe defect detection.
{"title":"Weakly supervised collaborative localization learning method for sewer pipe defect detection","authors":"Yang Yang, Shangqin Yang, Qi Zhao, Honghui Cao, Xinjie Peng","doi":"10.1007/s00138-024-01587-3","DOIUrl":"https://doi.org/10.1007/s00138-024-01587-3","url":null,"abstract":"<p>Long-term corrosion and external disturbances can lead to defects in sewer pipes, which threaten important parts of urban infrastructure. The automatic defect detection algorithm based on closed-circuit televisions (CCTV) has gradually matured using supervised deep learning. However, there are different types and sizes of sewer pipe defects, and relying on human inspection to detect defects is time-consuming and subjective. Therefore, a few-shot, accurate and automatic method for sewer pipe defect with localization and fine-grained classification is needed. Thus, this study constructs a few-shot image-level dataset of 15 categories using the sewer dataset ML-Sewer and then presents a collaborative localization network based on weakly supervised learning to automatically classify and detect defects. Specifically, an attention refinement module (ARM) is designed to obtain classification results and high-level semantic features. Furthermore, considering the correlation between target regions and the extraction of target edge information, we designed a collaborative localization module (CLM) consisting of two branches. Then, to ensure that the network focuses on the complete target area, this study applies an image iteration module (IIM). Finally, the results of the two branches in the CLM are fused to acquire target localization. The experimental results show that the proposed model exhibits favorable performance in detecting sewer pipe defects. The proposed method exhibits prediction classification accuracy that reaches 69.76<span>(%)</span> and a positioning accuracy rate that reaches 65.32<span>(%)</span>, which is higher than the performances of other weakly supervised detection models in sewer pipe defect detection.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"10 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141777914","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An insect vision-inspired neuromorphic vision systems in low-light obstacle avoidance for intelligent vehicles
Pub Date: 2024-07-25 | DOI: 10.1007/s00138-024-01582-8
Haiyang Wang, Songwei Wang, Longlong Qian
The Lobular Giant Motion Detector (LGMD) is a neuron in the insect visual system that has been extensively studied, especially in locusts. This neuron is highly sensitive to rapidly approaching objects, allowing insects to react quickly to avoid potential threats such as approaching predators or obstacles. In the realm of intelligent vehicles, conventional RGB cameras perform poorly under extreme lighting conditions or during high-speed movement. Inspired by these biological mechanisms, we have developed a novel neuromorphic dynamic vision sensor (DVS)-driven LGMD spiking neural network (SNN) model. SNNs, distinguished by their bio-inspired spiking dynamics, offer a unique advantage in processing time-varying visual data, particularly in scenarios where rapid response and energy efficiency are paramount. Our model incorporates two distinct types of Leaky Integrate-and-Fire (LIF) neuron models and synapse models, which have been instrumental in reducing network latency and enhancing the system’s reaction speed. To address the challenge of noise in event streams, we implemented denoising techniques to ensure the integrity of the input data. Ultimately, the model was integrated into an intelligent vehicle to conduct real-time obstacle-avoidance tests in response to looming objects in simulated real scenarios. The experimental results show that the model compensates for the limitations of traditional RGB cameras in detecting looming targets in the dark, and that it can detect looming targets and achieve effective obstacle avoidance in complex and diverse dark environments.
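A minimal discrete-time Leaky Integrate-and-Fire update of the kind mentioned above is sketched below; the time constant, threshold, and reset rule are generic textbook choices rather than the two specific LIF variants used in the paper, and the Poisson drive merely stands in for binned DVS events.

```python
import numpy as np

def lif_step(v, input_current, tau=10.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron population.

    v: membrane potentials, input_current: synaptic drive for this step.
    Returns (new_v, spikes) where spikes is a boolean array.
    """
    dv = (-(v - v_rest) + input_current) * (dt / tau)   # leak toward rest + drive
    v = v + dv
    spikes = v >= v_thresh                              # fire when threshold is crossed
    v = np.where(spikes, v_reset, v)                    # hard reset after a spike
    return v, spikes

# Example: a small population driven by DVS-like event counts per time bin.
v = np.zeros(8)
for t in range(100):
    events = np.random.poisson(0.3, size=8)             # stand-in for event-stream input
    v, spikes = lif_step(v, events)
```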
{"title":"An insect vision-inspired neuromorphic vision systems in low-light obstacle avoidance for intelligent vehicles","authors":"Haiyang Wang, Songwei Wang, Longlong Qian","doi":"10.1007/s00138-024-01582-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01582-8","url":null,"abstract":"<p>The Lobular Giant Motion Detector (LGMD) is a neuron in the insect visual system that has been extensively studied, especially in locusts. This neuron is highly sensitive to rapidly approaching objects, allowing insects to react quickly to avoid potential threats such as approaching predators or obstacles. In the realm of intelligent vehicles, due to the lack of performance of conventional RGB cameras in extreme light conditions or at high-speed movements. Inspired by biological mechanisms, we have developed a novel neuromorphic dynamic vision sensor (DVS) driven LGMD spiking neural network (SNN) model. SNNs, distinguished by their bio-inspired spiking dynamics, offer a unique advantage in processing time-varying visual data, particularly in scenarios where rapid response and energy efficiency are paramount. Our model incorporates two distinct types of Leaky Integrate-and-Fire (LIF) neuron models and synapse models, which have been instrumental in reducing network latency and enhancing the system’s reaction speed. And addressing the challenge of noise in event streams, we have implemented denoising techniques to ensure the integrity of the input data. Integrating the proposed methods, ultimately, the model was integrated into an intelligent vehicle to conduct real-time obstacle avoidance testing in response to looming objects in simulated real scenarios. The experimental results show that the model’s ability to compensate for the limitations of traditional RGB cameras in detecting looming targets in the dark, and can detect looming targets and implement effective obstacle avoidance in complex and diverse dark environments.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"40 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141777913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
React: recognize every action everywhere all at once
Pub Date: 2024-07-20 | DOI: 10.1007/s00138-024-01561-z
Naga V. S. Raviteja Chappa, Pha Nguyen, Page Daniel Dobbs, Khoa Luu
In the realm of computer vision, Group Activity Recognition (GAR) plays a vital role, finding applications in sports video analysis, surveillance, and social scene understanding. This paper introduces Recognize Every Action Everywhere All At Once (REACT), a novel architecture designed to model complex contextual relationships within videos. REACT leverages advanced transformer-based models for encoding intricate contextual relationships, enhancing understanding of group dynamics. Integrated Vision-Language Encoding facilitates efficient capture of spatiotemporal interactions and multi-modal information, enabling comprehensive scene understanding. The model’s precise action localization refines joint understanding of text and video data, enabling precise bounding box retrieval and enhancing semantic links between textual descriptions and visual reality. Actor-Specific Fusion strikes a balance between actor-specific details and contextual information, improving model specificity and robustness in recognizing group activities. Experimental results demonstrate REACT’s superiority over state-of-the-art GAR approaches, achieving higher accuracy in recognizing and understanding group activities across diverse datasets. This work significantly advances group activity recognition, offering a robust framework for nuanced scene comprehension.
{"title":"React: recognize every action everywhere all at once","authors":"Naga V. S. Raviteja Chappa, Pha Nguyen, Page Daniel Dobbs, Khoa Luu","doi":"10.1007/s00138-024-01561-z","DOIUrl":"https://doi.org/10.1007/s00138-024-01561-z","url":null,"abstract":"<p>In the realm of computer vision, Group Activity Recognition (GAR) plays a vital role, finding applications in sports video analysis, surveillance, and social scene understanding. This paper introduces <b>R</b>ecognize <b>E</b>very <b>Act</b>ion Everywhere All At Once (REACT), a novel architecture designed to model complex contextual relationships within videos. REACT leverages advanced transformer-based models for encoding intricate contextual relationships, enhancing understanding of group dynamics. Integrated Vision-Language Encoding facilitates efficient capture of spatiotemporal interactions and multi-modal information, enabling comprehensive scene understanding. The model’s precise action localization refines joint understanding of text and video data, enabling precise bounding box retrieval and enhancing semantic links between textual descriptions and visual reality. Actor-Specific Fusion strikes a balance between actor-specific details and contextual information, improving model specificity and robustness in recognizing group activities. Experimental results demonstrate REACT’s superiority over state-of-the-art GAR approaches, achieving higher accuracy in recognizing and understanding group activities across diverse datasets. This work significantly advances group activity recognition, offering a robust framework for nuanced scene comprehension.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"24 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141743258","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual home staging and relighting from a single panorama under natural illumination
Pub Date: 2024-07-11 | DOI: 10.1007/s00138-024-01559-7
Guanzhou Ji, Azadeh O. Sawyer, Srinivasa G. Narasimhan
Virtual staging techniques can digitally showcase a variety of real-world scenes. However, relighting indoor scenes from a single image is challenging due to unknown scene geometry, material properties, and outdoor spatially-varying lighting. In this study, we use the High Dynamic Range (HDR) technique to capture an indoor panorama and its paired outdoor hemispherical photograph, and we develop a novel inverse rendering approach for scene relighting and editing. Our method consists of four key components: (1) panoramic furniture detection and removal, (2) automatic floor layout design, (3) global rendering with scene geometry, new furniture objects, and the real-time outdoor photograph, and (4) virtual staging with a new camera position, outdoor illumination, scene texture, and electric lighting. The results demonstrate that a single indoor panorama can be used to generate high-quality virtual scenes under new environmental conditions. Additionally, we contribute a new calibrated HDR (Cali-HDR) dataset consisting of 137 paired indoor and outdoor photographs.
{"title":"Virtual home staging and relighting from a single panorama under natural illumination","authors":"Guanzhou Ji, Azadeh O. Sawyer, Srinivasa G. Narasimhan","doi":"10.1007/s00138-024-01559-7","DOIUrl":"https://doi.org/10.1007/s00138-024-01559-7","url":null,"abstract":"<p>Virtual staging technique can digitally showcase a variety of real-world scenes. However, relighting indoor scenes from a single image is challenging due to unknown scene geometry, material properties, and outdoor spatially-varying lighting. In this study, we use the High Dynamic Range (HDR) technique to capture an indoor panorama and its paired outdoor hemispherical photograph, and we develop a novel inverse rendering approach for scene relighting and editing. Our method consists of four key components: (1) panoramic furniture detection and removal, (2) automatic floor layout design, (3) global rendering with scene geometry, new furniture objects, and the real-time outdoor photograph, and (4) virtual staging with new camera position, outdoor illumination, scene texture, and electrical light. The results demonstrate that a single indoor panorama can be used to generate high-quality virtual scenes under new environmental conditions. Additionally, we contribute a new calibrated HDR (Cali-HDR) dataset that consists of 137 paired indoor and outdoor photographs. The animation for virtual rendered scenes is available here.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"32 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141584987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Evaluation of data augmentation techniques on subjective tasks
Pub Date: 2024-07-11 | DOI: 10.1007/s00138-024-01574-8
Luis Gonzalez-Naharro, M. Julia Flores, Jesus Martínez-Gómez, Jose M. Puerta
Data augmentation is widely applied in various computer vision problems to artificially increase the size of a dataset by transforming the original data. These techniques are employed on small datasets to prevent overfitting, and also in problems where labelling is difficult. Nevertheless, data augmentation assumes that transformations preserve ground-truth labels, which does not hold for subjective problems such as aesthetic quality assessment, in which image transformations can alter the aesthetic-quality ground truth. In this work, we study how data augmentation affects subjective problems. We train a series of models, changing the probability of augmenting images and the intensity of those augmentations. We train models on AVA for quality prediction, on Photozilla for photo style prediction, and on the subjective and objective labels of CelebA. The results show that subjective tasks degrade more than objective tasks under traditional augmentation techniques, and that this worsening depends on the specific type of subjectivity.
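A sketch of the kind of controlled sweep described above, where both the probability of applying an augmentation and its intensity are varied; the specific torchvision transforms, ranges, and grid values are illustrative assumptions, not the exact policy evaluated in the paper.

```python
import torchvision.transforms as T

def make_augmentation(p, intensity):
    """Training transform whose application probability and strength are swept."""
    aug = T.Compose([
        T.RandomHorizontalFlip(p=p),
        T.RandomApply([T.ColorJitter(brightness=intensity, contrast=intensity)], p=p),
        T.RandomApply([T.RandomRotation(degrees=30 * intensity)], p=p),
    ])
    return T.Compose([aug, T.Resize((224, 224)), T.ToTensor()])

# Example sweep: one model is trained per (probability, intensity) cell.
configs = [(p, s) for p in (0.0, 0.25, 0.5, 0.75, 1.0) for s in (0.1, 0.3, 0.5)]
```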
{"title":"Evaluation of data augmentation techniques on subjective tasks","authors":"Luis Gonzalez-Naharro, M. Julia Flores, Jesus Martínez-Gómez, Jose M. Puerta","doi":"10.1007/s00138-024-01574-8","DOIUrl":"https://doi.org/10.1007/s00138-024-01574-8","url":null,"abstract":"<p>Data augmentation is widely applied in various computer vision problems for artificially increasing the size of a dataset by transforming the original data. These techniques are employed in small datasets to prevent overfitting, and also in problems where labelling is difficult. Nevertheless, data augmentation assumes that transformations preserve groundtruth labels, something not true for subjective problems such as aesthetic quality assessment, in which image transformations can alter their aesthetic quality groundtruth. In this work, we study how data augmentation affects subjective problems. We train a series of models, changing the probability of augmenting images and the intensity of such augmentations. We train models on AVA for quality prediction, on Photozilla for photo style prediction, and on subjective and objective labels of CelebA. Results show that subjective tasks get worse results than objective tasks with traditional augmentation techniques, and this worsening depends on the specific type of subjectivity.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"59 Pt A 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141614620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continual learning approaches to hand–eye calibration in robots
Pub Date: 2024-07-10 | DOI: 10.1007/s00138-024-01572-w
Ozan Bahadir, Jan Paul Siebert, Gerardo Aragon-Camarasa
This study addresses the problem of hand–eye calibration in robotic systems by developing Continual Learning (CL)-based approaches. Traditionally, robots require explicit models to transfer knowledge from camera observations to their hands or base. However, this poses limitations, as the hand–eye calibration parameters are typically valid only for the current camera configuration. We, therefore, propose a flexible and autonomous hand–eye calibration system that can adapt to changes in camera pose over time. Three CL-based approaches are introduced: the naive CL approach, the reservoir rehearsal approach, and the hybrid approach combining reservoir sampling with new data evaluation. The naive CL approach suffers from catastrophic forgetting, while the reservoir rehearsal approach mitigates this issue by sampling uniformly from past data. The hybrid approach further enhances performance by incorporating reservoir sampling and assessing new data for novelty. Experiments conducted in simulated and real-world environments demonstrate that the CL-based approaches, except for the naive approach, achieve competitive performance compared to traditional batch learning-based methods. This suggests that treating hand–eye calibration as a time sequence problem enables the extension of the learned space without complete retraining. The adaptability of the CL-based approaches facilitates accommodating changes in camera pose, leading to an improved hand–eye calibration system.
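The rehearsal approach relies on sampling uniformly from past data, which a reservoir buffer (Vitter's Algorithm R) provides; the sketch below is a generic buffer, and the capacity and the notion of a calibration "sample" (e.g., a camera observation paired with a robot pose) are illustrative assumptions.

```python
import random

class ReservoirBuffer:
    """Keeps a uniform random subset of all samples seen so far (Algorithm R)."""
    def __init__(self, capacity=500, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.num_seen = 0
        self.rng = random.Random(seed)

    def add(self, sample):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = self.rng.randint(0, self.num_seen - 1)   # keep with prob capacity/num_seen
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, k):
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

# Rehearsal step (sketch): mix new calibration pairs with replayed ones each update.
# buffer.add((camera_observation, robot_pose)); batch = new_pairs + buffer.sample(32)
```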
{"title":"Continual learning approaches to hand–eye calibration in robots","authors":"Ozan Bahadir, Jan Paul Siebert, Gerardo Aragon-Camarasa","doi":"10.1007/s00138-024-01572-w","DOIUrl":"https://doi.org/10.1007/s00138-024-01572-w","url":null,"abstract":"<p>This study addresses the problem of hand–eye calibration in robotic systems by developing Continual Learning (CL)-based approaches. Traditionally, robots require explicit models to transfer knowledge from camera observations to their hands or base. However, this poses limitations, as the hand–eye calibration parameters are typically valid only for the current camera configuration. We, therefore, propose a flexible and autonomous hand–eye calibration system that can adapt to changes in camera pose over time. Three CL-based approaches are introduced: the naive CL approach, the reservoir rehearsal approach, and the hybrid approach combining reservoir sampling with new data evaluation. The naive CL approach suffers from catastrophic forgetting, while the reservoir rehearsal approach mitigates this issue by sampling uniformly from past data. The hybrid approach further enhances performance by incorporating reservoir sampling and assessing new data for novelty. Experiments conducted in simulated and real-world environments demonstrate that the CL-based approaches, except for the naive approach, achieve competitive performance compared to traditional batch learning-based methods. This suggests that treating hand–eye calibration as a time sequence problem enables the extension of the learned space without complete retraining. The adaptability of the CL-based approaches facilitates accommodating changes in camera pose, leading to an improved hand–eye calibration system.</p>","PeriodicalId":51116,"journal":{"name":"Machine Vision and Applications","volume":"41 1","pages":""},"PeriodicalIF":3.3,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141584984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
MDUNet: deep-prior unrolling network with multi-parameter data integration for low-dose computed tomography reconstruction
Pub Date: 2024-07-09 | DOI: 10.1007/s00138-024-01568-6
Temitope Emmanuel Komolafe, Nizhuan Wang, Yuchi Tian, Adegbola Oyedotun Adeniji, Liang Zhou
The goal of this study is to reconstruct a high-quality computed tomography (CT) image from a low-dose acquisition using an unrolling deep learning-based reconstruction network with lower computational complexity and a more generalized model. We propose MDUNet, a multi-parameter deep-prior unrolling network that employs cascaded convolutional and deconvolutional blocks to unroll model-based iterative reconstruction within a finite number of iterations through data-driven training. Furthermore, the embedded data-consistency constraint in MDUNet ensures that the input low-dose images and the low-dose sinograms remain consistent and incorporates the physical imaging geometry. Additionally, multi-parameter training was employed to enhance the model's generalization during the training process. Experimental results on the AAPM low-dose CT datasets show that the proposed MDUNet significantly outperforms other state-of-the-art (SOTA) methods both quantitatively and qualitatively. The cascaded blocks also reduce the computational complexity with fewer training parameters and generalize well across different datasets. In addition, the proposed MDUNet is validated on 8 different organs of interest, recovering more detailed structures and generating high-quality images. The experimental results demonstrate that the proposed MDUNet yields favorable improvements over competing methods in terms of visual quality, quantitative performance, and computational efficiency. MDUNet improves image quality with reduced computational cost and good generalization, which effectively lowers the radiation dose and reduces scanning time, making it favorable for future clinical deployment.
A framework of specialized knowledge distillation for Siamese tracker on challenging attributes
Pub Date: 2024-07-09 | DOI: 10.1007/s00138-024-01578-4
Yiding Li, Atsushi Shimada, Tsubasa Minematsu, Cheng Tang
In recent years, Siamese network-based trackers have achieved significant improvements in real-time tracking. Despite their success, performance bottlenecks caused by the unavoidably complex scenarios of target-tracking tasks are becoming increasingly non-negligible. For example, occlusion and fast motion can easily cause tracking failures and are labeled in many high-quality tracking databases as challenging attributes. In addition, Siamese trackers tend to incur high memory costs, which restricts their applicability to mobile devices with tight memory budgets. To address these issues, we propose a Specialized teachers Distilled Siamese Tracker (SDST) framework to learn a student tracker that is small and fast and has enhanced performance on challenging attributes. SDST introduces two types of teachers for multi-teacher distillation: a general teacher and specialized teachers. The former imparts basic knowledge to the student; the latter transfer specialized knowledge that helps improve its performance on challenging attributes. For the student to efficiently capture critical knowledge from the two types of teachers, SDST is equipped with a carefully designed multi-teacher knowledge distillation model. Our model contains two processes: general teacher-student knowledge transfer and specialized teachers-student knowledge transfer. Extensive empirical evaluations of several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, results on Large-scale Single Object Tracking (LaSOT) show that the proposed method achieves a significant improvement of more than 2–4% on most challenging attributes. SDST also maintains high overall performance while achieving compression rates of up to 8x and frame rates of 252 FPS, obtaining outstanding accuracy on all challenging attributes.