Domain Adaptation from Visible-Light to FIR with Reliable Pseudo Labels
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216102
Juki Tanimoto, Haruya Kyutoku, Keisuke Doman, Y. Mekada
Deep learning object detection models that use visible-light cameras are easily affected by weather and lighting conditions, whereas models that use far-infrared cameras are less affected by such conditions. This paper proposes a domain adaptation method that uses pseudo labels from a visible-light camera to achieve accurate object detection in far-infrared images. Our method projects visible-light-domain detection results onto far-infrared images and uses them as pseudo labels for training a far-infrared detection model. Experiments confirm the effectiveness of the method.
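As a rough illustration of the projection step described above, the minimal sketch below maps high-confidence visible-light detection boxes onto the far-infrared image plane with a known homography and keeps them as pseudo labels. The function name, the confidence threshold, and the homography-based registration are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def project_boxes_to_fir(boxes_vis, scores, H, conf_thresh=0.7):
    """Project visible-light detection boxes onto the FIR image plane.

    boxes_vis : (N, 4) array of [x1, y1, x2, y2] in visible-image coordinates.
    scores    : (N,) detection confidences from the visible-light detector.
    H         : 3x3 homography mapping visible-image points to FIR-image points
                (assumed known from camera calibration / registration).
    Returns pseudo-label boxes in FIR coordinates for confident detections only.
    """
    keep = scores >= conf_thresh          # reliability filter for pseudo labels
    pseudo = []
    for x1, y1, x2, y2 in boxes_vis[keep]:
        # Project the four corners and take the axis-aligned bounding box.
        corners = np.array([[x1, y1, 1.0], [x2, y1, 1.0],
                            [x2, y2, 1.0], [x1, y2, 1.0]]).T   # (3, 4)
        proj = H @ corners
        proj = proj[:2] / proj[2:3]       # perspective divide
        pseudo.append([proj[0].min(), proj[1].min(),
                       proj[0].max(), proj[1].max()])
    return np.asarray(pseudo, dtype=np.float32)

# Example: identity homography (already-registered image pair).
boxes = np.array([[10., 20., 50., 80.], [5., 5., 15., 15.]])
scores = np.array([0.9, 0.4])
print(project_boxes_to_fir(boxes, scores, np.eye(3)))  # only the 0.9 box survives
```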
{"title":"Domain Adaptation from Visible-Light to FIR with Reliable Pseudo Labels","authors":"Juki Tanimoto, Haruya Kyutoku, Keisuke Doman, Y. Mekada","doi":"10.23919/MVA57639.2023.10216102","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216102","url":null,"abstract":"Deep learning object detection models using visible-light cameras are easily affected by weather and lighting conditions, whereas those using far-infrared cameras are less affected by such conditions. This paper proposes a domain adaptation method using pseudo labels from a visible-light camera toward an accurate object detection from far-infrared images. Our method projects visible light-domain detection results onto far-infrared images, and uses them as pseudo labels for training a far-infrared detection model. We confirmed the effectiveness of our method through experiments.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115276711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intra-frame Skeleton Constraints Modeling and Grouping Strategy Based Multi-Scale Graph Convolution Network for 3D Human Motion Prediction
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216076
Zhihan Zhuang, Yuan Li, Songlin Du, T. Ikenaga
Attention-based feed-forward networks and graph convolution networks have recently shown great promise in 3D skeleton-based human motion prediction thanks to their ability to learn temporal and spatial relations. However, previous methods have two critical issues: first, spatial dependencies of distal joints within each individual frame are hard to learn; second, the basic graph convolution network architecture ignores the hierarchical structure and diverse motion patterns of different body parts. To address these issues, this paper proposes an intra-frame skeleton constraints modeling method and a Grouping-based Multi-Scale Graph Convolution Network (GMS-GCN). The intra-frame skeleton constraints modeling method leverages a self-attention mechanism and a designed adjacency matrix to model the skeleton constraints of distal joints in each individual frame. GMS-GCN uses a grouping strategy to learn the dynamics of different body parts separately. Instead of mapping features within a single feature space, GMS-GCN extracts human body features at different dimensions through up-sampling and down-sampling GCN layers. Experimental results show that our method achieves an average MPJPE of 34.7 mm for short-term prediction and 93.2 mm for long-term prediction, both of which outperform state-of-the-art approaches.
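The sketch below illustrates one plausible reading of the intra-frame constraints idea: self-attention over the joints of a single frame whose attention map is masked by a designed adjacency matrix, so that distal joints attend only along skeleton links. All class and variable names, and the masking formulation itself, are assumptions for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class IntraFrameJointAttention(nn.Module):
    """Hedged sketch: self-attention over the joints of a single frame, biased by
    a designed adjacency matrix so that distal joints attend mainly to the joints
    they are physically constrained by."""
    def __init__(self, num_joints, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, adj):
        # x: (batch, num_joints, dim) joint features of one frame
        # adj: (num_joints, num_joints) designed adjacency (1 = allowed link)
        attn = (self.q(x) @ self.k(x).transpose(-2, -1)) * self.scale
        attn = attn.masked_fill(adj == 0, float("-inf"))  # keep skeleton links only
        attn = attn.softmax(dim=-1)
        return attn @ self.v(x)

# Toy example: 4 joints in a chain (hip-knee-ankle-toe), 8-dim features.
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]])
layer = IntraFrameJointAttention(num_joints=4, dim=8)
out = layer(torch.randn(2, 4, 8), adj)
print(out.shape)  # torch.Size([2, 4, 8])
```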
{"title":"Intra-frame Skeleton Constraints Modeling and Grouping Strategy Based Multi-Scale Graph Convolution Network for 3D Human Motion Prediction","authors":"Zhihan Zhuang, Yuan Li, Songlin Du, T. Ikenaga","doi":"10.23919/MVA57639.2023.10216076","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216076","url":null,"abstract":"Attention-based feed-forward networks and graph convolution networks have recently shown great promise in 3D skeleton-based human motion prediction for their good performance in learning temporal and spatial relations. However, previous methods have two critical issues: first, spatial dependencies for distal joints in each independent frame are hard to learn; second, the basic architecture of graph convolution network ignores hierarchical structure and diverse motion patterns of different body parts. To address these issues, this paper proposes an intra-frame skeleton constraints modeling method and a Grouping based Multi-Scale Graph Convolution Network (GMS-GCN) model. The intra-frame skeleton constraints modeling method leverages self-attention mechanism and a designed adjacency matrix to model the skeleton constraints of distal joints in each independent frame. The GMS-GCN utilizes a grouping strategy to learn the dynamics of various body parts separately. Instead of mapping features in the same feature space, GMS-GCN extracts human body features in different dimensions by up-sample and down-sample GCN layers. Experiment results demonstrate that our method achieves an average MPJPE of 34.7mm for short-term prediction and 93.2mm for long-term prediction and both outperform the state-of-the-art approaches.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122791411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Outline Generation Transformer for Bilingual Scene Text Recognition
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216107
Jui-Teng Ho, G. Hsu, S. Yanushkevich, M. Gavrilova
We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). While most STR approaches focus on English, we consider both English and Chinese, as Chinese is also a major language and scenes containing both languages are common in many regions. The OGT consists of an Outline Generator (OG) and a transformer with an embedded language model. The OG detects the character outlines of the text, and the outline features are fed into the transformer through an outline-query cross-attention layer to better locate each character and improve recognition performance. Training of the OGT has two phases: first on synthetic data, where text outline masks are available, and then on real data, where the outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.
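The following sketch shows the general shape of an outline-query cross-attention layer as described above: outline features act as queries attending over image features to localize characters. The layer name, dimensions, and the residual/normalization details are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class OutlineQueryCrossAttention(nn.Module):
    """Hedged sketch of an outline-query cross-attention layer: outline features
    (from an outline generator) act as queries that attend over the image feature
    sequence to localize each character."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, outline_feats, image_feats):
        # outline_feats: (B, num_queries, dim), image_feats: (B, H*W, dim)
        attended, _ = self.attn(query=outline_feats,
                                key=image_feats, value=image_feats)
        return self.norm(outline_feats + attended)  # residual connection + norm

layer = OutlineQueryCrossAttention()
out = layer(torch.randn(2, 25, 256), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 25, 256])
```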
{"title":"Outline Generation Transformer for Bilingual Scene Text Recognition","authors":"Jui-Teng Ho, G. Hsu, S. Yanushkevich, M. Gavrilova","doi":"10.23919/MVA57639.2023.10216107","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216107","url":null,"abstract":"We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). As most STR approaches focus on English, we consider both English and Chinese as Chinese is also a major language, and it is a common scene in many areas/countries where both languages can be seen. The OGT consists of an Outline Generator (OG) and a transformer with a language model embedded. The OG detects the character outline of the text and embeds the outline features into a transformer with the outline-query cross-attention layer to better locate each character and enhance the text recognition performance. The training of OGT has two phases, one is training on synthetic data where the text outline masks are made available, followed by the other training on real data where the text outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126492214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-class Semantic Segmentation of Tooth Pathologies and Anatomical Structures on Bitewing and Periapical Radiographs
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215653
James-Andrew R. Sarmiento, Liushifeng Chen, P. Naval
Detecting dental problems early can prevent invasive procedures and reduce healthcare costs, but traditional examinations may not identify all issues, making radiography essential. However, interpreting X-rays is time-consuming, subjective, prone to error, and requires specialized knowledge. Automated segmentation methods using AI can improve interpretation and aid diagnosis and patient education. Our U-Net model, trained on 344 bitewing and periapical X-rays, identifies two pathologies and eight anatomical structures. It achieves an overall Dice score of 0.794 and sensitivity of 0.787, with 0.493 and 0.405 respectively for dental caries, and 0.471 and 0.44 for root infections. This application of deep learning to dental imaging demonstrates the potential of automated segmentation methods to improve accuracy and efficiency in diagnosing and treating dental disorders.
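Since the results are reported as per-class Dice scores, the short sketch below shows how such a score can be computed from predicted and ground-truth label maps. The class indices and the convention for classes absent from both maps are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def dice_per_class(pred, target, num_classes, eps=1e-7):
    """Per-class Dice score for a multi-class segmentation mask.

    pred, target : integer label maps of identical shape (H, W).
    Returns a dict {class_id: dice}; classes absent from both maps score 1.0
    by convention here (a choice, not necessarily the paper's protocol)."""
    scores = {}
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        inter = np.logical_and(p, t).sum()
        denom = p.sum() + t.sum()
        scores[c] = 1.0 if denom == 0 else (2.0 * inter + eps) / (denom + eps)
    return scores

# Toy 4x4 example with 3 classes (0 = background, 1 and 2 = hypothetical pathologies).
pred   = np.array([[0, 0, 1, 1], [0, 2, 2, 1], [0, 2, 2, 0], [0, 0, 0, 0]])
target = np.array([[0, 0, 1, 1], [0, 2, 1, 1], [0, 2, 2, 0], [0, 0, 0, 0]])
print(dice_per_class(pred, target, num_classes=3))
```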
{"title":"Multi-class Semantic Segmentation of Tooth Pathologies and Anatomical Structures on Bitewing and Periapical Radiographs","authors":"James-Andrew R. Sarmiento, Liushifeng Chen, P. Naval","doi":"10.23919/MVA57639.2023.10215653","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215653","url":null,"abstract":"Detecting dental problems early can prevent invasive procedures and reduce healthcare costs, but traditional exams may not identify all issues, making radiography essential. However, interpreting X-rays can be time-consuming, subjective, prone to error, and requires specialized knowledge. Automated segmentation methods using AI can improve interpretation and aid in diagnosis and patient education. Our U-Net model, trained on 344 bitewing and periapical X-rays, can identify two pathologies and eight anatomical features. It achieves an overall diagnostic performance of 0.794 and 0.787 in terms of Dice score and sensitivity, respectively, 0.493 and 0.405 for dental caries, and 0.471 and 0.44 for root infections. This successful application of deep learning to dental imaging demonstrates the potential of automated segmentation methods for improving accuracy and efficiency in diagnosing and treating dental disorders.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127414661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Plane Projection for Extending Perspective Image Object Detection Models to 360° Images
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215689
Yasuto Nagase, Y. Babazaki, Katsuhiko Takahashi
Since 360° cameras are still in their diffusion phase, there are no large annotated datasets or models trained on them as there are for perspective cameras. Creating new 360°-specific datasets and training recognition models for each domain and task poses a significant barrier for many users aiming at practical applications. We therefore propose a novel technique that effectively adapts existing models to 360° images: the 360° image is projected onto multiple perspective planes that are fed to the existing model, and the detection results are unified in a spherical coordinate system. In experiments on an object detection task, our method improved recognition accuracy by up to 6.7% over the baselines.
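The sketch below illustrates the unification step in spherical coordinates: a detection's pixel position in one perspective view, parameterized by its field of view and the yaw/pitch of its optical axis, is mapped back to longitude/latitude on the sphere. The parameterization and function name are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def pixel_to_spherical(u, v, img_w, img_h, fov_deg, yaw_deg, pitch_deg):
    """Map a pixel in one perspective view (cut from a 360° image) back to
    spherical coordinates (longitude, latitude) in degrees. The view is
    parameterized by its horizontal FOV and the yaw/pitch of its optical axis."""
    f = (img_w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels
    # Ray in the view's camera frame (x right, y down, z forward).
    ray = np.array([u - img_w / 2.0, v - img_h / 2.0, f])
    ray /= np.linalg.norm(ray)
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    # Rotate the ray by the view orientation (pitch about x, then yaw about y).
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch),  np.cos(pitch)]])
    Ry = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    d = Ry @ Rx @ ray
    lon = np.degrees(np.arctan2(d[0], d[2]))
    lat = np.degrees(np.arcsin(np.clip(-d[1], -1.0, 1.0)))
    return lon, lat

# Center pixel of a view looking 90° to the right maps to longitude ~90°, latitude 0°.
print(pixel_to_spherical(320, 240, 640, 480, fov_deg=90, yaw_deg=90, pitch_deg=0))
```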
{"title":"Multi-Plane Projection for Extending Perspective Image Object Detection Models to 360° Images","authors":"Yasuto Nagase, Y. Babazaki, Katsuhiko Takahashi","doi":"10.23919/MVA57639.2023.10215689","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215689","url":null,"abstract":"Since 360° cameras are still in their diffusion phase, there are no large annotated datasets or models trained on them as there are for perspective cameras. Creating new 360°-specific datasets and training recognition models for each domain and tasks have a significant barrier for many users aiming at practical applications. Therefore, we propose a novel technique to effectively adapt the existing models to 360° images. The 360° images are projected to multiple planes and adapted to the existing model, and the detected results are unified in a spherical coordinate system. In experiments, we evaluated our method on an object detection task and compared it to baselines, which showed an improvement in recognition accuracy of up to 6.7%.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116857496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Safe Landing Zone Detection for UAVs using Image Segmentation and Super Resolution
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215759
Anagh Benjwal, Prajwal Uday, Aditya Vadduri, Abhishek Pai
The increased use of UAVs in urban environments has made safe and robust emergency landing zone detection essential. This paper presents a novel approach for detecting safe landing zones for UAVs using deep learning-based image segmentation. Our approach uses a custom dataset to train a CNN model. To handle low-resolution input images, it incorporates a super-resolution model that upscales low-resolution images before feeding them into the segmentation model. The proposed approach achieves robust and accurate detection of safe landing zones even on low-resolution images. Experimental results demonstrate the effectiveness of our method, showing an improvement of up to 6.3% in accuracy over state-of-the-art safe landing zone detection methods.
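A minimal sketch of the two-stage pipeline is given below: a low-resolution input is first upscaled and then segmented, and a safe-zone mask is taken from the class predictions. The super-resolution and segmentation networks are replaced by trivial placeholders (bicubic upsampling and a toy convolutional head), so this shows only the data flow, not the paper's trained models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SafeLandingPipeline(nn.Module):
    """Hedged sketch of the two-stage idea: upscale a low-resolution aerial image,
    then segment it and threshold the 'safe' class."""
    def __init__(self, scale=4, num_classes=2):
        super().__init__()
        self.scale = scale
        self.seg = nn.Sequential(                 # stand-in segmentation model
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1))

    def forward(self, low_res):
        # Stage 1: super-resolve (placeholder: bicubic upscaling).
        sr = F.interpolate(low_res, scale_factor=self.scale,
                           mode="bicubic", align_corners=False)
        # Stage 2: per-pixel class logits, then a binary safe-zone mask.
        logits = self.seg(sr)
        safe_mask = logits.argmax(dim=1) == 1     # class 1 assumed to be 'safe'
        return sr, safe_mask

pipe = SafeLandingPipeline()
sr, mask = pipe(torch.rand(1, 3, 64, 64))
print(sr.shape, mask.shape)   # upscaled image and 256x256 safe-zone mask
```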
{"title":"Safe Landing Zone Detection for UAVs using Image Segmentation and Super Resolution","authors":"Anagh Benjwal, Prajwal Uday, Aditya Vadduri, Abhishek Pai","doi":"10.23919/MVA57639.2023.10215759","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215759","url":null,"abstract":"Increased usage of UAVs in urban environments has led to the necessity of safe and robust emergency landing zone detection techniques. This paper presents a novel approach for detecting safe landing zones for UAVs using deep learning-based image segmentation. Our approach involves using a custom dataset to train a CNN model. To account for low-resolution input images, our approach incorporates a Super-Resolution model to upscale low-resolution images before feeding them into the segmentation model. The proposed approach achieves robust and accurate detection of safe landing zones, even on low-resolution images. Experimental results demonstrate the effectiveness of our method and show a marked improvement of upto 6.3% in accuracy over state-of-the-art safe landing zone detection methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128689719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint learning of images and videos with a single Vision Transformer
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215661
Shuki Shimizu, Toru Tamaki
In this study, we propose a method for joint learning of images and videos with a single model. In general, images and videos are trained with separate models. We propose a method that feeds a batch of images into a Vision Transformer (IV-ViT) together with a set of video frames that are temporally aggregated by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
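The sketch below captures the late-fusion idea: a single shared backbone processes both individual images and the frames of a video clip, and the per-frame outputs of a clip are averaged. A toy convolutional backbone stands in for the Vision Transformer, and all names are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class JointImageVideoModel(nn.Module):
    """Hedged sketch: one shared backbone handles both single images and video
    frames; frame-level outputs of a clip are averaged (late fusion)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a Vision Transformer
            nn.Conv2d(3, 32, 8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward_image(self, images):              # images: (B, 3, H, W)
        return self.backbone(images)

    def forward_video(self, clips):               # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frame_logits = self.backbone(clips.flatten(0, 1))     # (B*T, num_classes)
        return frame_logits.view(b, t, -1).mean(dim=1)        # late fusion over T

model = JointImageVideoModel()
print(model.forward_image(torch.rand(2, 3, 224, 224)).shape)     # (2, 10)
print(model.forward_video(torch.rand(2, 8, 3, 224, 224)).shape)  # (2, 10)
```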
{"title":"Joint learning of images and videos with a single Vision Transformer","authors":"Shuki Shimizu, Toru Tamaki","doi":"10.23919/MVA57639.2023.10215661","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215661","url":null,"abstract":"In this study, we propose a method for jointly learning of images and videos using a single model. In general, images and videos are often trained by separate models. We propose in this paper a method that takes a batch of images as input to Vision Transformer (IV-ViT), and also a set of video frames with temporal aggregation by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127324476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive Knowledge Distillation for Anomaly Detection in Multi-Illumination/Focus Display Images
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215808
Jihyun Lee, Hangi Park, Yongmin Seo, Taewon Min, Joodong Yun, Jaewon Kim, Tae-Kyun Kim
In this paper, we tackle automatic anomaly detection in multi-illumination and multi-focus display images. Minute defects on the display surface are hard to spot in RGB images and hard to detect with a model trained only on normal data. To address this, we propose a novel contrastive learning scheme for knowledge distillation-based anomaly detection. Our framework adopts Multiresolution Knowledge Distillation (MKD) as a baseline, which operates by measuring feature similarities between the teacher and student networks. Building on MKD, we propose a novel contrastive learning method, Multiresolution Contrastive Distillation (MCD), which does not require positive/negative pairs with an anchor but instead pulls or pushes the distance between teacher and student features. Furthermore, we propose a blending module that transforms and aggregates multi-channel information into the three-channel input of MCD. Our method significantly outperforms competitive state-of-the-art methods in both AUROC and accuracy on the collected Multi-illumination and Multi-focus display image dataset for Anomaly Detection (MMdAD).
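The sketch below gives one possible reading of a pull/push objective on teacher-student feature distances at multiple resolutions: distances are minimized for normal samples and pushed beyond a margin otherwise. The margin formulation and the use of pseudo-anomalous samples are assumptions for illustration only, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def multires_pull_push_loss(teacher_feats, student_feats, is_normal, margin=0.5):
    """Hedged sketch of a pull/push distillation objective: for normal samples the
    student feature is pulled toward the teacher feature (cosine distance -> 0);
    for pseudo-anomalous samples it is pushed away by at least a margin."""
    loss = 0.0
    for t, s in zip(teacher_feats, student_feats):       # one pair per resolution
        d = 1.0 - F.cosine_similarity(t.flatten(1), s.flatten(1), dim=1)  # (B,)
        pull = d                                          # shrink the distance
        push = F.relu(margin - d)                         # enforce distance >= margin
        loss = loss + torch.where(is_normal, pull, push).mean()
    return loss / len(teacher_feats)

# Toy multi-resolution features for a batch of 4 samples (first 3 treated as normal).
teacher = [torch.randn(4, 64, 32, 32), torch.randn(4, 128, 16, 16)]
student = [torch.randn(4, 64, 32, 32), torch.randn(4, 128, 16, 16)]
print(multires_pull_push_loss(teacher, student,
                              torch.tensor([True, True, True, False])))
```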
{"title":"Contrastive Knowledge Distillation for Anomaly Detection in Multi-Illumination/Focus Display Images","authors":"Jihyun Lee, Hangi Park, Yongmin Seo, Taewon Min, Joodong Yun, Jaewon Kim, Tae-Kyun Kim","doi":"10.23919/MVA57639.2023.10215808","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215808","url":null,"abstract":"In this paper, we tackle automatic anomaly detection in multi-illumination and multi-focus display images. The minute defects on the display surface are hard to spot out in RGB images and by a model trained with only normal data. To address this, we propose a novel contrastive learning scheme for knowledge distillation-based anomaly detection. In our framework, Multiresolution Knowledge Distillation (MKD) is adopted as a baseline, which operates by measuring feature similarities between the teacher and student networks. Based on MKD, we propose a novel contrastive learning method, namely Multiresolution Contrastive Distillation (MCD), which does not require positive/negative pairs with an anchor but operates by pulling/pushing the distance between the teacher and student features. Furthermore, we propose the blending module that transforms and aggregate multi-channel information to the three-channel input layer of MCD. Our proposed method significantly outperforms competitive state-of-the-art methods in both AUROC and accuracy metrics on the collected Multi-illumination and Multi-focus display image dataset for Anomaly Detection (MMdAD).","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129295108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Identification of Surgical Instruments without Tagging: Implementation in Real Hospital Work Environment
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216222
Rui Ishiyama, Per Helge Litzheim Frøiland, Stein-Asle Øvrebotn
This paper presents a new practical system to track and trace individual surgical instruments without marking or tagging. Individual identification is fundamental to traceability, documentation, and optimization for patient safety, compliance, economy, and the environment. However, existing identification systems have yet to be adopted by most hospitals due to the costs and risks of tagging or marking. The "Fingerprint of Things" recognition technology enables tag-less identification; however, automating the scanning to save labor, which should instead be devoted to patient care, is also essential for practical use. We developed a new system concept that automates the detection, type recognition, fingerprint scanning, and identification of every instrument in the workspace. A prototype has been implemented and tested in real hospital work, and the feasibility of our solution as a commercial product is supported by an order for its adoption.
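As a rough illustration of the final identification step only, the sketch below matches a surface-pattern descriptor against a database of registered instruments by cosine similarity. The descriptor extraction itself (the "Fingerprint of Things" recognition) is proprietary and is not reproduced here; all names, vectors, and thresholds are placeholders.

```python
import numpy as np

def identify_instrument(descriptor, database, threshold=0.8):
    """Hedged sketch: nearest-neighbor matching of a surface-pattern descriptor
    against registered instruments by cosine similarity. Returns the best-matching
    instrument name and its similarity, or None if below the threshold."""
    names = list(database.keys())
    refs = np.stack([database[n] for n in names])                      # (N, D)
    sims = refs @ descriptor / (np.linalg.norm(refs, axis=1)
                                * np.linalg.norm(descriptor) + 1e-9)   # cosine sims
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return names[best], float(sims[best])
    return None, float(sims[best])

# Placeholder database of registered instrument descriptors.
rng = np.random.default_rng(0)
db = {"forceps_#012": rng.normal(size=128), "scissors_#007": rng.normal(size=128)}
query = db["forceps_#012"] + 0.05 * rng.normal(size=128)   # noisy re-scan of the same item
print(identify_instrument(query, db))
```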
{"title":"Automated Identification of Surgical Instruments without Tagging: Implementation in Real Hospital Work Environment","authors":"Rui Ishiyama, Per Helge Litzheim Frøiland, Stein-Asle Øvrebotn","doi":"10.23919/MVA57639.2023.10216222","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216222","url":null,"abstract":"This paper presents a new practical system to track and trace individual surgical instruments without marking or tagging. Individual identification is fundamental to traceability, documentation, and optimization for patient safety, compliance, economy, and the environment. However, existing identification systems have yet to be adopted by most hospitals due to the costs and risks of tagging or marking. The \"Fingerprint of Things\" recognition technology enables tag-less identification; however, scanning automation to save labor costs, which should be devoted to patient care, is also essential for practical use. We developed a new system concept that automates the detection, type recognition, fingerprint scanning, and identification of every instrument in the workspace. A prototype solution has also been implemented and tested in real hospital work. The feasibility of our solution as a commercial product is verified by its order for adoption.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131303166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most Influential Paper over the Decade Award
Pub Date: 2023-07-23 | DOI: 10.23919/mva57639.2023.10215707
{"title":"Most Influential Paper over the Decade Award","authors":"","doi":"10.23919/mva57639.2023.10215707","DOIUrl":"https://doi.org/10.23919/mva57639.2023.10215707","url":null,"abstract":"","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125620476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}