Domain Adaptation from Visible-Light to FIR with Reliable Pseudo Labels
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216102
Juki Tanimoto, Haruya Kyutoku, Keisuke Doman, Y. Mekada
Deep learning object detection models that use visible-light cameras are easily affected by weather and lighting conditions, whereas models that use far-infrared cameras are less affected by such conditions. This paper proposes a domain adaptation method that uses pseudo labels from a visible-light camera to achieve accurate object detection in far-infrared images. Our method projects visible-light-domain detection results onto far-infrared images and uses them as pseudo labels for training a far-infrared detection model. Experiments confirm the effectiveness of the method.
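As a rough illustration of the projection step described above, the minimal sketch below maps high-confidence visible-light detection boxes onto the far-infrared image plane with a known homography and keeps them as pseudo labels. The function name, the confidence threshold, and the homography-based registration are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def project_boxes_to_fir(boxes_vis, scores, H, conf_thresh=0.7):
    """Project visible-light detection boxes onto the FIR image plane.

    boxes_vis : (N, 4) array of [x1, y1, x2, y2] in visible-image coordinates.
    scores    : (N,) detection confidences from the visible-light detector.
    H         : 3x3 homography mapping visible-image points to FIR-image points
                (assumed known from camera calibration / registration).
    Returns pseudo-label boxes in FIR coordinates for confident detections only.
    """
    keep = scores >= conf_thresh          # reliability filter for pseudo labels
    pseudo = []
    for x1, y1, x2, y2 in boxes_vis[keep]:
        # Project the four corners and take the axis-aligned bounding box.
        corners = np.array([[x1, y1, 1.0], [x2, y1, 1.0],
                            [x2, y2, 1.0], [x1, y2, 1.0]]).T   # (3, 4)
        proj = H @ corners
        proj = proj[:2] / proj[2:3]       # perspective divide
        pseudo.append([proj[0].min(), proj[1].min(),
                       proj[0].max(), proj[1].max()])
    return np.asarray(pseudo, dtype=np.float32)

# Example: identity homography (already-registered image pair).
boxes = np.array([[10., 20., 50., 80.], [5., 5., 15., 15.]])
scores = np.array([0.9, 0.4])
print(project_boxes_to_fir(boxes, scores, np.eye(3)))  # only the 0.9 box survives
```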
{"title":"Domain Adaptation from Visible-Light to FIR with Reliable Pseudo Labels","authors":"Juki Tanimoto, Haruya Kyutoku, Keisuke Doman, Y. Mekada","doi":"10.23919/MVA57639.2023.10216102","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216102","url":null,"abstract":"Deep learning object detection models using visible-light cameras are easily affected by weather and lighting conditions, whereas those using far-infrared cameras are less affected by such conditions. This paper proposes a domain adaptation method using pseudo labels from a visible-light camera toward an accurate object detection from far-infrared images. Our method projects visible light-domain detection results onto far-infrared images, and uses them as pseudo labels for training a far-infrared detection model. We confirmed the effectiveness of our method through experiments.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115276711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intra-frame Skeleton Constraints Modeling and Grouping Strategy Based Multi-Scale Graph Convolution Network for 3D Human Motion Prediction
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216076
Zhihan Zhuang, Yuan Li, Songlin Du, T. Ikenaga
Attention-based feed-forward networks and graph convolution networks have recently shown great promise in 3D skeleton-based human motion prediction thanks to their ability to learn temporal and spatial relations. However, previous methods have two critical issues: first, spatial dependencies of distal joints within each individual frame are hard to learn; second, the basic graph convolution network architecture ignores the hierarchical structure and diverse motion patterns of different body parts. To address these issues, this paper proposes an intra-frame skeleton constraints modeling method and a Grouping-based Multi-Scale Graph Convolution Network (GMS-GCN). The intra-frame skeleton constraints modeling method leverages a self-attention mechanism and a designed adjacency matrix to model the skeleton constraints of distal joints in each individual frame. GMS-GCN uses a grouping strategy to learn the dynamics of different body parts separately. Instead of mapping features within a single feature space, GMS-GCN extracts human body features at different dimensions through up-sampling and down-sampling GCN layers. Experimental results show that our method achieves an average MPJPE of 34.7 mm for short-term prediction and 93.2 mm for long-term prediction, both of which outperform state-of-the-art approaches.
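The sketch below illustrates one plausible reading of the intra-frame constraints idea: self-attention over the joints of a single frame whose attention map is masked by a designed adjacency matrix, so that distal joints attend only along skeleton links. All class and variable names, and the masking formulation itself, are assumptions for illustration rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class IntraFrameJointAttention(nn.Module):
    """Hedged sketch: self-attention over the joints of a single frame, biased by
    a designed adjacency matrix so that distal joints attend mainly to the joints
    they are physically constrained by."""
    def __init__(self, num_joints, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x, adj):
        # x: (batch, num_joints, dim) joint features of one frame
        # adj: (num_joints, num_joints) designed adjacency (1 = allowed link)
        attn = (self.q(x) @ self.k(x).transpose(-2, -1)) * self.scale
        attn = attn.masked_fill(adj == 0, float("-inf"))  # keep skeleton links only
        attn = attn.softmax(dim=-1)
        return attn @ self.v(x)

# Toy example: 4 joints in a chain (hip-knee-ankle-toe), 8-dim features.
adj = torch.tensor([[1, 1, 0, 0],
                    [1, 1, 1, 0],
                    [0, 1, 1, 1],
                    [0, 0, 1, 1]])
layer = IntraFrameJointAttention(num_joints=4, dim=8)
out = layer(torch.randn(2, 4, 8), adj)
print(out.shape)  # torch.Size([2, 4, 8])
```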
{"title":"Intra-frame Skeleton Constraints Modeling and Grouping Strategy Based Multi-Scale Graph Convolution Network for 3D Human Motion Prediction","authors":"Zhihan Zhuang, Yuan Li, Songlin Du, T. Ikenaga","doi":"10.23919/MVA57639.2023.10216076","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216076","url":null,"abstract":"Attention-based feed-forward networks and graph convolution networks have recently shown great promise in 3D skeleton-based human motion prediction for their good performance in learning temporal and spatial relations. However, previous methods have two critical issues: first, spatial dependencies for distal joints in each independent frame are hard to learn; second, the basic architecture of graph convolution network ignores hierarchical structure and diverse motion patterns of different body parts. To address these issues, this paper proposes an intra-frame skeleton constraints modeling method and a Grouping based Multi-Scale Graph Convolution Network (GMS-GCN) model. The intra-frame skeleton constraints modeling method leverages self-attention mechanism and a designed adjacency matrix to model the skeleton constraints of distal joints in each independent frame. The GMS-GCN utilizes a grouping strategy to learn the dynamics of various body parts separately. Instead of mapping features in the same feature space, GMS-GCN extracts human body features in different dimensions by up-sample and down-sample GCN layers. Experiment results demonstrate that our method achieves an average MPJPE of 34.7mm for short-term prediction and 93.2mm for long-term prediction and both outperform the state-of-the-art approaches.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"353 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122791411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Outline Generation Transformer for Bilingual Scene Text Recognition
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216107
Jui-Teng Ho, G. Hsu, S. Yanushkevich, M. Gavrilova
We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). While most STR approaches focus on English, we consider both English and Chinese, as Chinese is also a major language and scenes containing both languages are common in many regions. The OGT consists of an Outline Generator (OG) and a transformer with an embedded language model. The OG detects the character outlines of the text, and the outline features are fed into the transformer through an outline-query cross-attention layer to better locate each character and improve recognition performance. Training of the OGT has two phases: first on synthetic data, where text outline masks are available, and then on real data, where the outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.
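The following sketch shows the general shape of an outline-query cross-attention layer as described above: outline features act as queries attending over image features to localize characters. The layer name, dimensions, and the residual/normalization details are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class OutlineQueryCrossAttention(nn.Module):
    """Hedged sketch of an outline-query cross-attention layer: outline features
    (from an outline generator) act as queries that attend over the image feature
    sequence to localize each character."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, outline_feats, image_feats):
        # outline_feats: (B, num_queries, dim), image_feats: (B, H*W, dim)
        attended, _ = self.attn(query=outline_feats,
                                key=image_feats, value=image_feats)
        return self.norm(outline_feats + attended)  # residual connection + norm

layer = OutlineQueryCrossAttention()
out = layer(torch.randn(2, 25, 256), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 25, 256])
```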
{"title":"Outline Generation Transformer for Bilingual Scene Text Recognition","authors":"Jui-Teng Ho, G. Hsu, S. Yanushkevich, M. Gavrilova","doi":"10.23919/MVA57639.2023.10216107","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216107","url":null,"abstract":"We propose the Outline Generation Transformer (OGT) for bilingual Scene Text Recognition (STR). As most STR approaches focus on English, we consider both English and Chinese as Chinese is also a major language, and it is a common scene in many areas/countries where both languages can be seen. The OGT consists of an Outline Generator (OG) and a transformer with a language model embedded. The OG detects the character outline of the text and embeds the outline features into a transformer with the outline-query cross-attention layer to better locate each character and enhance the text recognition performance. The training of OGT has two phases, one is training on synthetic data where the text outline masks are made available, followed by the other training on real data where the text outline masks can only be estimated. The proposed OGT is evaluated on several benchmark datasets and compared with state-of-the-art methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126492214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-class Semantic Segmentation of Tooth Pathologies and Anatomical Structures on Bitewing and Periapical Radiographs
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215653
James-Andrew R. Sarmiento, Liushifeng Chen, P. Naval
Detecting dental problems early can prevent invasive procedures and reduce healthcare costs, but traditional examinations may not identify all issues, making radiography essential. However, interpreting X-rays is time-consuming, subjective, prone to error, and requires specialized knowledge. Automated segmentation methods using AI can improve interpretation and aid diagnosis and patient education. Our U-Net model, trained on 344 bitewing and periapical X-rays, identifies two pathologies and eight anatomical structures. It achieves an overall Dice score of 0.794 and sensitivity of 0.787, with 0.493 and 0.405 respectively for dental caries, and 0.471 and 0.44 for root infections. This application of deep learning to dental imaging demonstrates the potential of automated segmentation methods to improve accuracy and efficiency in diagnosing and treating dental disorders.
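Since the results are reported as per-class Dice scores, the short sketch below shows how such a score can be computed from predicted and ground-truth label maps. The class indices and the convention for classes absent from both maps are illustrative assumptions, not the paper's evaluation protocol.

```python
import numpy as np

def dice_per_class(pred, target, num_classes, eps=1e-7):
    """Per-class Dice score for a multi-class segmentation mask.

    pred, target : integer label maps of identical shape (H, W).
    Returns a dict {class_id: dice}; classes absent from both maps score 1.0
    by convention here (a choice, not necessarily the paper's protocol)."""
    scores = {}
    for c in range(num_classes):
        p = (pred == c)
        t = (target == c)
        inter = np.logical_and(p, t).sum()
        denom = p.sum() + t.sum()
        scores[c] = 1.0 if denom == 0 else (2.0 * inter + eps) / (denom + eps)
    return scores

# Toy 4x4 example with 3 classes (0 = background, 1 and 2 = hypothetical pathologies).
pred   = np.array([[0, 0, 1, 1], [0, 2, 2, 1], [0, 2, 2, 0], [0, 0, 0, 0]])
target = np.array([[0, 0, 1, 1], [0, 2, 1, 1], [0, 2, 2, 0], [0, 0, 0, 0]])
print(dice_per_class(pred, target, num_classes=3))
```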
{"title":"Multi-class Semantic Segmentation of Tooth Pathologies and Anatomical Structures on Bitewing and Periapical Radiographs","authors":"James-Andrew R. Sarmiento, Liushifeng Chen, P. Naval","doi":"10.23919/MVA57639.2023.10215653","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215653","url":null,"abstract":"Detecting dental problems early can prevent invasive procedures and reduce healthcare costs, but traditional exams may not identify all issues, making radiography essential. However, interpreting X-rays can be time-consuming, subjective, prone to error, and requires specialized knowledge. Automated segmentation methods using AI can improve interpretation and aid in diagnosis and patient education. Our U-Net model, trained on 344 bitewing and periapical X-rays, can identify two pathologies and eight anatomical features. It achieves an overall diagnostic performance of 0.794 and 0.787 in terms of Dice score and sensitivity, respectively, 0.493 and 0.405 for dental caries, and 0.471 and 0.44 for root infections. This successful application of deep learning to dental imaging demonstrates the potential of automated segmentation methods for improving accuracy and efficiency in diagnosing and treating dental disorders.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127414661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Plane Projection for Extending Perspective Image Object Detection Models to 360° Images
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215689
Yasuto Nagase, Y. Babazaki, Katsuhiko Takahashi
Since 360° cameras are still in their diffusion phase, there are no large annotated datasets or models trained on them as there are for perspective cameras. Creating new 360°-specific datasets and training recognition models for each domain and task poses a significant barrier for many users aiming at practical applications. We therefore propose a novel technique that effectively adapts existing models to 360° images: the 360° image is projected onto multiple perspective planes that are fed to the existing model, and the detection results are unified in a spherical coordinate system. In experiments on an object detection task, our method improved recognition accuracy by up to 6.7% over the baselines.
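The sketch below illustrates the unification step in spherical coordinates: a detection's pixel position in one perspective view, parameterized by its field of view and the yaw/pitch of its optical axis, is mapped back to longitude/latitude on the sphere. The parameterization and function name are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np

def pixel_to_spherical(u, v, img_w, img_h, fov_deg, yaw_deg, pitch_deg):
    """Map a pixel in one perspective view (cut from a 360° image) back to
    spherical coordinates (longitude, latitude) in degrees. The view is
    parameterized by its horizontal FOV and the yaw/pitch of its optical axis."""
    f = (img_w / 2.0) / np.tan(np.radians(fov_deg) / 2.0)   # focal length in pixels
    # Ray in the view's camera frame (x right, y down, z forward).
    ray = np.array([u - img_w / 2.0, v - img_h / 2.0, f])
    ray /= np.linalg.norm(ray)
    yaw, pitch = np.radians(yaw_deg), np.radians(pitch_deg)
    # Rotate the ray by the view orientation (pitch about x, then yaw about y).
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch),  np.cos(pitch)]])
    Ry = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    d = Ry @ Rx @ ray
    lon = np.degrees(np.arctan2(d[0], d[2]))
    lat = np.degrees(np.arcsin(np.clip(-d[1], -1.0, 1.0)))
    return lon, lat

# Center pixel of a view looking 90° to the right maps to longitude ~90°, latitude 0°.
print(pixel_to_spherical(320, 240, 640, 480, fov_deg=90, yaw_deg=90, pitch_deg=0))
```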
{"title":"Multi-Plane Projection for Extending Perspective Image Object Detection Models to 360° Images","authors":"Yasuto Nagase, Y. Babazaki, Katsuhiko Takahashi","doi":"10.23919/MVA57639.2023.10215689","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215689","url":null,"abstract":"Since 360° cameras are still in their diffusion phase, there are no large annotated datasets or models trained on them as there are for perspective cameras. Creating new 360°-specific datasets and training recognition models for each domain and tasks have a significant barrier for many users aiming at practical applications. Therefore, we propose a novel technique to effectively adapt the existing models to 360° images. The 360° images are projected to multiple planes and adapted to the existing model, and the detected results are unified in a spherical coordinate system. In experiments, we evaluated our method on an object detection task and compared it to baselines, which showed an improvement in recognition accuracy of up to 6.7%.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116857496","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Safe Landing Zone Detection for UAVs using Image Segmentation and Super Resolution
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215759
Anagh Benjwal, Prajwal Uday, Aditya Vadduri, Abhishek Pai
The increased use of UAVs in urban environments has made safe and robust emergency landing zone detection essential. This paper presents a novel approach for detecting safe landing zones for UAVs using deep learning-based image segmentation. Our approach uses a custom dataset to train a CNN model. To handle low-resolution input images, it incorporates a super-resolution model that upscales low-resolution images before feeding them into the segmentation model. The proposed approach achieves robust and accurate detection of safe landing zones even on low-resolution images. Experimental results demonstrate the effectiveness of our method, showing an improvement of up to 6.3% in accuracy over state-of-the-art safe landing zone detection methods.
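A minimal sketch of the two-stage pipeline is given below: a low-resolution input is first upscaled and then segmented, and a safe-zone mask is taken from the class predictions. The super-resolution and segmentation networks are replaced by trivial placeholders (bicubic upsampling and a toy convolutional head), so this shows only the data flow, not the paper's trained models.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SafeLandingPipeline(nn.Module):
    """Hedged sketch of the two-stage idea: upscale a low-resolution aerial image,
    then segment it and threshold the 'safe' class."""
    def __init__(self, scale=4, num_classes=2):
        super().__init__()
        self.scale = scale
        self.seg = nn.Sequential(                 # stand-in segmentation model
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1))

    def forward(self, low_res):
        # Stage 1: super-resolve (placeholder: bicubic upscaling).
        sr = F.interpolate(low_res, scale_factor=self.scale,
                           mode="bicubic", align_corners=False)
        # Stage 2: per-pixel class logits, then a binary safe-zone mask.
        logits = self.seg(sr)
        safe_mask = logits.argmax(dim=1) == 1     # class 1 assumed to be 'safe'
        return sr, safe_mask

pipe = SafeLandingPipeline()
sr, mask = pipe(torch.rand(1, 3, 64, 64))
print(sr.shape, mask.shape)   # upscaled image and 256x256 safe-zone mask
```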
{"title":"Safe Landing Zone Detection for UAVs using Image Segmentation and Super Resolution","authors":"Anagh Benjwal, Prajwal Uday, Aditya Vadduri, Abhishek Pai","doi":"10.23919/MVA57639.2023.10215759","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215759","url":null,"abstract":"Increased usage of UAVs in urban environments has led to the necessity of safe and robust emergency landing zone detection techniques. This paper presents a novel approach for detecting safe landing zones for UAVs using deep learning-based image segmentation. Our approach involves using a custom dataset to train a CNN model. To account for low-resolution input images, our approach incorporates a Super-Resolution model to upscale low-resolution images before feeding them into the segmentation model. The proposed approach achieves robust and accurate detection of safe landing zones, even on low-resolution images. Experimental results demonstrate the effectiveness of our method and show a marked improvement of upto 6.3% in accuracy over state-of-the-art safe landing zone detection methods.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128689719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joint learning of images and videos with a single Vision Transformer
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215661
Shuki Shimizu, Toru Tamaki
In this study, we propose a method for joint learning of images and videos with a single model. In general, images and videos are trained with separate models. We propose a method that feeds a batch of images into a Vision Transformer (IV-ViT) together with a set of video frames that are temporally aggregated by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.
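The sketch below captures the late-fusion idea: a single shared backbone processes both individual images and the frames of a video clip, and the per-frame outputs of a clip are averaged. A toy convolutional backbone stands in for the Vision Transformer, and all names are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class JointImageVideoModel(nn.Module):
    """Hedged sketch: one shared backbone handles both single images and video
    frames; frame-level outputs of a clip are averaged (late fusion)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a Vision Transformer
            nn.Conv2d(3, 32, 8, stride=8), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward_image(self, images):              # images: (B, 3, H, W)
        return self.backbone(images)

    def forward_video(self, clips):               # clips: (B, T, 3, H, W)
        b, t = clips.shape[:2]
        frame_logits = self.backbone(clips.flatten(0, 1))     # (B*T, num_classes)
        return frame_logits.view(b, t, -1).mean(dim=1)        # late fusion over T

model = JointImageVideoModel()
print(model.forward_image(torch.rand(2, 3, 224, 224)).shape)     # (2, 10)
print(model.forward_video(torch.rand(2, 8, 3, 224, 224)).shape)  # (2, 10)
```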
{"title":"Joint learning of images and videos with a single Vision Transformer","authors":"Shuki Shimizu, Toru Tamaki","doi":"10.23919/MVA57639.2023.10215661","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215661","url":null,"abstract":"In this study, we propose a method for jointly learning of images and videos using a single model. In general, images and videos are often trained by separate models. We propose in this paper a method that takes a batch of images as input to Vision Transformer (IV-ViT), and also a set of video frames with temporal aggregation by late fusion. Experimental results on two image datasets and two action recognition datasets are presented.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127324476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Contrastive Knowledge Distillation for Anomaly Detection in Multi-Illumination/Focus Display Images
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10215808
Jihyun Lee, Hangi Park, Yongmin Seo, Taewon Min, Joodong Yun, Jaewon Kim, Tae-Kyun Kim
In this paper, we tackle automatic anomaly detection in multi-illumination and multi-focus display images. Minute defects on the display surface are hard to spot in RGB images and hard to detect with a model trained only on normal data. To address this, we propose a novel contrastive learning scheme for knowledge distillation-based anomaly detection. Our framework adopts Multiresolution Knowledge Distillation (MKD) as a baseline, which operates by measuring feature similarities between the teacher and student networks. Building on MKD, we propose a novel contrastive learning method, Multiresolution Contrastive Distillation (MCD), which does not require positive/negative pairs with an anchor but instead pulls or pushes the distance between teacher and student features. Furthermore, we propose a blending module that transforms and aggregates multi-channel information into the three-channel input of MCD. Our method significantly outperforms competitive state-of-the-art methods in both AUROC and accuracy on the collected Multi-illumination and Multi-focus display image dataset for Anomaly Detection (MMdAD).
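The sketch below gives one possible reading of a pull/push objective on teacher-student feature distances at multiple resolutions: distances are minimized for normal samples and pushed beyond a margin otherwise. The margin formulation and the use of pseudo-anomalous samples are assumptions for illustration only, not the paper's actual loss.

```python
import torch
import torch.nn.functional as F

def multires_pull_push_loss(teacher_feats, student_feats, is_normal, margin=0.5):
    """Hedged sketch of a pull/push distillation objective: for normal samples the
    student feature is pulled toward the teacher feature (cosine distance -> 0);
    for pseudo-anomalous samples it is pushed away by at least a margin."""
    loss = 0.0
    for t, s in zip(teacher_feats, student_feats):       # one pair per resolution
        d = 1.0 - F.cosine_similarity(t.flatten(1), s.flatten(1), dim=1)  # (B,)
        pull = d                                          # shrink the distance
        push = F.relu(margin - d)                         # enforce distance >= margin
        loss = loss + torch.where(is_normal, pull, push).mean()
    return loss / len(teacher_feats)

# Toy multi-resolution features for a batch of 4 samples (first 3 treated as normal).
teacher = [torch.randn(4, 64, 32, 32), torch.randn(4, 128, 16, 16)]
student = [torch.randn(4, 64, 32, 32), torch.randn(4, 128, 16, 16)]
print(multires_pull_push_loss(teacher, student,
                              torch.tensor([True, True, True, False])))
```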
{"title":"Contrastive Knowledge Distillation for Anomaly Detection in Multi-Illumination/Focus Display Images","authors":"Jihyun Lee, Hangi Park, Yongmin Seo, Taewon Min, Joodong Yun, Jaewon Kim, Tae-Kyun Kim","doi":"10.23919/MVA57639.2023.10215808","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10215808","url":null,"abstract":"In this paper, we tackle automatic anomaly detection in multi-illumination and multi-focus display images. The minute defects on the display surface are hard to spot out in RGB images and by a model trained with only normal data. To address this, we propose a novel contrastive learning scheme for knowledge distillation-based anomaly detection. In our framework, Multiresolution Knowledge Distillation (MKD) is adopted as a baseline, which operates by measuring feature similarities between the teacher and student networks. Based on MKD, we propose a novel contrastive learning method, namely Multiresolution Contrastive Distillation (MCD), which does not require positive/negative pairs with an anchor but operates by pulling/pushing the distance between the teacher and student features. Furthermore, we propose the blending module that transforms and aggregate multi-channel information to the three-channel input layer of MCD. Our proposed method significantly outperforms competitive state-of-the-art methods in both AUROC and accuracy metrics on the collected Multi-illumination and Multi-focus display image dataset for Anomaly Detection (MMdAD).","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129295108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated Identification of Surgical Instruments without Tagging: Implementation in Real Hospital Work Environment
Pub Date: 2023-07-23 | DOI: 10.23919/MVA57639.2023.10216222
Rui Ishiyama, Per Helge Litzheim Frøiland, Stein-Asle Øvrebotn
This paper presents a new practical system to track and trace individual surgical instruments without marking or tagging. Individual identification is fundamental to traceability, documentation, and optimization for patient safety, compliance, economy, and the environment. However, existing identification systems have yet to be adopted by most hospitals due to the costs and risks of tagging or marking. The "Fingerprint of Things" recognition technology enables tag-less identification; however, automating the scanning to save labor, which should instead be devoted to patient care, is also essential for practical use. We developed a new system concept that automates the detection, type recognition, fingerprint scanning, and identification of every instrument in the workspace. A prototype has been implemented and tested in real hospital work, and the feasibility of our solution as a commercial product is supported by an order for its adoption.
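As a rough illustration of the final identification step only, the sketch below matches a surface-pattern descriptor against a database of registered instruments by cosine similarity. The descriptor extraction itself (the "Fingerprint of Things" recognition) is proprietary and is not reproduced here; all names, vectors, and thresholds are placeholders.

```python
import numpy as np

def identify_instrument(descriptor, database, threshold=0.8):
    """Hedged sketch: nearest-neighbor matching of a surface-pattern descriptor
    against registered instruments by cosine similarity. Returns the best-matching
    instrument name and its similarity, or None if below the threshold."""
    names = list(database.keys())
    refs = np.stack([database[n] for n in names])                      # (N, D)
    sims = refs @ descriptor / (np.linalg.norm(refs, axis=1)
                                * np.linalg.norm(descriptor) + 1e-9)   # cosine sims
    best = int(np.argmax(sims))
    if sims[best] >= threshold:
        return names[best], float(sims[best])
    return None, float(sims[best])

# Placeholder database of registered instrument descriptors.
rng = np.random.default_rng(0)
db = {"forceps_#012": rng.normal(size=128), "scissors_#007": rng.normal(size=128)}
query = db["forceps_#012"] + 0.05 * rng.normal(size=128)   # noisy re-scan of the same item
print(identify_instrument(query, db))
```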
{"title":"Automated Identification of Surgical Instruments without Tagging: Implementation in Real Hospital Work Environment","authors":"Rui Ishiyama, Per Helge Litzheim Frøiland, Stein-Asle Øvrebotn","doi":"10.23919/MVA57639.2023.10216222","DOIUrl":"https://doi.org/10.23919/MVA57639.2023.10216222","url":null,"abstract":"This paper presents a new practical system to track and trace individual surgical instruments without marking or tagging. Individual identification is fundamental to traceability, documentation, and optimization for patient safety, compliance, economy, and the environment. However, existing identification systems have yet to be adopted by most hospitals due to the costs and risks of tagging or marking. The \"Fingerprint of Things\" recognition technology enables tag-less identification; however, scanning automation to save labor costs, which should be devoted to patient care, is also essential for practical use. We developed a new system concept that automates the detection, type recognition, fingerprint scanning, and identification of every instrument in the workspace. A prototype solution has also been implemented and tested in real hospital work. The feasibility of our solution as a commercial product is verified by its order for adoption.","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131303166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Most Influential Paper over the Decade Award
Pub Date: 2023-07-23 | DOI: 10.23919/mva57639.2023.10215707
{"title":"Most Influential Paper over the Decade Award","authors":"","doi":"10.23919/mva57639.2023.10215707","DOIUrl":"https://doi.org/10.23919/mva57639.2023.10215707","url":null,"abstract":"","PeriodicalId":338734,"journal":{"name":"2023 18th International Conference on Machine Vision and Applications (MVA)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125620476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}