Masked Face Recognition with 3D Facial Geometric Attributes
Yuan Wang, Zhen Yang, Zhiqiang Zhang, Huaijuan Zang, Qiang Zhu, Shu Zhan
DOI: 10.1145/3529446.3529449

During the coronavirus pandemic, the demand for contactless biometric technology has promoted the development of masked face recognition. Training a masked face recognition model must address two crucial issues: the lack of large-scale realistic masked face datasets, and the difficulty of obtaining robust face representations due to the large difference between complete faces and masked faces. To tackle the first issue, this paper proposes to train a 3D masked face recognition network with non-masked face images. For the second issue, this paper utilizes the geometric attributes of the 3D face, namely depth, azimuth, and elevation, to represent the face. The inherent advantages of 3D faces enhance the stability and practicability of the 3D masked face recognition network. In addition, a facial geometry extractor is proposed to highlight discriminative facial geometric features so that the network can take full advantage of the depth, azimuth, and elevation information when distinguishing face identities. Experimental results on four public 3D face datasets show that the proposed network improves the accuracy of masked face recognition, verifying the feasibility of training a masked face recognition model with non-masked face images.

Development and future of information hiding in image transformation domain: A literature review
Yue Yang
DOI: 10.1145/3529446.3529458

Information hiding is a technique for concealing meaningful information within public carrier data. As data elements become more and more important, information hiding offers better performance than traditional encryption techniques such as monoalphabetic substitution and the Vigenère cipher. Because images contain redundant information, they are a common carrier for information hiding among the many carrier types. As one of the primary branches of information hiding, image transform-domain information hiding has been widely studied and applied in academia and industry. Hiding information in the image transform domain effectively improves security and robustness. This paper introduces the concepts, principles, processes, mainstream algorithms, and applications of image transform-domain information hiding techniques, and outlines possible directions for its future development.

An Improved Dark Channel Prior Defogging Algorithm Based on Transmissivity Image Segmentation
Wenjing Yu, Jinyu He, Jing Yin, Enqi Chen
DOI: 10.1145/3529446.3529454

To address the color distortion in sky regions, erroneous atmospheric light extraction, and halo effects at scene edges that the dark channel prior algorithm suffers from when processing hazy images, this paper proposes a dark channel prior defogging method based on the transmittance image. The input hazy image is converted into a transmittance image; with guided filtering, an improved MSR algorithm segments the image into sky and non-sky regions. Minimum filtering and sky transmittance estimation are then applied to the non-sky and sky regions, respectively. The two processed parts are recombined, the transmission is refined by fast guided filtering, and the hazy image is dehazed using the atmospheric light value extracted from the sky region to obtain a clear restored image. Experimental results show that the improved minimum filtering and transmittance estimation effectively remove the halo effect at depth-of-field edges and the color distortion in the sky area, so the restored image retains more detail and looks clearer and more natural. Compared with the traditional dark channel prior algorithm, the proposed algorithm increases information entropy by 12.1% on average, PSNR by 6.024% on average, and SSIM by 15.8%, while decreasing MSE by 4.7% on average.

Image Processing based Scoring System for Small Arms Firing in the Military Domain
Md Rezoanul Hafiz Chandan, Tanvir Ahamad Naim, Md. Abdur Razzak, N. Sharmin
DOI: 10.1145/3529446.3529456

Firing practice and the overall conduct of scoring play a vital role in military organizations. Currently, the evaluation of such exercises is only partially automated and still requires manual work. Fully automatic calculation of target scores saves time and adds great value by minimizing human error. This paper focuses on designing and developing a process for automatically scoring a fired target based on image processing techniques in the military domain. Along with conventional image processing techniques, this work uses the structural similarity index (SSIM) to detect bullet holes in images. Experiments were conducted on real-world scenarios in a military field setting and show promising results for calculating scores automatically.

Research on the Application of Three-dimensional Digital Model in the Protection and Inheritance of Traditional Architecture: Take the Example of the Ma Tau Wall of Huizhou Architecture
Aozhi Wei, Yang Cao, Wangqiao Rong
DOI: 10.1145/3529446.3529452

As a representative architectural style of Huizhou architecture, the Ma Tau Wall has a deep historical and cultural heritage and plays an important role in the study of Huizhou culture and in the innovation and development of modern architecture. Taking the Huizhou Ma Tau Wall as an example and combining it with 3D animation, this paper explores how traditional architecture can be preserved and passed on in digital form with the support of 3D technology. Based on fieldwork and a literature review, the Ma Tau Wall is digitally restored in Maya and given interactive effects with the help of UE4, and is finally made into an interactive model, allowing viewers to experience traditional ancient architectural culture in a novel form that better supports the protection and inheritance of ancient architecture.

Decision Modeling and Simulation of Fighter Air-to-ground Combat Based on Reinforcement Learning
Yifei Wu, Yonglin Lei, Zhi Zhu, Yan Wang
DOI: 10.1145/3529446.3529463

With Artificial Intelligence (AI) widely used in air combat simulation systems, fighter decision-making systems have reached a high level of complexity. Pure theoretical analysis and rule-based systems are no longer sufficient to represent the cognitive behavior of pilots. To properly specify the autonomous decision-making of a fighter, we propose a unified framework that combines combat simulation and machine learning. The framework adopts deep reinforcement learning, using supervised learning together with the Deep Q-Network (DQN) method. As a proof of concept, we built an autonomous decision-making training scenario based on the Weapon Effectiveness Simulation System (WESS). Simulation results show that the intelligent decision-making model based on the proposed framework achieves better combat effects than a traditional decision-making model based on knowledge engineering.

Event Camera Survey and Extension Application to Semantic Segmentation
Siqi Jia
DOI: 10.1145/3529446.3529465

Event cameras are a radically novel kind of vision sensor. Unlike traditional standard cameras, which acquire full images at a fixed rate, event cameras capture brightness changes for each pixel asynchronously. As a result, the output of an event camera is a stream of events, each carrying the time, location, and sign of a pixel's brightness change. Event cameras have many advantages over traditional cameras: high temporal resolution (microseconds), low latency, low power consumption (10 mW), and high dynamic range (>120 dB). They are therefore increasingly used in fields where frame-based cameras hit their limits, such as AR/VR, video games, mobile robotics, and computer vision. In this paper, we first describe the basic principle and advantageous properties of the event camera. We then survey a wide range of its applications: tracking, high-speed and high-dynamic-range video reconstruction, dynamic obstacle detection and avoidance, and motion segmentation. Building on these fundamental applications, more intelligent and even fully vision-based applications arise, such as combining event data with fully convolutional networks. Finally, we use DeepLab to perform semantic segmentation of a scene and apply the result to the corresponding points of the scene's 3D reconstruction. We also propose a potential solution to the ambiguity problem of semantic segmentation.

AAT: An Efficient Adaptive Adversarial Training Algorithm
Menghua Cao, Dongxia Wang, Yulong Wang
DOI: 10.1145/3529446.3529464

Adversarial training is one of the most promising methods for improving a model's robustness, but its expensive training cost remains a major problem. Recent work has made great efforts to improve efficiency by reducing the cost of constructing adversarial samples in the inner loop. These works alleviate the problem to some extent, yet the overall procedure remains expensive and is not interpretable. In this work, we propose the AAT (Adaptive Adversarial Training) algorithm, which exploits the inherent relationship between a model's robustness and the effect of adversarial samples to accelerate training. Our method offers a more interpretable robustness improvement while achieving higher efficiency than state-of-the-art works on standard datasets. It reduces training time by more than 56% compared with traditional adversarial training on CIFAR-10.

Hierarchical Iris Image Super Resolution based on Wavelet Transform
Yufeng Xia, Peipei Li, Jia Wang, Zhili Zhang, Duanling Li, Zhaofeng He
DOI: 10.1145/3529446.3529453

Iris images captured under surveillance scenarios are often of low quality, which makes iris recognition challenging. Recently, deep learning-based methods have been adopted to enhance the quality of iris images and achieve remarkable performance. However, these methods ignore the characteristics of iris texture, which are important for iris recognition. To restore richer texture details, we propose a super-resolution network based on Wavelets with a Transformer and Residual Attention Network (WTRAN). Specifically, we treat the low-resolution image as the low-frequency wavelet coefficients after wavelet decomposition and predict the corresponding high-frequency wavelet coefficient sequences. To extract detailed local features, we adopt both channel and spatial attention and propose a Residual Dense Attention Block (RDAB). Furthermore, we propose a Convolutional Transformer Attention Module (CTAM) that integrates a transformer with a CNN to extract both the global topology and local texture details. In addition to constraining the quality of image generation, effective identity-preserving constraints are used to ensure the consistency of the super-resolved images in the high-level semantic space. Extensive experiments show that the proposed method achieves competitive iris image super-resolution results compared with the most advanced super-resolution methods.

Vision-based Assessment of Instructional Content on Golf Performance
Akshay Krishna, Patrick Smith, M. Nicolescu, S. Hayes
DOI: 10.1145/3529446.3529459

This paper focuses on the detection and tracking of a golf ball from video input, as part of a study evaluating the influence of instructional content delivered through Acceptance and Commitment Training (ACT) on sports performance. In this experiment, participants putted on a small green and recorded themselves performing the swing. Using our generated dataset, we propose an automated, purely vision-based solution that detects and tracks the golf ball and estimates metrics such as velocity, shot angle, and whether the ball made the hole. We use color detection, background subtraction, contour detection, homography transformations, and other techniques to detect and track the golf ball. Our approach was extensively tested on a large annotated dataset, with good results.