High-speed Video Measurement Method of Attitude Angle Applied to Floating Offshore Wind Turbine Model Wind-Wave Tank Test
Qing Zhong, Jiahao Liu, Peng Chen. DOI: 10.1145/3581807.3581859

In wind-wave tank model tests of floating offshore wind turbines (FOWTs), acquiring the model attitude angles is one of the key objectives. In the wind-wave tank environment, traditional contact sensors suffer from unavoidable shortcomings such as difficult installation, added model weight, limited sampling frequency, and incomplete data acquisition. This paper presents a vision-based technology that mainly consists of (1) constructing a high-speed video acquisition network; (2) identifying, tracking, and matching circular landmarks across the image sequence, then recovering the three-dimensional coordinates of the corresponding spatial points with a spatial point reconstruction algorithm; (3) computing the model attitude angles from the image sequence to obtain the pose of the model in each frame; and (4) deriving the theoretical accuracy of the attitude angles from the achieved landmark accuracy and proposing a way to improve that landmark accuracy. The method is verified by wind-wave tank measurements of a 1:50 Spar-type floating wind turbine model. The results show that point measurement accuracy reaches the sub-millimeter level and that the attitude angles deviate from their theoretical values by less than 2.34′, which meets the testing requirements. The proposed method can be extended to attitude angle measurement in various wind tunnel and water basin tests.
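The abstract does not give the attitude-angle computation itself, but a common way to recover a rigid model's pose from reconstructed 3D landmarks is SVD-based (Kabsch) alignment of the per-frame landmark set against a reference set, followed by extracting Euler angles from the rotation matrix. The sketch below assumes that approach; the Z-Y-X angle convention and the toy landmark layout are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def rigid_transform(ref, cur):
    """Best-fit rotation R and translation t with cur ~ R @ ref + t (Kabsch)."""
    ref_c = ref - ref.mean(axis=0)
    cur_c = cur - cur.mean(axis=0)
    H = ref_c.T @ cur_c
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cur.mean(axis=0) - R @ ref.mean(axis=0)
    return R, t

def euler_zyx(R):
    """Roll, pitch, yaw (Z-Y-X convention) in degrees from a rotation matrix."""
    pitch = np.arcsin(-R[2, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return np.degrees([roll, pitch, yaw])

# landmarks_ref: Nx3 landmark coordinates in the model's reference pose;
# landmarks_t: the same landmarks reconstructed at frame t (simulated here
# with a 5-degree yaw plus a small translation).
landmarks_ref = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1.5]])
a = np.radians(5.0)
Rz = np.array([[np.cos(a), -np.sin(a), 0],
               [np.sin(a),  np.cos(a), 0],
               [0, 0, 1]])
landmarks_t = landmarks_ref @ Rz.T + np.array([0.1, -0.2, 0.05])
R, t = rigid_transform(landmarks_ref, landmarks_t)
print(euler_zyx(R))  # ~ [0, 0, 5]
```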
{"title":"High-speed Video Measurement Method of Attitude Angle Applied to Floating Offshore Wind Turbine Model Wind-Wave Tank Test","authors":"Qing Zhong, Jiahao Liu, Peng Chen","doi":"10.1145/3581807.3581859","DOIUrl":"https://doi.org/10.1145/3581807.3581859","url":null,"abstract":"During the wind-wave tank model test of floating offshore wind turbine (FOWT), acquiring the model attitude angles is one of the key objectives. Due to the environment of the wind-wave tank, there are inevitable shortcomings of traditional contact sensors, such as difficult installation, increased model weight, limited sampling frequency, and incomplete data acquisition. In this paper, a vision-based technology is presented, which mainly consists of (1) construct a high-speed video acquisition network; (2) identify, track and match the circular landmarks on the sequence image, and obtain three-dimensional coordinates of the spatial points using the spatial point reconstruction algorithm; (3) propose a method for computing model attitude angles in sequence images to obtain the pose of the model in each frame; (4) deduce the theoretical accuracy of attitude angle from the obtained accuracy of landmarks, and propose a way to improve the accuracy of landmarks. The model attitude angle acquisition method is verified by experimental measurements of a wind-wave tank model with a 1:50 Spar-type floating wind turbine model, the results show that the point measurement accuracy can reach sub-millimeter level, and the difference between the attitude angle measurement value and the theoretical value is less than 2.34′, which meets the testing requirements. The method proposed can be further extended to the attitude angle measurement in various wind tunnel pool tests.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"537 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127058782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Aneurysm Localization Algorithm Based on Faster R-CNN Network for Cerebral Small Vessels
Yuan Meng, Xinfeng Zhang, Xiaomin Liu, Xiangshen Li, Tianyu Zhu, Xiaoxia Chang, Jinhang Chen, Xiangyu Chen. DOI: 10.1145/3581807.3581889

Using artificial intelligence algorithms to determine whether a lesion is a cerebral aneurysm, especially a small one, remains an unsolved problem. In this paper, a Faster R-CNN network was used as the localization network; the model was trained by tuning the network parameters, and suitable feature extraction and classification networks were selected to solve the localization problem for small aneurysms. Compared with most 3D methods, this method offers a shorter training cycle and faster image recognition. The experimental results show that the algorithm discriminates cerebral aneurysms with high accuracy, although false positives may occur when localizing on single images. Finally, the paper discusses the experimental results and puts forward several ideas for addressing this problem.
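The paper's code is not published; as a reference point, a Faster R-CNN localizer of the kind described can be set up with torchvision. The two-class configuration (background plus aneurysm) and the input sizes below are illustrative assumptions.

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pretrained Faster R-CNN with a ResNet-50 FPN feature extraction network.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# Swap the box classification head: 2 classes = background + aneurysm.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)

# One training step: images is a list of CHW float tensors, targets is a
# list of dicts with 'boxes' (Nx4, xyxy pixels) and 'labels' (N,).
model.train()
images = [torch.rand(3, 512, 512)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 180.0, 200.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)   # RPN + ROI-head losses
loss = sum(loss_dict.values())
loss.backward()
```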
{"title":"An Aneurysm Localization Algorithm Based on Faster R-CNN Network for Cerebral Small Vessels","authors":"Yuan Meng, Xinfeng Zhang, Xiaomin Liu, Xiangshen Li, Tianyu Zhu, Xiaoxia Chang, Jinhang Chen, Xiangyu Chen","doi":"10.1145/3581807.3581889","DOIUrl":"https://doi.org/10.1145/3581807.3581889","url":null,"abstract":"The use of artificial intelligence algorithm to determine whether the lesion has cerebral aneurysm, especially small aneurysms, is still not completely solved. In this paper, the Faster R-CNN network was used as the localization network, and the model was trained by adjusting the network parameters, and the appropriate feature extraction network and classification network were selected to finally solve the localization problem of small aneurysms. Compared with most 3D methods, this method had the characteristics of shorter training cycle and faster image recognition. The experimental results show that the algorithm has a high accuracy in discriminating whether the lesion has cerebral aneurysm, but the false positive phenomenon may occur in the identification of single image localization. Finally, the paper discusses the experimental results and puts forward some conjecture ideas to solve the problem.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133545298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Object Detection Algorithm Based on Coordinate Attention and Context Feature Enhancement
Lingzhi Liu, Baohua Qiang, Yuan-yuan Wang, Xianyi Yang, Jubo Tian, S. Zhang. DOI: 10.1145/3581807.3581821

In recent years, object detection has been widely applied in fields such as face detection, remote sensing image detection, and pedestrian detection. Because real-world scenes are complex, the feature information in an image must be exploited fully to improve detection accuracy. This paper proposes an object detection algorithm based on coordinate attention and contextual feature enhancement. We design a multi-scale attention feature pyramid network that first uses multi-branch atrous convolution to capture multi-scale context, then fuses a coordinate attention mechanism to embed location information into channel attention, and finally uses a bidirectional feature pyramid structure to fuse high-level and low-level features effectively. We also adopt the GIoU loss function to further improve detection accuracy. Experimental results show that the proposed method has clear advantages over other detection algorithms on the PASCAL VOC datasets.
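The GIoU loss the paper adopts is well defined in the literature (Rezatofighi et al., CVPR 2019): it extends IoU with a penalty based on the smallest enclosing box. A minimal PyTorch version for axis-aligned boxes, written independently of the paper's code, is sketched below.

```python
import torch

def giou_loss(pred, target, eps=1e-7):
    """GIoU loss for boxes in (x1, y1, x2, y2) format, shape (N, 4)."""
    # Intersection area.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter
    iou = inter / (union + eps)

    # Smallest axis-aligned box C enclosing both boxes.
    cx1 = torch.min(pred[:, 0], target[:, 0])
    cy1 = torch.min(pred[:, 1], target[:, 1])
    cx2 = torch.max(pred[:, 2], target[:, 2])
    cy2 = torch.max(pred[:, 3], target[:, 3])
    area_c = (cx2 - cx1) * (cy2 - cy1)

    giou = iou - (area_c - union) / (area_c + eps)
    return (1.0 - giou).mean()
```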
{"title":"Object Detection Algorithm Based on Coordinate Attention and Context Feature Enhancement","authors":"Lingzhi Liu, Baohua Qiang, Yuan-yuan Wang, Xianyi Yang, Jubo Tian, S. Zhang","doi":"10.1145/3581807.3581821","DOIUrl":"https://doi.org/10.1145/3581807.3581821","url":null,"abstract":"In recent years, object detection has been widely used in various fields such as face detection, remote sensing image detection and pedestrian detection. Due to the complex environment in the actual scene, we need to fully obtain the feature information in the image to improve the accuracy of object detection. This paper proposes an object detection algorithm based on coordinate attention and contextual feature enhancement. We design a multi-scale attention feature pyramid network, which first uses multi-branch atrous convolution to capture multi-scale context information, and then fuses the coordinate attention mechanism to embed location information into channel attention, and finally uses a bidirectional feature pyramid structure to effectively fuse high-level features and low-level features. We also adopt the GIoU loss function to further improve the accuracy of object detection. The experimental results show that the proposed method has certain advantages compared with other detection algorithms in the PASCAL VOC datasets.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124244885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DOA Estimation of Multiple Sources based on the Angle Distribution of Time-frequency Points in Single-source Zone
Liang Tao, Mao-shen Jia, Lu Li. DOI: 10.1145/3581807.3581861

Direction-of-arrival (DOA) estimation based on single-source zone (SSZ) detection exploits the sparsity of speech signals to transform multiple-source localization into single-source localization. However, a detected SSZ contains many time-frequency (TF) points whose direction information lies far from the true DOA, and these points can degrade localization performance. To address this issue, this paper proposes a DOA estimation method for multiple sources based on the angle distribution of TF points. First, SSZs are detected from the signals recorded by a sound field microphone. Second, an optimized single-source zone (OSSZ) is obtained by removing outliers according to the angle distribution of the TF points within each detected SSZ. Third, a DOA histogram is built from the TF points in the OSSZ, and its envelope is obtained by kernel density estimation. Finally, peak search yields the DOA estimates and the number of sources. Experimental results show that the proposed method achieves better localization performance than the SSZ-based method under medium and high reverberation conditions.
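The last stages of this pipeline (DOA histogram, kernel-density envelope, peak search) can be sketched with SciPy. The TF-point azimuths below are synthetic stand-ins for the OSSZ output, and the peak-height threshold is an assumption.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import find_peaks

# Azimuths (degrees) of TF points surviving the OSSZ outlier removal;
# two synthetic sources near 40 and 110 degrees plus residual spread.
rng = np.random.default_rng(0)
azimuths = np.concatenate([rng.normal(40, 5, 400), rng.normal(110, 5, 300)])

# Kernel density estimation gives a smooth envelope of the DOA histogram.
kde = gaussian_kde(azimuths)
grid = np.arange(0.0, 360.0, 1.0)
envelope = kde(grid)

# Peaks of the envelope are the DOA estimates; their count is the
# estimated number of sources.
peaks, _ = find_peaks(envelope, height=0.1 * envelope.max())
print("DOA estimates (deg):", grid[peaks])
```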
{"title":"DOA Estimation of Multiple Sources based on the Angle Distribution of Time-frequency Points in Single-source Zone","authors":"Liang Tao, Mao-shen Jia, Lu Li","doi":"10.1145/3581807.3581861","DOIUrl":"https://doi.org/10.1145/3581807.3581861","url":null,"abstract":"Direction-of-arrival (DOA) estimation method based on single-source zone (SSZ) detection, using the sparsity of speech signal, which transforms the multiple sources localization into single source localization. However, there are many time-frequency (TF) points whose direction information are far away from the true DOA in the detected SSZ, these points may disturb the localization performance. Aiming this issue, a DOA estimation of multiple sources based on the angle distribution of TF points is proposed in this paper. Firstly, the SSZs are detected through the recorded signal of sound field microphone. Secondly, the optimized single-source zone (OSSZ) can be acquired by removing the outliers based on the angle distribution of the TF points in the detected SSZ. Thirdly, DOA histogram can be obtained using the TF points in OSSZ, then the envelop of the DOA histogram is gained by kernel density estimation. Finally, peak search is adopted to obtain the DOA estimates and number of sources. The experiment results show that the proposed method can achieve better localization performance than SSZ-based method under medium and high reverberation conditions.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"207 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123018714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Design Method of SSVEP Stimulus Source based on Overlooking Map
Dong Wen, Mengmeng Jiang, Wenlong Jiao, Xianglong Wan, Xifa Lan, Yanhong Zhou. DOI: 10.1145/3581807.3581874

In the field of brain-computer interfaces, the steady-state visual evoked potential (SSVEP) is widely used because of its stability. Although high-intensity stimuli yield good accuracy, they can cause severe visual fatigue and even induce epilepsy in subjects. Researchers should therefore attend not only to accuracy but also to subject comfort. In this paper, drawing on spatial psychology, an overlooking map is proposed as the stimulus source for inducing the SSVEP signal. Twenty-four subjects participated in a comparison experiment between the overlooking map and black-and-white stimuli, with the EEG signals processed and analyzed online in real time using the CCA algorithm. Afterwards, the two stimuli were scored for personal preference, comfort, and flickering sensation. The experimental results show that the overlooking map stimulus outperforms the black-and-white stimulus and is better suited to inducing the subjects' SSVEP signals. This work provides a theoretical and experimental basis for improving user experience, an important aspect of SSVEP-based applications and a necessary factor for their commercial promotion.
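The online analysis uses the standard CCA method for SSVEP decoding: each EEG epoch is correlated against sine/cosine references at every candidate stimulus frequency, and the best-correlated frequency is selected. A minimal sketch with scikit-learn follows; the sampling rate, channel count, candidate frequencies, and harmonic count are assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_corr(eeg, ref):
    """Largest canonical correlation between EEG (T x C) and references (T x 2H)."""
    cca = CCA(n_components=1)
    u, v = cca.fit_transform(eeg, ref)
    return np.corrcoef(u[:, 0], v[:, 0])[0, 1]

def detect_frequency(eeg, fs, freqs, harmonics=3):
    """Pick the stimulus frequency whose sin/cos references best match the EEG."""
    t = np.arange(eeg.shape[0]) / fs
    scores = []
    for f in freqs:
        ref = np.column_stack(
            [fn(2 * np.pi * f * h * t) for h in range(1, harmonics + 1)
             for fn in (np.sin, np.cos)])
        scores.append(cca_corr(eeg, ref))
    return freqs[int(np.argmax(scores))]

# 2-second epoch, 8 channels at 250 Hz; candidate flicker frequencies in Hz.
fs, freqs = 250, [8.0, 10.0, 12.0, 15.0]
eeg = np.random.randn(2 * fs, 8)   # placeholder for a real recorded epoch
print(detect_frequency(eeg, fs, freqs))
```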
{"title":"The Design Method of SSVEP Stimulus Source based on Overlooking Map","authors":"Dong Wen, Mengmeng Jiang, Wenlong Jiao, Xianglong Wan, Xifa Lan, Yanhong Zhou","doi":"10.1145/3581807.3581874","DOIUrl":"https://doi.org/10.1145/3581807.3581874","url":null,"abstract":"In the field of brain-computer interface, steady-state visual evoked potential (SSVEP) is widely used because of its stability. Although high-intensity stimulus has good accuracy, it can cause severe visual fatigue and even induce epilepsy in subjects. As well as paying attention to its accuracy, personnel should also pay attention to the comfort of the subject. In this paper, combined with the knowledge of spatial psychology, the overlooking map is proposed as the stimulus source to induce the SSVEP signal. 24 subjects participated in the comparison experiment between the overlooking map and the black and white stimuli, and the EEG signal was processed and analyzed online in real time through the CCA algorithm. Afterwards, the two stimuli were scored in terms of personal preference, comfort, and flickering sensation. The experimental results show that the performance of the overlooking map stimulus source is superior to that of the black and white stimulus, and it is more suitable to induce the SSVEP signal of the subjects. As an important aspect of SSVEP-based application and a necessary factor for commercial promotion, user experience provides a good theoretical and experimental research basis for it.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130550058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Deep Convolutional Neural Network inspired by the Primary Visual Cortex
Zhanguo Dong, Ming Ke, Jiarong Wang, Lubin Wang, Gang Wang. DOI: 10.1145/3581807.3581893
Most current advanced object recognition deep convolutional neural networks (DCNNs) are vulnerable to adversarial perturbations. In contrast, the primate visual system effectively suppresses the interference of such perturbations. Many studies have shown that fusing biological vision mechanisms into DCNNs is a promising way to improve model robustness. The primary visual cortex (V1), a key brain region for visual information processing, contains simple cells with a variety of orientation-selective receptive fields that respond specifically to low-level features. We therefore developed an object classification DCNN inspired by V1 orientation-selective receptive fields: the V1-inspired model introduces these receptive fields into the DCNN through anisotropic Gaussian kernels, enriching the network's receptive fields. In white-box adversarial attack experiments on CIFAR-100 and Mini-ImageNet, the adversarial robustness of our model is 21.74% and 20.01% higher than that of the baseline DCNN, respectively, and 2.88% and 8.56% higher than that of the state-of-the-art VOneNet. It is worth pointing out that our method does not increase the parameter count of the baseline model and adds very little training cost.
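One plausible reading of the V1-inspired front end is a fixed bank of oriented anisotropic Gaussian kernels prepended to the network. The numpy sketch below builds such an orientation-selective bank; the kernel size, scales, and number of orientations are assumptions, not the paper's exact design.

```python
import numpy as np

def anisotropic_gaussian_kernel(size, sigma_u, sigma_v, theta):
    """2-D anisotropic Gaussian with its principal axis rotated by theta (rad)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates into the kernel's principal-axis frame.
    u = x * np.cos(theta) + y * np.sin(theta)
    v = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-0.5 * ((u / sigma_u) ** 2 + (v / sigma_v) ** 2))
    return g / g.sum()

# A bank of 8 preferred orientations; sigma_u >> sigma_v elongates each
# kernel along its axis, giving it orientation selectivity like a V1 cell.
orientations = np.linspace(0, np.pi, 8, endpoint=False)
bank = np.stack([anisotropic_gaussian_kernel(11, 4.0, 1.0, th)
                 for th in orientations])
print(bank.shape)  # (8, 11, 11) -- usable as fixed depthwise conv filters
```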
{"title":"Robust Deep Convolutional Neural Network inspired by the Primary Visual Cortex","authors":"Zhanguo Dong, Ming Ke, Jiarong Wang, Lubin Wang, Gang Wang","doi":"10.1145/3581807.3581893","DOIUrl":"https://doi.org/10.1145/3581807.3581893","url":null,"abstract":"Most of the current advanced object recognition deep convolutional neural networks (DCNNs) are vulnerable to attacks of adversarial perturbations. In comparison, the primate vision system can effectively suppress the inference of adversarial perturbations. Many studies have shown that the fusion of biological vision mechanisms and DCNNs is a promising way to improve model robustness. The primary visual cortex (V1) is a key brain region for visual information processing in the biological brain, containing various simple cell orientation selection receptive fields, which can specifically respond to low-level features. Therefore, we have developed an object classification DCNN model inspired by V1 orientation selection receptive fields. The V1-inspired model introduces V1 orientation selection receptive fields into DCNN through anisotropic Gaussian kernels, which can enrich the receptive fields of DCNN. In the white-box adversarial attack experiments on CIFAR-100 and Mini-ImageNet, the adversarial robustness of our model is 21.74% and 20.01% higher than that of the baseline DCNN, respectively. Compared with the SOAT VOneNet, the adversarial robustness of our model improves by 2.88% and 8.56%, respectively. It is worth pointing out that our method will not increase the parameter quantity of the baseline model, while the extra training cost is very little.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127371863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Heart Rate Detection Using Motion Compensation with Multiple ROIs
Jie Huang, Xuanheng Rao, Weichuan Zhang, Jingze Song, Xiao Sun. DOI: 10.1145/3581807.3581870
Remote photoplethysmography (rPPG) can measure heart rate (HR) without contact from camera-captured image sequences containing human faces. The method generates a time series from the spatially averaged RGB values of a selected region of interest (ROI) and estimates physiological signals such as HR from it. It is worth noting that motion artifacts produced by the subject's face shaking add considerable noise to this signal and greatly degrade measurement accuracy. This paper proposes a novel anti-interference multi-ROI analysis (AMA) approach that effectively combines local information from multiple ROIs, the Euler angles of the subject's head, and video interpolation resampling to suppress the influence of face shaking on non-contact heart rate measurement. The proposed method is evaluated on the UBFC-RPPG and PURE datasets, and the experimental results demonstrate that it outperforms many state-of-the-art methods.
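Downstream of the motion compensation, the basic rPPG step (per-ROI averaging, band-pass filtering, spectral peak → HR) can be sketched as below. Combining ROIs is reduced here to a plain mean over ROI traces, a deliberate simplification of the paper's AMA weighting, and the band limits are conventional assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def heart_rate_from_rois(roi_traces, fps):
    """Estimate HR (bpm) from per-ROI intensity traces of shape (R, T)."""
    # Combine ROIs; the paper's AMA weighting is simplified to a plain mean.
    signal = roi_traces.mean(axis=0)
    signal = signal - signal.mean()

    # Band-pass 0.7-4 Hz, i.e. 42-240 bpm, the plausible HR range.
    b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fps)
    filtered = filtfilt(b, a, signal)

    # HR corresponds to the strongest spectral peak within the pass band.
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(filtered.size, d=1.0 / fps)
    band = (freqs >= 0.7) & (freqs <= 4.0)
    return 60.0 * freqs[band][np.argmax(spectrum[band])]

# 30 s of video at 30 fps, 4 ROIs; synthetic 72-bpm (1.2 Hz) pulse for demo.
fps = 30
t = np.arange(0, 30, 1 / fps)
traces = np.tile(np.sin(2 * np.pi * 1.2 * t), (4, 1)) \
         + 0.1 * np.random.randn(4, t.size)
print(heart_rate_from_rois(traces, fps))  # ~72 bpm
```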
{"title":"Heart Rate Detection Using Motion Compensation with Multiple ROIs","authors":"Jie Huang, Xuanheng Rao, Weichuan Zhang, Jingze Song, Xiao Sun","doi":"10.1145/3581807.3581870","DOIUrl":"https://doi.org/10.1145/3581807.3581870","url":null,"abstract":"Remote photoplethysmography (rPPG) has the ability to make use of image frame sequences including human faces collected by cameras for measuring heart rate (HR) without any contact. This method generates a time series signal based on the RGB spatial average of the selected region of interest (ROI) to estimate physiological signals such as HR. It is worth to note that the motion artifact produced by the subject’s face shaking is equivalent to adding considerable noise to the signal which will greatly affect the accuracy of the measurement. In this paper, a novel anti-interference multi-ROI analysis (AMA) approach is proposed which effectively utilizes the local information with multiple ROIs, the Euler angle information of the subject’s head, and the interpolation resampling technique of the video for suppressing the influence of face shaking on the accuracy of non-contact heart rate measurement. The proposed method is evaluated on the UBFC-RPPG and PURE datasets, and the experimental results demonstrate that the proposed methods are superior to many state-of-the-art methods.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126886212","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Portrait Interpretation and a Benchmark
Yixuan Fan, Zhaopeng Dou, Yali Li, Shengjin Wang. DOI: 10.1145/3581807.3581838

We propose a task we name Portrait Interpretation and construct a dataset named Portrait250K for it. Current research on portraits, such as human attribute recognition and person re-identification, has achieved many successes, but it generally: 1) neglects the interrelationships between the various tasks and the benefits these could bring; 2) designs a dedicated deep model for each task, which is inefficient; 3) may be unable to meet the needs of a unified model and comprehensive perception in real scenes. The proposed portrait interpretation task instead views the perception of humans from a new, systematic perspective. We divide the perception of portraits into three aspects, namely Appearance, Posture, and Emotion, and design corresponding sub-tasks for each. Within a multi-task learning framework, portrait interpretation requires a comprehensive description of both the static attributes and the dynamic states of portraits. To invigorate research on this new task, we construct a dataset of 250,000 images labeled with identity, gender, age, physique, height, expression, and posture of the whole body and arms. Collected from 51 movies, the dataset covers extensive diversity. Furthermore, we focus on representation learning for portrait interpretation, propose a baseline that reflects our systematic perspective, and propose an appropriate metric for the task. Our experimental results demonstrate that combining the tasks related to portrait interpretation yields benefits. Code and dataset will be made public.
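The described baseline is a shared representation with sub-task heads for the three aspects. A schematic PyTorch sketch of that multi-task structure is given below; the backbone choice, head sizes, and equal loss weighting are assumptions, not the paper's published design.

```python
import torch
import torch.nn as nn
import torchvision

class PortraitMultiTask(nn.Module):
    """Shared backbone with one head per aspect: appearance, posture, emotion."""
    def __init__(self, n_appearance=10, n_posture=8, n_emotion=7):
        super().__init__()
        backbone = torchvision.models.resnet50(pretrained=True)
        backbone.fc = nn.Identity()          # keep the 2048-d pooled feature
        self.backbone = backbone
        self.appearance = nn.Linear(2048, n_appearance)
        self.posture = nn.Linear(2048, n_posture)
        self.emotion = nn.Linear(2048, n_emotion)

    def forward(self, x):
        f = self.backbone(x)
        return self.appearance(f), self.posture(f), self.emotion(f)

model = PortraitMultiTask()
app, pos, emo = model(torch.rand(2, 3, 224, 224))
# One cross-entropy per head, summed with (assumed) equal weights.
labels = [torch.tensor([1, 0]), torch.tensor([2, 3]), torch.tensor([4, 4])]
loss = sum(nn.functional.cross_entropy(o, y)
           for o, y in zip((app, pos, emo), labels))
loss.backward()
```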
{"title":"Portrait Interpretation and a Benchmark","authors":"Yixuan Fan, Zhaopeng Dou, Yali Li, Shengjin Wang","doi":"10.1145/3581807.3581838","DOIUrl":"https://doi.org/10.1145/3581807.3581838","url":null,"abstract":"We propose a task we name Portrait Interpretation and construct a dataset named Portrait250K for it. Current researches on portraits such as human attribute recognition and person re-identification have achieved many successes, but generally, they: 1) may lack mining the interrelationship between various tasks and the possible benefits it may bring; 2) design deep models specifically for each task, which is inefficient; 3) may be unable to cope with the needs of a unified model and comprehensive perception in actual scenes. In this paper, the proposed portrait interpretation recognizes the perception of humans from a new systematic perspective. We divide the perception of portraits into three aspects, namely Appearance, Posture, and Emotion, and design corresponding sub-tasks for each aspect. Based on the framework of multi-task learning, portrait interpretation requires a comprehensive description of static attributes and dynamic states of portraits. To invigorate research on this new task, we construct a new dataset that contains 250,000 images labeled with identity, gender, age, physique, height, expression, and posture of the whole body and arms. Our dataset is collected from 51 movies, hence covering extensive diversity. Furthermore, we focus on representation learning for portrait interpretation and propose a baseline that reflects our systematic perspective. We also propose an appropriate metric for this task. Our experimental results demonstrate that combining the tasks related to portrait interpretation can yield benefits. Code and dataset will be made public.","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-07-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127135701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","authors":"","doi":"10.1145/3581807","DOIUrl":"https://doi.org/10.1145/3581807","url":null,"abstract":"","PeriodicalId":292813,"journal":{"name":"Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125166395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}