Ming Zeng, Jiansheng Fang, Hanpei Miao, Tianyang Zhang, Jiang Liu
Diabetic retinopathy (DR), a complication of diabetes, is a common cause of progressive damage to the retina. Mass screening of populations for DR is time-consuming, so computerized diagnosis is of great significance in clinical practice, providing evidence to assist clinicians in decision making. Specifically, hemorrhages, microaneurysms, hard exudates, soft exudates, and other lesions have been verified to be closely associated with DR. These lesions, however, are scattered across fundus images at different positions and sizes, and their internal relations are hard to preserve in the final features because a large number of convolution layers reduces fine detail. In this paper, we present a deep-learning network with a multi-scale self-attention module that aggregates global context into the learned features for DR image retrieval. The multi-scale fusion enhances, across scales, the latent relations between different positions in the features explored by self-attention. In the experiments, the proposed network is validated on the Kaggle DR dataset, and the results show that it achieves state-of-the-art performance.
{"title":"A Multi-Scale Self-Attention Network for Diabetic Retinopathy Retrieval","authors":"Ming Zeng, Jiansheng Fang, Hanpei Miao, Tianyang Zhang, Jiang Liu","doi":"10.1145/3484274.3484290","DOIUrl":"https://doi.org/10.1145/3484274.3484290","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
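The multi-scale self-attention idea in the DR retrieval abstract above can be illustrated with a minimal NumPy sketch. It is not the authors' module: the pooling scales, the averaging fusion, the residual connection, and the function names `self_attention`, `avg_pool`, and `multi_scale_self_attention` are all assumptions chosen for illustration, and learned query/key/value projections are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # stabilise the exponentials
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(feat):
    """Plain dot-product self-attention over an (H, W, C) feature map."""
    h, w, c = feat.shape
    tokens = feat.reshape(h * w, c)                 # one token per spatial position
    attn = softmax(tokens @ tokens.T / np.sqrt(c))  # (HW, HW) affinities between positions
    return (attn @ tokens).reshape(h, w, c)

def avg_pool(feat, k):
    """Non-overlapping k x k average pooling; H and W must be divisible by k."""
    h, w, c = feat.shape
    return feat.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def multi_scale_self_attention(feat, scales=(1, 2, 4)):
    """Attend at several resolutions, upsample each result, and fuse by averaging."""
    fused = np.zeros_like(feat)
    for k in scales:
        ctx = self_attention(avg_pool(feat, k))
        fused += ctx.repeat(k, axis=0).repeat(k, axis=1)  # nearest-neighbour upsample
    return feat + fused / len(scales)                     # residual connection (assumed)

feat = np.random.default_rng(0).normal(size=(8, 8, 16))  # hypothetical backbone output
out = multi_scale_self_attention(feat)
```

Coarser scales let each position attend to pooled regions rather than single positions, which is one simple way to relate lesions of different sizes.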
To address the low detection accuracy of the DETR model on small and medium objects, an object detection algorithm that combines improved feature extraction and an FPN structure with DETR is proposed. The method first extracts features from the original image with an improved Darknet53 network. In this process, the 104×104 feature map after the first residual block in the second stage is additionally output as a fourth-scale feature map. This feature map is combined with the feature maps output by the original 3 stages to form 4 feature map outputs at different scales. Second, FPN down-samples and up-samples the feature maps at the 4 scales and merges them into a 52×52 output. Then the merged feature map is combined with the positional encoding and fed into the Transformer, and the category and position of each predicted object are output through FFNs. On the COCO2017 dataset, accuracy improves compared with other models.
{"title":"An Object Detection Algorithm Combining FPN Structure With DETR","authors":"Nan Xiang, Chuanzhong Pan, Xiaozhao Li","doi":"10.1145/3484274.3484284","DOIUrl":"https://doi.org/10.1145/3484274.3484284","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
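The step in the FPN and DETR abstract above that merges four scales into a single 52×52 map can be sketched as follows. Only the 104×104 and 52×52 sizes come from the abstract; the 26×26 and 13×13 sizes, the shared channel width, nearest-neighbour resizing, and averaging as the merge operation are assumptions made for illustration, not the paper's design.

```python
import numpy as np

def resize_nearest(fmap, size):
    """Nearest-neighbour resize of an (H, W, C) map to (size, size, C)."""
    h, w = fmap.shape[:2]
    rows = np.arange(size) * h // size  # source row for each target row
    cols = np.arange(size) * w // size  # source column for each target column
    return fmap[rows][:, cols]

def merge_to_common_scale(fmaps, size=52):
    """Bring every map to size x size and average them."""
    return sum(resize_nearest(f, size) for f in fmaps) / len(fmaps)

channels = 8  # hypothetical; assumes all maps were already projected to one channel width
fmaps = [np.random.default_rng(s).normal(size=(s, s, channels))
         for s in (104, 52, 26, 13)]  # 26 and 13 are assumed sizes
merged = merge_to_common_scale(fmaps)
```

The 104×104 map is down-sampled and the 26×26 and 13×13 maps are up-sampled, so the merged map carries both fine detail and coarse context before it is passed on.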
Worldwide deployment of photovoltaic (PV) systems requires accurate forecasting that accounts for the uncertainty and imprecision of solar radiation. The multilayer perceptron (MLP), commonly used in day-ahead photovoltaic forecasting, converges quickly but is prone to overfitting. This paper proposes an ensemble of MLPs to counteract overfitting and reduce the variance of a single MLP model. The input of the ensemble model for day-ahead photovoltaic forecasting comprises feature vectors and the 24-hour power generation of the nearest day. The connection coefficients between MLPs are defined by discounting the feature distance, which measures the dissimilarity between input feature vectors. The forecasting results for a PV system in Macau verify the effectiveness of the proposed ensemble of MLPs for day-ahead photovoltaic forecasting.
{"title":"Ensemble Multilayer Perceptron Model for Day-ahead Photovoltaic Forecasting","authors":"Minli Wang, Peihong Wang","doi":"10.1145/3484274.3484304","DOIUrl":"https://doi.org/10.1145/3484274.3484304","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
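The abstract above does not say how the feature distance is discounted. One plausible reading, shown here purely as an assumption, is a geometric discount `gamma ** distance` that is normalised into weights for combining the members' 24-hour forecasts. The function name `ensemble_forecast`, the Euclidean distance, and the value of `gamma` are all hypothetical.

```python
import numpy as np

def ensemble_forecast(query, member_features, member_forecasts, gamma=0.5):
    """Weight each member's 24-hour forecast by a discounted feature distance.

    query:            (F,) feature vector of the day to forecast
    member_features:  (M, F) feature vector associated with each ensemble member
    member_forecasts: (M, 24) hourly forecast produced by each member
    """
    dist = np.linalg.norm(member_features - query, axis=1)  # dissimilarity per member
    weights = gamma ** dist                                 # closer members count more
    weights /= weights.sum()
    return weights @ member_forecasts, weights

query = np.zeros(3)
member_features = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0]])
member_forecasts = np.stack([np.full(24, v) for v in (1.0, 2.0, 3.0)])
forecast, weights = ensemble_forecast(query, member_features, member_forecasts)
```

Averaging several members in this way is what reduces the variance of any single MLP; the discount only decides how much each member contributes for a given input.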
The task of image semantic segmentation is to annotate and segment the semantic information of different types of objects in an image, and to predict the category and location of each object. The difficulty lies in obtaining enough semantic information while retaining enough spatial information. To solve this problem, this paper proposes an improved BiSeNetV2 network. The main idea is to add a DenseASPP module to the detail branch to obtain a larger receptive field, and to add an efficient channel attention (ECA) module to the detail and semantic branches to optimize the feature maps extracted at each stage, so as to further improve the network's feature acquisition. Experimental results show that the proposed algorithm improves the mIoU by 1.62% on the Cityscapes dataset and outperforms the BiSeNetV2 network.
{"title":"An Improved Image Segmentation Method of BiSeNetV2 Network","authors":"Peng Liu, Huan Zhang, Gaochao Yang, Qing Wang","doi":"10.1145/3484274.3484277","DOIUrl":"https://doi.org/10.1145/3484274.3484277","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
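The mIoU figure quoted in the segmentation abstract above is the mean, over classes, of intersection over union between predicted and ground-truth label maps. A minimal sketch of the standard computation follows; the function name `mean_iou` and the tiny label maps are illustrative only, and classes absent from both maps are skipped here, which is one common convention.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from integer label maps of equal shape."""
    # Confusion matrix: rows index the true class, columns the predicted class.
    conf = np.bincount(
        num_classes * target.ravel() + pred.ravel(),
        minlength=num_classes ** 2,
    ).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    valid = union > 0  # ignore classes that appear in neither map
    return (inter[valid] / union[valid]).mean()

pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.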
Traditional visual SLAM relies only on point features in the scene for positioning and mapping. When texture information in the scene is missing, the accuracy of pose estimation and mapping suffers. Man-made structured environments contain many structural lines that can be exploited. Compared with point features, line features carry richer information; for example, structural lines can be used to construct surface features. To improve the robustness and stability of visual SLAM positioning in low-texture environments, we propose a new point-line feature visual-inertial navigation system based on the traditional SLAM method, which makes full use of the structural line features in the scene. Unlike traditional SLAM systems that use point-line features, we adopt a new point-line reprojection error model, based on the cross product between the projected line feature and the detected line feature, together with a nonlinear optimization strategy for long lines, aiming to increase robustness in low-texture environments. The proposed algorithm has been verified on the EuRoC dataset and in real-world scenarios, and the results show that our algorithm improves accuracy.
{"title":"An Improved Monocular PL-SlAM Method with Point-Line Feature Fusion under Low-Texture Environment","authors":"Gaochao Yang, Qing Wang, Peng Liu, Huan Zhang","doi":"10.1145/3484274.3484293","DOIUrl":"https://doi.org/10.1145/3484274.3484293","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
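The exact cross-product error model of the PL-SLAM abstract above is not given there. As a point of reference, a widely used line reprojection error also relies on a cross product: the projected line is the cross product of its two endpoints in homogeneous coordinates, and the error is the signed distance of each detected endpoint to that line. The sketch below shows this common formulation only, with hypothetical function names, and may differ from the paper's model.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous 2D line through pixel points p and q, scaled so l @ x is a distance."""
    l = np.cross(np.append(p, 1.0), np.append(q, 1.0))
    return l / np.linalg.norm(l[:2])

def line_reprojection_error(proj_p, proj_q, det_p, det_q):
    """Signed distances of the detected endpoints to the projected line."""
    l = line_through(proj_p, proj_q)
    return np.array([l @ np.append(det_p, 1.0), l @ np.append(det_q, 1.0)])

# Projected line along y = 0; detected segment lies one pixel away from it.
err = line_reprojection_error(
    np.array([0.0, 0.0]), np.array([10.0, 0.0]),
    np.array([2.0, 1.0]), np.array([8.0, 1.0]),
)
```

Because the error depends only on the distance to the infinite line, it tolerates detected segments whose endpoints slide along the line, which is typical of line detectors.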
Autonomous vehicles have reached the road; however, accurate real-time road perception is one of the crucial factors in their success. The greatest challenges in this direction include occlusion, truncation, lighting conditions, and complex backgrounds. To improve the accuracy and speed of vehicle detection, a dynamic scaling network is proposed that assists in constructing a balanced-shape neural network to achieve optimum accuracy with minimal hardware. The network architecture is influenced by YOLOv5 and uses a Cross-Stage Partial Network (CSPNet) as its backbone. Going further, we propose an auto-anchor generation method that makes the network suitable for any dataset. Our neural network is fine-tuned through the choice of activation, loss, and optimization functions to obtain the best results. Our experimental results demonstrate that the proposed network provides performance comparable to YOLOv4 and Faster R-CNN, using the KITTI dataset as the benchmark.
{"title":"FlexiNet: Fast and Accurate Vehicle Detection for Autonomous Vehicles","authors":"Sabeeha Mehtab, Farah Sarwar, Weiqi Yan","doi":"10.1145/3484274.3484282","DOIUrl":"https://doi.org/10.1145/3484274.3484282","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
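The FlexiNet abstract above does not describe its auto-anchor method. A common way to derive anchors from a dataset, sketched here as an assumption rather than as the authors' procedure, is k-means over ground-truth box sizes with IoU as the similarity. The function names, the deterministic initialisation, and the sample boxes are hypothetical.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU between (N, 2) box sizes and (K, 2) anchor sizes sharing one corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=50):
    """Cluster (width, height) pairs into k anchors, returned sorted by area."""
    order = np.argsort(boxes[:, 0] * boxes[:, 1])
    # Deterministic start: boxes spread evenly across the area ranking.
    anchors = boxes[order[np.linspace(0, len(boxes) - 1, k).astype(int)]].astype(float)
    for _ in range(iters):
        assign = wh_iou(boxes, anchors).argmax(axis=1)  # best-matching anchor per box
        for j in range(k):
            members = boxes[assign == j]
            if len(members):                            # keep old anchor if cluster is empty
                anchors[j] = members.mean(axis=0)
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]

boxes = np.array([[10, 10], [12, 9], [9, 11], [100, 50], [98, 52], [102, 48]], dtype=float)
anchors = kmeans_anchors(boxes, k=2)
```

Because the anchors are computed from whatever box sizes the dataset contains, the same routine adapts to a new dataset without hand tuning.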
This paper presents a new structured-light approach dedicated to 3D mapping of underwater galleries in karst aquifers. This kind of method is based on projecting a light pattern onto the scene, which is captured by a camera. Our originality lies in the projected pattern: unlike methods in the literature, our light projector is a simple conical diving light. With a light cone, the pattern recovered in the image is a closed 2D curve, which we extract with a light contour detection method we have developed. A specific calibration method has also been created to estimate the cone geometry with respect to the camera. We therefore obtain a calibrated projector-camera pair that we can use to find matches between projected and recovered patterns. Our last contribution is the recovery of 3D data by triangulation using the camera-cone relationship and the extracted closed 2D curve. The experimental results presented in this article show the feasibility of the method.
{"title":"An original 3D reconstruction method using a conical light and a camera in underwater caves","authors":"Quentin Massone, S. Druon, J. Triboulet","doi":"10.1145/3484274.3484294","DOIUrl":"https://doi.org/10.1145/3484274.3484294","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
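The triangulation step in the conical-light abstract above can be pictured as intersecting the camera ray through each contour pixel with the calibrated light cone. The sketch below solves that ray-cone intersection in closed form. The parameterisation of the cone by apex, axis, and half-angle, the function name `ray_cone_points`, and the sample geometry are assumptions for illustration; the paper's calibration model may differ.

```python
import numpy as np

def ray_cone_points(origin, direction, apex, axis, half_angle):
    """Points where the ray origin + t * direction (t > 0) meets the forward cone."""
    d = direction / np.linalg.norm(direction)
    v = axis / np.linalg.norm(axis)
    co = origin - apex
    cos2 = np.cos(half_angle) ** 2
    # Substitute the ray into ((x - apex) . v)^2 = cos^2(theta) * |x - apex|^2.
    a = (d @ v) ** 2 - cos2
    b = 2.0 * ((d @ v) * (co @ v) - cos2 * (d @ co))
    c = (co @ v) ** 2 - cos2 * (co @ co)
    if abs(a) < 1e-12:                      # ray parallel to the cone surface
        roots = [-c / b] if abs(b) > 1e-12 else []
    else:
        disc = b * b - 4.0 * a * c
        if disc < 0:
            return []                       # ray misses the cone
        roots = [(-b - np.sqrt(disc)) / (2 * a), (-b + np.sqrt(disc)) / (2 * a)]
    points = [origin + t * d for t in roots if t > 0]
    return [p for p in points if (p - apex) @ v > 0]  # keep the lit half of the cone

apex = np.array([0.2, 0.0, 0.0])            # hypothetical light position relative to the camera
axis = np.array([0.0, 0.0, 1.0])
half_angle = np.radians(20.0)
phi = 1.0
target = apex + 2.0 * np.array([np.sin(half_angle) * np.cos(phi),
                                np.sin(half_angle) * np.sin(phi),
                                np.cos(half_angle)])
points = ray_cone_points(np.zeros(3), target, apex, axis, half_angle)
```

A ray can cross the cone surface twice, so a real system needs an extra rule to pick the physically lit point; that choice is outside this sketch.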
At present, matting technology achieves excellent performance under laboratory conditions but still fails to meet the needs of some practical applications. In this paper, to address the practical problem that the contour of the matted object is not smooth when matting is applied to a car dataset, we add a smoothing loss, location information, and detail information to the original algorithm. The experimental results show that our improvement achieves good results.
{"title":"Car Image Matting","authors":"Jiajian Huang","doi":"10.1145/3484274.3484279","DOIUrl":"https://doi.org/10.1145/3484274.3484279","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
Leakage of a face template leads to severe security problems, since the facial image is unique and irreplaceable for each individual. Many researchers have devoted themselves to protecting the face template. Nevertheless, achieving high security for the face template usually sacrifices some matching accuracy. The main challenge is the low inter-user variation and high intra-user variation of facial images. In this work, we propose a method integrating residual learning and error-correcting codes for face template protection. In particular, the proposed method consists of two major components: (a) a deep residual network component mapping facial images to polar codewords assigned to users, and (b) a polar decoder reducing the noise introduced by high intra-user variations in the predicted codewords. The proposed method is evaluated on the extended Yale B, CMU-PIE, and FEI databases. It provides high security for the face template and simultaneously achieves a high genuine accept rate (100%) at a low false accept rate (0%), outperforming most state-of-the-art methods.
{"title":"Face Template Protection through Residual Learning Based Error-Correcting Codes","authors":"Junwei Zhou, D. Shang, Huile Lang, G. Ye, Zhe Xia","doi":"10.1145/3484274.3484292","DOIUrl":"https://doi.org/10.1145/3484274.3484292","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
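The genuine accept rate (GAR) and false accept rate (FAR) quoted in the face template abstract above are fractions of accepted genuine and impostor attempts at a given decision threshold. A minimal sketch of that arithmetic follows; the similarity scores, the threshold, and the function name `gar_far` are made up for illustration and are not results from the paper.

```python
import numpy as np

def gar_far(genuine_scores, impostor_scores, threshold):
    """Fraction of genuine and of impostor attempts accepted at the threshold."""
    gar = np.mean(np.asarray(genuine_scores) >= threshold)
    far = np.mean(np.asarray(impostor_scores) >= threshold)
    return gar, far

genuine = [0.90, 0.80, 0.95]   # hypothetical same-user match scores
impostor = [0.10, 0.30, 0.20]  # hypothetical different-user match scores
gar, far = gar_far(genuine, impostor, threshold=0.5)
```

A GAR of 100% at a FAR of 0% means some threshold separates the two score sets completely, as in this toy example.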
This paper analyzes the dynamic evolution of road traffic network congestion from a holistic, macroscopic point of view, and establishes a road network congestion propagation model based on the similarity between traffic congestion propagation and infectious disease propagation and on the topological structure of the road network. The model takes the road section as the research object and analyzes, through simulation, the effect of different parameters on congestion propagation. The simulation results show that the model is consistent with the actual situation; it can provide a new approach to road network congestion management and congestion prediction and has practical application value.
{"title":"Research on Road Network Congestion Propagation Based on Complex Network and SIR Model","authors":"Xiaofan Song, Lidong Zhang, Wei Zhang","doi":"10.1145/3484274.3484300","DOIUrl":"https://doi.org/10.1145/3484274.3484300","journal":{"name":"Proceedings of the 4th International Conference on Control and Computer Vision"},"publicationDate":"2021-08-13"}
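The analogy in the congestion abstract above, with road sections as individuals and congestion as infection, can be illustrated with a discrete-time SIR update on a road graph. This is a generic mean-field sketch, not the paper's model: the per-step congestion rate `beta`, the recovery rate `gamma`, the four-section ring network, and the function name `sir_step` are assumptions.

```python
import numpy as np

def sir_step(adj, s, i, r, beta, gamma):
    """One mean-field SIR step; s, i, r hold per-section state probabilities."""
    # Probability that at least one congested neighbour spreads congestion.
    pressure = 1.0 - np.prod(1.0 - beta * adj * i[None, :], axis=1)
    new_congested = s * pressure
    new_recovered = gamma * i
    return s - new_congested, i + new_congested - new_recovered, r + new_recovered

# Four road sections in a ring; section 0 starts congested.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
s = np.array([0.0, 1.0, 1.0, 1.0])
i = np.array([1.0, 0.0, 0.0, 0.0])
r = np.zeros(4)
s1, i1, r1 = sir_step(adj, s, i, r, beta=0.3, gamma=0.1)
```

Sweeping `beta` and `gamma` in such a simulation is one way to study how different parameters affect the spread of congestion across the network.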