To achieve automatic fruit recognition against complex backgrounds, this paper proposes YOLO-GF, a fruit object detection algorithm. To address challenges such as cluttered backgrounds, large variations in target shape, and occlusion in fruit images, the Global Attention Mechanism (GAM) is used to strengthen feature extraction for fruit targets and thereby improve recognition accuracy. In addition, the Focal-EIOU loss function replaces the CIOU loss function to speed up model convergence. Experimental results show a marked improvement in recognition accuracy under identical hardware conditions: on the same test set, the improved model reaches an mAP50 of 92.1% and an mAP50:95 of 76.5%, increases of 5.8% and 11.9% over the original model, respectively.
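As a rough illustration of the loss substituted above, the sketch below computes Focal-EIOU for a single pair of corner-format boxes; the box format and the gamma default follow the original Focal-EIOU paper, not settings reported here.

```python
def focal_eiou(box_p, box_g, gamma=0.5, eps=1e-9):
    """Focal-EIOU loss for one predicted/ground-truth box pair, (x1, y1, x2, y2)."""
    # Plain IoU from intersection and union areas.
    ix1, iy1 = max(box_p[0], box_g[0]), max(box_p[1], box_g[1])
    ix2, iy2 = min(box_p[2], box_g[2]), min(box_p[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    iou = inter / (area_p + area_g - inter + eps)

    # Smallest enclosing box; its squared width/height normalise the penalties.
    cw2 = (max(box_p[2], box_g[2]) - min(box_p[0], box_g[0])) ** 2 + eps
    ch2 = (max(box_p[3], box_g[3]) - min(box_p[1], box_g[1])) ** 2 + eps

    # EIOU = 1 - IoU + centre-distance, width and height penalty terms.
    d_c = ((box_p[0] + box_p[2]) - (box_g[0] + box_g[2])) ** 2 / 4 \
        + ((box_p[1] + box_p[3]) - (box_g[1] + box_g[3])) ** 2 / 4
    d_w = ((box_p[2] - box_p[0]) - (box_g[2] - box_g[0])) ** 2
    d_h = ((box_p[3] - box_p[1]) - (box_g[3] - box_g[1])) ** 2
    eiou = 1 - iou + d_c / (cw2 + ch2) + d_w / cw2 + d_h / ch2

    # Focal reweighting: well-localised (high-IoU) boxes dominate the gradient.
    return (iou ** gamma) * eiou

print(focal_eiou((0, 0, 10, 10), (2, 2, 12, 12)))
```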
{"title":"A deep learning approach for fruit detection: YOLO-GF","authors":"J. Guo, Wei Wu","doi":"10.1117/12.3014430","DOIUrl":"https://doi.org/10.1117/12.3014430","url":null,"abstract":"To achieve automatic fruit object recognition in complex backgrounds, this paper proposes a fruit object detection algorithm based on YOLO-GF. Addressing challenges such as complex backgrounds, significant variations in target shapes, and instances of occlusion in fruit images, we utilize the Global Attention Mechanism (GAM) to enhance the feature extraction capability for fruit targets, thereby improving fruit recognition accuracy. Additionally, the Focal-EIOU loss function is used instead of the CIOU loss function to expedite model convergence. Experimental results demonstrate a significant improvement in recognition accuracy under the same hardware conditions. On the same test dataset, the improved model achieves an mAP50 of 92.1% and mAP50:95 of 76.5%, representing increases of 5.8% and 11.9% compared to the original model, respectively.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"23 2","pages":"129691E - 129691E-5"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Based on 3D images, this study explores automatic segmentation and enhancement methods for cracks on airfield runway surfaces. First, a standard 2D Gaussian filter removes noise from the pavement data. Then a Steerable Matched Filter Bank (SMFB) is introduced to extract crack features: by constructing a bank of 52 SMFB filters with different parameters, cracks of different orientations and sizes can be captured accurately. Finally, Tensor Voting (TV) is applied to further enhance the continuity of the cracks. With this method, cracks in the runway surface can be detected and segmented for a more accurate and comprehensive analysis. Experimental results show that the proposed method performs well in crack detection and segmentation, providing strong support for airport pavement maintenance and management.
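The abstract does not give the SMFB construction, so the following is a minimal sketch under stated assumptions: oriented second-derivative-of-Gaussian kernels stand in for the matched filters, and the 52 filters are split as 2 scales x 26 orientations (the actual split is not specified in the abstract).

```python
import numpy as np
from scipy import ndimage

def oriented_kernel(sigma, theta, size=21):
    """Second-derivative-of-Gaussian 'valley' detector steered to angle theta."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    u = xx * np.cos(theta) + yy * np.sin(theta)    # axis across the crack
    v = -xx * np.sin(theta) + yy * np.cos(theta)   # axis along the crack
    g = np.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2))
    k = (u ** 2 / sigma ** 4 - 1 / sigma ** 2) * g
    return k - k.mean()                            # zero mean: flat areas give 0

def crack_response(height_map, sigmas=(1.5, 3.0), n_theta=26):
    """Max response over a 2 x 26 = 52-filter bank after Gaussian denoising."""
    smooth = ndimage.gaussian_filter(height_map.astype(float), sigma=1.0)
    resp = np.full_like(smooth, -np.inf)
    for s in sigmas:
        for t in np.linspace(0, np.pi, n_theta, endpoint=False):
            resp = np.maximum(resp, ndimage.convolve(smooth, oriented_kernel(s, t)))
    return resp

# Toy 3D height map: cracks are depressions, here a diagonal groove.
img = np.zeros((64, 64))
np.fill_diagonal(img, -2.0)
print(crack_response(img).max())   # strong response along the groove
```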
{"title":"The automated segmentation and enhancement of cracks on airport pavements using three-dimensional imaging techniques","authors":"Shanshan Zhai, Yanna Xu","doi":"10.1117/12.3014473","DOIUrl":"https://doi.org/10.1117/12.3014473","url":null,"abstract":"Based on 3D images, this study aims to explore automatic segmentation and enhancement methods for airfield runway surface cracks. Firstly, a typical 2D Gaussian filter is used to remove noise from the road surface data. Then, Steerable Matched Filter (SMFB) is introduced to extract crack features. By constructing a set of 52 SMFB filters with different parameters, we are able to accurately capture cracks with different directions and sizes. After that, Tensor Voting (TV) technique is introduced to further enhance the continuity of the cracks. With this method, we are able to detect and segment the cracks in the airfield runway surface for a more accurate and comprehensive analysis. The experimental results show that the proposed method performs well in crack detection and segmentation, providing strong support for airport pavement maintenance and management.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"19 2","pages":"129691H - 129691H-8"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140512102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The main task of pedestrian multi-object tracking is to track multiple pedestrians simultaneously and continuously in a video sequence while maintaining a unique ID for each. However, current pedestrian multi-object tracking models still suffer from problems such as false detections, missed detections, and frequent ID switches when pedestrians are occluded or look too similar, ultimately leading to tracking failure. This paper therefore proposes a pedestrian multi-object tracking model based on the tracking-by-detection (TBD) strategy, consisting of two parts: a pedestrian detector and a pedestrian tracker. For the detector, the ES-YOLO pedestrian detector is used. For the tracker, the omni-scale feature learning module from OSNet is borrowed to redesign StrongSORT's pedestrian appearance feature extraction network, yielding a StrongSORT tracker based on omni-scale feature fusion with stronger pedestrian feature extraction ability. Experimental results on the MOT16 dataset show that the proposed model effectively improves pedestrian multi-object tracking accuracy and reduces frequent ID switching.
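A minimal sketch of the TBD loop such a model is built on, assuming greedy IoU association; the paper's ES-YOLO detector and StrongSORT tracker (with appearance embeddings and a Kalman filter) are far richer than this.

```python
import numpy as np

def iou_matrix(tracks, dets):
    """Pairwise IoU between track and detection boxes in (x1, y1, x2, y2)."""
    m = np.zeros((len(tracks), len(dets)))
    for i, t in enumerate(tracks):
        for j, d in enumerate(dets):
            x1, y1 = max(t[0], d[0]), max(t[1], d[1])
            x2, y2 = min(t[2], d[2]), min(t[3], d[3])
            inter = max(0, x2 - x1) * max(0, y2 - y1)
            union = (t[2] - t[0]) * (t[3] - t[1]) + (d[2] - d[0]) * (d[3] - d[1]) - inter
            m[i, j] = inter / union if union > 0 else 0.0
    return m

def tbd_step(tracks, detections, next_id, iou_thr=0.3):
    """One frame of tracking-by-detection with greedy IoU association.

    tracks: {id: box}. Unmatched old tracks are dropped here for brevity;
    real trackers keep them alive for a few frames to survive occlusion.
    """
    ids, boxes = list(tracks), list(tracks.values())
    cost = iou_matrix(boxes, detections)
    matched, new_tracks = set(), {}
    while cost.size and cost.max() > iou_thr:
        i, j = np.unravel_index(cost.argmax(), cost.shape)
        new_tracks[ids[i]] = detections[j]      # detection inherits the old ID
        matched.add(j)
        cost[i, :], cost[:, j] = -1.0, -1.0     # retire matched row and column
    for j, d in enumerate(detections):          # leftovers start new identities
        if j not in matched:
            new_tracks[next_id] = d
            next_id += 1
    return new_tracks, next_id

tracks, nid = {}, 0
for dets in [[(0, 0, 10, 10)], [(1, 1, 11, 11), (50, 50, 60, 60)]]:
    tracks, nid = tbd_step(tracks, dets, nid)
print(tracks)   # ID 0 follows the moving pedestrian; ID 1 is newly created
```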
{"title":"Optimization research on pedestrian multiobjects tracking model based on TBD strategy","authors":"Shi Wang, Xiangju Liu, Xinshu Liu, JiaHui Chen, XiaoHong Wang","doi":"10.1117/12.3014360","DOIUrl":"https://doi.org/10.1117/12.3014360","url":null,"abstract":"The main task of pedestrian multi objects tracking technology is to continuously track multiple pedestrian objects simultaneously in video sequences and maintain their unique ID numbers. However, current pedestrian multi objects tracking models still have many problems, such as false detection, missed detection, and frequent ID number switching when pedestrians are obstructed or have overly similar appearances, ultimately leading to tracking failure. Therefore, this paper proposes a pedestrian multi objects tracking model based on TBD strategy. It mainly consists of two parts: pedestrian detector and pedestrian tracker. In terms of pedestrian detectors, this paper uses ES-YOLO pedestrian detectors. In terms of pedestrian trackers, this paper draws on the Omni-scale feature learning module in OSNet to redesign the StrongSORT pedestrian appearance feature extraction network, and ultimately obtains the StrongSORT pedestrian tracker based on omni-scale feature fusion, further enhancing its pedestrian feature extraction ability. In terms of experimental results. The experimental results of the pedestrian multi objects tracking model based on the TBD strategy in this paper on the MOT16 dataset show that the proposed pedestrian multi-objective tracking model can effectively improve the accuracy of pedestrian multi objects tracking and reduce the problem of frequent pedestrian ID number switching.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"33 4","pages":"129692K - 129692K-7"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140512240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a 3D dangerous-goods detection method based on RetinaNet. The method uses RetinaNet's bidirectional feature pyramid network structure to extract multi-scale features from point cloud data and trains the system with the Focal Loss function to achieve fast, accurate detection of dangerous goods. To further improve detection accuracy, a 3D region proposal network (3D RPN) and non-maximum suppression (NMS) are also introduced. Experimental results show that the proposed method performs well on our self-built CT dataset, with high accuracy and a low false positive rate, and is suitable for dangerous-goods detection in practical security-inspection scenarios.
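For reference, a minimal NumPy version of the binary Focal Loss the system is trained with; gamma and alpha are the usual defaults from Lin et al., not values reported in this paper.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-9):
    """Binary focal loss on predicted probabilities p and labels y.

    Down-weights easy examples by (1 - p_t)^gamma so rare dangerous-goods
    targets are not swamped by the abundant background class.
    """
    p = np.clip(p, eps, 1 - eps)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balance weight
    return -(alpha_t * (1 - p_t) ** gamma * np.log(p_t)).mean()

p = np.array([0.9, 0.1, 0.6])   # confident TP, confident TN, hard positive
y = np.array([1, 0, 1])
print(focal_loss(p, y))         # hard positive dominates the loss
```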
{"title":"Three-dimensional target detection algorithm for dangerous goods in CT security inspection","authors":"Jingze He, Yao Guo, qing song","doi":"10.1117/12.3014353","DOIUrl":"https://doi.org/10.1117/12.3014353","url":null,"abstract":"In this paper, a 3D dangerous goods detection method based on RetinaNet is proposed. This method uses the bidirectional feature pyramid network structure of RetinaNet to extract multi-scale features from point cloud data and trains the system using Focal Loss function to achieve fast and accurate detection of dangerous goods. In addition, in order to improve the detection accuracy, this paper also introduces the 3D region proposal network (3D RPN) and nonmaximum suppression (NMS) algorithm. The experimental results show that the proposed method performs well on our self-built CT dataset, with high accuracy and low false positive rate, and is suitable for dangerous goods detection tasks in practical scenarios.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"54 5","pages":"1296902 - 1296902-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Because vehicle position information is collected with low accuracy, the error in the positioning stage is relatively large. This paper therefore proposes collaborative positioning for intelligent-vehicle aided navigation based on computer vision technology. Smart cameras (VOF/VOF-S) serve as the data acquisition device, and the acquisition parameters are set differently according to the vehicle's specific running state, so that vehicle position information is captured accurately. In the positioning stage, the plane containing the wheels is taken as the road plane, and the coordinates of several road ground points collected by the VOF/VOF-S device are combined to transform the vehicle's position into real-world coordinates. In tests, the positioning error under different driving conditions remains stable within 1.50 m, demonstrating high accuracy.
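The abstract does not detail the image-to-road transformation, so the sketch below assumes the standard approach for a known ground plane: fit a homography from a few surveyed road points and back-project the wheel-contact pixel. All point values here are hypothetical.

```python
import numpy as np

def fit_homography(img_pts, ground_pts):
    """DLT estimate of the image-plane -> road-plane homography.

    A few surveyed ground points seen by the camera fix the mapping
    from pixels to road coordinates, as the calibration step implies.
    """
    A = []
    for (u, v), (x, y) in zip(img_pts, ground_pts):
        A.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
        A.append([0, 0, 0, u, v, 1, -y * u, -y * v, -y])
    _, _, vt = np.linalg.svd(np.asarray(A, float))
    return vt[-1].reshape(3, 3)       # null-space (smallest singular vector)

def to_road_plane(H, u, v):
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w               # metres in road-plane coordinates

# Four hypothetical surveyed points: pixel (u, v) -> road (x, y) in metres.
img_pts = [(100, 400), (540, 400), (200, 300), (440, 300)]
ground = [(-2.0, 5.0), (2.0, 5.0), (-2.0, 10.0), (2.0, 10.0)]
H = fit_homography(img_pts, ground)
print(to_road_plane(H, 320, 350))     # wheel-contact pixel -> road position
```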
{"title":"Research on collaborative positioning of intelligent vehicle aided navigation based on computer vision technology","authors":"Shun Zhang","doi":"10.1117/12.3014415","DOIUrl":"https://doi.org/10.1117/12.3014415","url":null,"abstract":"Due to the low accuracy of collecting vehicle position information, the error in the positioning stage is relatively large. Therefore, the collaborative positioning of intelligent vehicle aided navigation based on computer vision technology is proposed. Taking the computer vision equipment-smart cameras VOF/VOF-S as a specific data acquisition device, and combining with the specific running state of the vehicle, the specific parameters in the data acquisition stage are set differently, so as to realize the accurate acquisition of vehicle position information. In the positioning stage, the plane where the wheel is located is taken as the road plane, and the coordinate parameters of data information collected by several road ground points in VOF/VOF-S computer vision technology device are integrated to realize the transformation of vehicle position information in real space. In the test results, the positioning error of vehicle position under different driving conditions is always stable within 1.50m, which has high accuracy.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"4 1","pages":"129692P - 129692P-5"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Under high-density operation and the influence of the natural environment, abrasion damage appears on the rail surface, affecting train safety and ride comfort. Rail surface defect detection is therefore an important part of ensuring the safe and efficient operation of a railway system. To determine whether the rail surface contains defects, a rail-surface defect image segmentation method based on a fractional-order particle swarm optimization (FPSO) 2D-Otsu algorithm is proposed. The rail image is denoised and enhanced with adaptive fractional calculus and then segmented by the FPSO 2D-Otsu algorithm. To verify its accuracy, the proposed algorithm is compared with the PSO 2D-Otsu segmentation algorithm. Experimental results show that rail image segmentation accuracy improves from 48.76% with PSO 2D-Otsu to 83.59% with FPSO 2D-Otsu.
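A sketch of the 2D-Otsu objective being optimized, using an exhaustive scan over threshold pairs; the paper's FPSO searches this same criterion with a fractional-order particle swarm instead of the brute-force loop shown here.

```python
import numpy as np
from scipy import ndimage

def two_d_otsu(img):
    """Exhaustive 2D-Otsu on the (pixel value, neighbourhood mean) histogram.

    Returns the (s, t) pair maximising the between-class distance criterion.
    """
    g = img.astype(np.uint8)
    m = ndimage.uniform_filter(g.astype(float), size=3).astype(np.uint8)
    hist, _, _ = np.histogram2d(g.ravel(), m.ravel(), bins=(256, 256),
                                range=((0, 256), (0, 256)))
    p = hist / hist.sum()
    # Cumulative sums give class probability and class sums in O(1) per pair.
    P = p.cumsum(0).cumsum(1)
    i = np.arange(256)
    Mi = (p * i[:, None]).cumsum(0).cumsum(1)   # cumulative sum of gray levels
    Mj = (p * i[None, :]).cumsum(0).cumsum(1)   # cumulative sum of local means
    mu_i, mu_j = Mi[-1, -1], Mj[-1, -1]         # global means
    best, arg = -1.0, (0, 0)
    for s in range(1, 255):
        for t in range(1, 255):
            w0 = P[s, t]
            if w0 <= 0 or w0 >= 1:
                continue
            d = (mu_i * w0 - Mi[s, t]) ** 2 + (mu_j * w0 - Mj[s, t]) ** 2
            var = d / (w0 * (1 - w0))           # between-class criterion
            if var > best:
                best, arg = var, (s, t)
    return arg

# Two-intensity toy "rail" image with mild noise: threshold lands between 60 and 180.
img = np.concatenate([np.full((32, 32), 60), np.full((32, 32), 180)], axis=1)
img = img + np.random.default_rng(0).integers(-5, 6, img.shape)
print(two_d_otsu(img))
```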
{"title":"Image segmentation of rail surface defects based on fractional order particle swarm optimization 2D-Otsu algorithm","authors":"Na Geng, Hu Sheng, Weizhi Sun, Yifeng Wang, Tan Yu, Zihan Liu","doi":"10.1117/12.3014444","DOIUrl":"https://doi.org/10.1117/12.3014444","url":null,"abstract":"Under the influence of high density operation and natural environment, the rail surface will appear abrasion damage, which will affect the safety and comfort of the train. Rail surface defect detection is an important part to ensure the safe and efficient operation of railway system. In order to distinguish whether there are defects on the rail surface, a method of rail surface defect image segmentation based on FPSO 2D-Otsu algorithm is proposed. The rail image is denoised and enhanced by adaptive fractional calculus, and then the rail image is segmented by FPSO 2D-Otsu algorithm. In order to verify the accuracy of the algorithm, the proposed algorithm is compared with PSO 2D-Otsu image segmentation algorithm. The experimental results show that the accuracy of FPSO 2D-Otsu algorithm in rail image segmentation is improved from 48.76% to 83.59% compared with PSO 2D-Otsu algorithm.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"226 1","pages":"129690A - 129690A-4"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Features are difficult to extract for video facial micro-expression recognition because micro-expressions are short in duration and small in amplitude. To better combine the temporal and spatial information in video, the model is divided into a local attention module, a global attention module, and a temporal module. First, the local attention module crops the key facial regions and, after processing, feeds them to a network with channel attention. Then the global attention module applies random erasure that avoids the key regions and feeds the data to a network with spatial attention. Next, the temporal module processes the micro-expression onset frames and feeds them to a network with a temporal shift module and spatial attention. Finally, after feature fusion, classification results are obtained through three fully connected layers. Evaluated on the CASME II dataset with five-fold cross-validation, the method achieves an average accuracy of 76.15% and an unweighted F1 score of 0.691, an improvement over mainstream algorithms.
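A minimal NumPy rendering of the temporal shift operation used in the temporal module, following the standard TSM formulation (shift 1/8 of the channels each way in time); the fraction is the common default, not necessarily this paper's setting.

```python
import numpy as np

def temporal_shift(x, fold_div=8):
    """Temporal Shift Module: zero-parameter temporal mixing.

    x: (T, C, H, W) clip features. One 1/fold_div channel slice is shifted
    one frame back, another one frame forward, the rest stay in place, so a
    plain 2D operation afterwards sees neighbouring frames for free.
    """
    t, c, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    out[:-1, :fold] = x[1:, :fold]                  # future -> current frame
    out[1:, fold:2 * fold] = x[:-1, fold:2 * fold]  # past -> current frame
    out[:, 2 * fold:] = x[:, 2 * fold:]             # untouched channels
    return out

clip = np.arange(4 * 8 * 1 * 1, dtype=float).reshape(4, 8, 1, 1)
print(temporal_shift(clip)[:, :2, 0, 0])  # first channel now holds next-frame values
```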
{"title":"Microexpression recognition algorithm based on multi feature fusion","authors":"BaiYang Xiang, BoKai Li, Huaijuan Zang, Zeliang Zhao, Shu Zhan","doi":"10.1117/12.3014469","DOIUrl":"https://doi.org/10.1117/12.3014469","url":null,"abstract":"Video facial micro expression recognition is difficult to extract features due to its short duration and small action amplitude. In order to better combine temporal and spatial information of video, the whole model is divided into local attention module, global attention module and temporal module. First, the local attention module intercepts the key areas and sends them to the network with channel attention after processing; Then the global attention module sends the data into the network with spatial attention after random erasure avoiding key areas; Finally, the temporal module sends the micro expression occurrence frame to the network with temporal shift module and spatial attention after processing; Finally, the classification results are obtained through three full connection layers after feature fusion. The experiment is tested based on CASMEⅡ dataset,After five-fold Cross Validation, the average accuracy rate is 76.15, the unweighted F1 value is 0.691.Compared with the mainstream algorithm, this method has improvement.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"12 6","pages":"1296908 - 1296908-10"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140512112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rice is susceptible to mold during storage, and metabolites produced during mildew, such as aflatoxin, are very harmful to consumers. To meet the need for rapid detection of normal rice adulterated with moldy rice, a rapid identification method was established based on the fusion of near-infrared spectroscopy and machine vision data. Competitive adaptive reweighted sampling (CARS), a genetic algorithm (GA), and least angle regression (LARS) were used for spectral and image feature extraction; these were combined with support vector classification (SVC), random forest (RF), and gradient boosting tree (GBT) nonlinear discriminant models, with Bayesian search used to optimize the modeling parameters. The results show that the GBT fusion model built on LARS-selected spectral and image features achieves the highest discrimination accuracy, with recognition rates of 100.00% on the training set and 98.11% on the test set, a marked improvement over near-infrared spectroscopy or machine vision alone. These results indicate that rapid identification of adulterated rice based on near-infrared spectroscopy and machine vision data fusion is feasible, providing theoretical support for the development of online identification equipment for adulterated rice.
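A hedged sketch of the feature-level fusion step with a GBT classifier on synthetic stand-in data; the real inputs would be the LARS-selected NIR bands and image features, and the hyperparameters here are assumptions rather than the paper's Bayesian-search results.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins: 120 samples, 12 selected NIR bands + 6 image features
# (the paper selects these with LARS from far larger raw feature sets).
nir = rng.normal(size=(120, 12))
img = rng.normal(size=(120, 6))
y = (nir[:, 0] + 0.5 * img[:, 0] + rng.normal(0.0, 0.3, 120) > 0).astype(int)

# Feature-level fusion: concatenate the two optimised feature blocks.
X = np.hstack([nir, img])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

gbt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                 max_depth=3, random_state=0)
gbt.fit(X_tr, y_tr)
print(f"test accuracy: {gbt.score(X_te, y_te):.2%}")
```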
{"title":"Rapid identification of adulterated rice using fusion of near-infrared spectroscopy and machine vision data: the combination of feature optimization and nonlinear modeling","authors":"Chenxuan Song, Jinming Liu, Chunqi Wang, Zhijiang Li","doi":"10.1117/12.3014380","DOIUrl":"https://doi.org/10.1117/12.3014380","url":null,"abstract":"Rice is susceptible to mold and mildew during storage. Metabolites such as aflatoxin produced during mildew will do great harm to consumers. To meet the need for rapid detection of normal rice adulterated with moldy rice, a rapid identification method of adulterated rice was established based on data fusion of near-infrared spectroscopy and machine vision. Using competitive adaptive reweighted sampling (CARS), genetic algorithm (GA), and least angle regression (LARS) for spectral and image feature extraction, combined with support vector classification (SVC), random forest (RF), and gradient boosting tree (GBT) nonlinear discriminant models, and use Bayesian search to optimize modeling parameters. The results show that the GBT fusion data model established by LARS optimization of spectral and image feature variables has the highest discrimination accuracy, with recognition accuracy rates of 100.00% and 98.11% for its training and testing sets, respectively. The discrimination performance is significantly improved compared to single near-infrared spectroscopy and machine vision. The results indicate that rapid identification of adulterated rice based on near-infrared spectroscopy and machine vision data fusion technology is feasible, providing theoretical support for the development of online identification equipment for adulterated rice.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"63 2","pages":"129692J - 129692J-16"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Although Neural Radiance Fields (NeRF) have been shown to achieve high-quality novel view synthesis, existing models still perform poorly in some scenarios, particularly unbounded scenes: they either require excessively long training times or produce suboptimal synthesis results. We therefore propose SD-NeRF, which combines a compact neural radiance field model with self-supervised depth regularization. Experimental results demonstrate that SD-NeRF shortens training time by more than a factor of 20 compared to Mip-NeRF 360 without compromising reconstruction accuracy.
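A sketch of the depth-regularization idea: render an expected depth from NeRF's volume-rendering weights and penalize its deviation from a depth prior. The quadrature follows standard NeRF; the L2 depth term and the weight lam are assumptions, not SD-NeRF's exact loss.

```python
import numpy as np

def render_depth(sigmas, t_vals):
    """Expected depth along one ray from volume-rendering weights.

    Standard NeRF quadrature: w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    depth = sum_i w_i * t_i.
    """
    deltas = np.diff(t_vals, append=t_vals[-1] + 1e10)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))
    weights = trans * alphas
    return (weights * t_vals).sum()

def depth_regularized_loss(rgb_pred, rgb_gt, depth_pred, depth_prior, lam=0.1):
    """Photometric loss plus a term pulling rendered depth toward a prior.

    The prior would come from the paper's self-supervised depth estimate,
    not ground truth; lam is a hypothetical weight.
    """
    return ((rgb_pred - rgb_gt) ** 2).mean() + lam * (depth_pred - depth_prior) ** 2

t = np.linspace(2.0, 6.0, 64)
sig = np.where(np.abs(t - 4.0) < 0.1, 50.0, 0.0)   # a surface near t = 4
d = render_depth(sig, t)
print(d, depth_regularized_loss(np.zeros(3), np.zeros(3), d, 4.0))
```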
{"title":"Fast and high quality neural radiance fields reconstruction based on depth regularization","authors":"Bin Zhu, Gaoxiang He, Bo Xie, Yi Chen, Yaoxuan Zhu, Liuying Chen","doi":"10.1117/12.3014528","DOIUrl":"https://doi.org/10.1117/12.3014528","url":null,"abstract":"Although the Neural Radiance Fields (NeRF) has been shown to achieve high-quality novel view synthesis, existing models still perform poorly in some scenarios, particularly unbounded scenes. These models either require excessively long training times or produce suboptimal synthesis results. Consequently, we propose SD-NeRF, which consists of a compact neural radiance field model and self-supervised depth regularization. Experimental results demonstrate that SDNeRF can shorten training time by over 20 times compared to Mip-NeRF360 without compromising reconstruction accuracy.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"43 3","pages":"129692F - 129692F-9"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Combinatorial action recognition has recently attracted the attention of researchers in computer vision. It focuses on effectively representing and discriminating the spatio-temporal interactions between different actions and objects in video data. Existing work tends to strengthen a framework's object recognition and relationship modeling capabilities, e.g., with attention mechanisms and graph structures. We find that existing algorithms can be misled by video segments unrelated to the interaction, causing them to focus on extraneous visual information. So that the algorithm analyzes the spatio-temporal interactions of only the causally related segments of a video, a Causal Slice Recognition Network (CSRN) is proposed. This method effectively removes the interference of background segments by explicitly recognizing and extracting the causally related segments in the video. We validate the method on the Something-Else dataset and obtain the best results.
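As a toy stand-in for the causal-slice step, the sketch below scores per-segment features against a query vector and keeps only the top-scoring segments before any temporal reasoning; the dot-product scoring and keep ratio are illustrative assumptions, not CSRN's design.

```python
import numpy as np

def select_causal_segments(seg_feats, query, keep_ratio=0.5):
    """Keep the video segments most relevant to an action/object query vector."""
    scores = seg_feats @ query                    # relevance score per segment
    k = max(1, int(len(seg_feats) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-k:])       # top-k, back in time order
    return keep, seg_feats[keep]

rng = np.random.default_rng(1)
segments = rng.normal(size=(8, 16))               # 8 video segments, 16-d features
query = segments[3] + 0.1 * rng.normal(size=16)   # the interaction resembles segment 3
idx, kept = select_causal_segments(segments, query, keep_ratio=0.25)
print(idx)                                        # segment 3 should survive the cut
```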
{"title":"Combinatorial action recognition based on causal segment intervention","authors":"Xiaozhou Sun","doi":"10.1117/12.3014465","DOIUrl":"https://doi.org/10.1117/12.3014465","url":null,"abstract":"Combinatorial action recognition has recently attracted the attention of researchers in the field of computer vision. It focuses on the effective representation and discrimination of spatio-temporal interactions occurring between different actions and objects in video data. Existing work tends to strengthen the framework's object recognition capabilities and relationship modeling capabilities, e.g., attention mechanisms, and graph structures. We find that existing algorithms can be influenced by interaction-independent video segments in a video, misleading the algorithm to focus on additional information in the vision. For the algorithm to analyze the spatio-temporal interactions of causally related video segments in a video, a Causal Slice Recognition Network (CSRN) is proposed. This method can effectively remove the interference of video background segments by explicitly recognizing and extracting the causally related segments in the video. We validate the method on the Something-else dataset and obtain the best results.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"252 1","pages":"129692W - 129692W-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}