In complex stroke rehabilitation scenarios, visual algorithms such as motion recognition and video understanding struggle to focus on patient regions with small, slow motion, and instead attend to targets with drastic optical-flow changes. Using a semantic segmentation algorithm to extract the patient's region from the captured image can therefore provide critical cues and adequate information for these visual tasks. Current weakly supervised segmentation algorithms based on bounding boxes tend to rely on existing image classification methods, performing secondary processing on the image content inside each box to obtain larger areas of pseudo-label information. To avoid the redundancy caused by chaining algorithms, this paper proposes an end-to-end weakly supervised segmentation algorithm. In this method, a U-shaped residual module with variable depth is designed to capture deep semantic features of images, and its output is integrated block-wise into the target matrix of the normalized cut (NCut) problem. The target region is then indicated by the eigenvector associated with the second-smallest eigenvalue of the generalized eigensystem, which yields the segmentation. Experiments on the PASCAL VOC 2012 dataset show that the proposed method achieves 67.7% mIoU. On a private dataset, a comparison with similar algorithms shows that the proposed method segments the target area more densely.
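The eigenvector step of the NCut formulation can be illustrated on a toy graph. The following is a minimal sketch, assuming a small hand-written affinity matrix `W` and a hypothetical helper `ncut_bipartition`; it does not reproduce the paper's U-shaped network or its block-structured target matrix, only the standard relaxation in which the generalized problem (D - W) y = λ D y is solved and the second-smallest eigenvector is thresholded.

```python
import numpy as np


def ncut_bipartition(W):
    """Split a weighted graph in two using the normalized-cut relaxation.

    W is a symmetric, non-negative affinity matrix in which every node has
    positive degree. Illustrative helper only, not the paper's code.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                      # node degrees (diagonal of D)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # The symmetric matrix D^{-1/2} (D - W) D^{-1/2} has the same eigenvalues
    # as the generalized problem (D - W) y = lambda * D y.
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]         # map back: y = D^{-1/2} z
    return y > 0                           # threshold at zero (a simple choice)


# Toy affinity matrix: two triangles joined by one weak edge (nodes 2 and 3).
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)
labels = ncut_bipartition(W)
print(labels)  # nodes 0-2 share one label, nodes 3-5 the other
```

Which side is labeled `True` depends on the sign of the eigenvector returned by the solver, so only the grouping is meaningful.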
{"title":"Box-driven coarse-grained segmentation for stroke rehabilitation scenarios","authors":"Yiming Fan, Yunjia Liu, Xiaofeng Lu","doi":"10.1117/12.3014426","DOIUrl":"https://doi.org/10.1117/12.3014426","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 3","pages":"129692D - 129692D-7"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
English composition scoring methods based on hand-crafted features struggle to extract deep semantic features, while methods based on neural networks struggle to capture shallow features such as word count, so each approach has its own limitations. Building on existing research, this paper proposes an English composition scoring method that combines hand-crafted feature extraction with deep learning. The method uses manually designed features to capture shallow word-level and sentence-level features of the composition, draws on existing methods to extract its semantic features, and performs regression on the deep and shallow features to obtain the total score. The experiment uses the Pearson correlation coefficient to measure the correlation between the predicted and true total scores under the combined method. Compared with the average results of 0.747 and 0.645 for baseline models such as BiLSTM and RNN, the proposed algorithm improves by 0.068 and 0.17, respectively, which demonstrates the effectiveness of the proposed method.
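The Pearson evaluation index mentioned above is the standard correlation coefficient between predicted and true scores. A minimal sketch follows; the function name `pearson` and the sample scores are illustrative assumptions, not data from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / (std_x * std_y)


# Hypothetical predicted and true essay scores, for illustration only.
predicted = [7.5, 6.0, 9.0, 4.5]
true = [8.0, 6.0, 9.0, 5.0]
r = pearson(predicted, true)
print(r)
```

A value close to 1 means the predicted totals rise and fall with the true totals, which is what the reported figures measure.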
{"title":"Research on automatic scoring algorithm for English composition based on machine learning","authors":"Hui Li","doi":"10.1117/12.3014482","DOIUrl":"https://doi.org/10.1117/12.3014482","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"20 6","pages":"129690T - 129690T-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du
The development of the Metaverse has sparked widespread interest among researchers, and many technologies have correspondingly emerged to improve users' sense of reality in the Metaverse. In particular, Extended Reality (XR), an indispensable technology and research direction in the study of the Metaverse, aims to bring a seamless transition between the virtual and real worlds and an immersive experience. However, what is currently lacking is the ability to simultaneously separate, classify, and locate dynamic human voice information to enhance voice perception in complex noise environments. This article proposes a framework that uses an FCNN for separation, algebraic models for positioning to obtain estimated distances, and an SVM for classification. The dataset is built to simulate distance-related changes with accurate ground-truth labels. The results show that our method can effectively separate, classify, and locate mixed sound data, providing users with comprehensive information about the content, gender, and distance of the speaker in complex sound environments and enhancing their immersive experience and perception. Our innovation lies in combining three audio processing technologies, and the proposed framework may inspire future work on related topics.
{"title":"Enhancing audio perception in augmented reality: a dynamic vocal information processing framework","authors":"Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du","doi":"10.1117/12.3014440","DOIUrl":"https://doi.org/10.1117/12.3014440","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 22","pages":"129691Z - 129691Z-9"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the era of rapidly advancing information technology, information overload poses a significant challenge. Recommender systems offer a partial solution, yet traditional methods struggle with sparse data and limited accuracy. This paper therefore introduces a high-order graph convolutional collaborative filtering model. The model employs a subgraph generation module to enhance the importance of neighbor nodes during high-order graph convolutions. The approach yields enhanced embeddings by embedding user-item interaction information using graph techniques, stacking multiple graph convolutional layers to capture complex interactions, and leveraging both the initial and the convolved embeddings. A constraint loss function is introduced to address over-smoothing in graph-based recommendation. The method's effectiveness is confirmed through extensive experiments on three real-world datasets.
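Stacking graph convolutions over the user-item graph and combining the initial with the convolved embeddings can be sketched as follows. This is a simplified illustration under stated assumptions: it uses symmetric degree normalization and averages the layer outputs (a LightGCN-style choice, not confirmed by the abstract), and it omits the paper's subgraph generation module and constraint loss. The function name `propagate` and the toy data are hypothetical.

```python
import math


def propagate(user_emb, item_emb, interactions, num_layers=2):
    """Propagate embeddings over a bipartite user-item graph.

    interactions is a list of unique (user_index, item_index) pairs.
    Returns (final_user_emb, final_item_emb), each the mean of the
    initial embeddings and the output of every propagation layer.
    """
    n_users, n_items = len(user_emb), len(item_emb)
    dim = len(user_emb[0])
    deg_u = [0] * n_users
    deg_i = [0] * n_items
    for u, i in interactions:
        deg_u[u] += 1
        deg_i[i] += 1

    layers_u, layers_i = [user_emb], [item_emb]
    for _ in range(num_layers):
        prev_u, prev_i = layers_u[-1], layers_i[-1]
        next_u = [[0.0] * dim for _ in range(n_users)]
        next_i = [[0.0] * dim for _ in range(n_items)]
        for u, i in interactions:
            # Symmetric normalization 1 / sqrt(deg(u) * deg(i)).
            norm = 1.0 / math.sqrt(deg_u[u] * deg_i[i])
            for k in range(dim):
                next_u[u][k] += norm * prev_i[i][k]
                next_i[i][k] += norm * prev_u[u][k]
        layers_u.append(next_u)
        layers_i.append(next_i)

    def mean_over_layers(layers, count):
        return [[sum(layer[j][k] for layer in layers) / len(layers)
                 for k in range(dim)] for j in range(count)]

    return mean_over_layers(layers_u, n_users), mean_over_layers(layers_i, n_items)


# Toy data: 2 users, 2 items, 3 observed interactions.
users = [[1.0, 1.0], [0.0, 1.0]]
items = [[1.0, 0.0], [0.0, 1.0]]
pairs = [(0, 0), (0, 1), (1, 1)]
final_users, final_items = propagate(users, items, pairs, num_layers=1)
print(final_users[0])
```

Averaging across layers keeps part of the initial signal in the final embedding, which is one common way to limit over-smoothing as more layers are stacked.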
{"title":"Collaborative filtering recommendation method based on graph convolutional neural networks","authors":"Zhengwu Yuan, Xiling Zhan, Yatao Zhou, Hao Yang","doi":"10.1117/12.3014407","DOIUrl":"https://doi.org/10.1117/12.3014407","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 56","pages":"129691U - 129691U-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes a 3D dangerous goods detection method based on RetinaNet. The method uses the bidirectional feature pyramid network structure of RetinaNet to extract multi-scale features from point cloud data and trains the system with the Focal Loss function to achieve fast and accurate detection of dangerous goods. To further improve detection accuracy, it also introduces a 3D region proposal network (3D RPN) and the non-maximum suppression (NMS) algorithm. Experimental results show that the proposed method performs well on our self-built CT dataset, with high accuracy and a low false positive rate, and is suitable for dangerous goods detection in practical scenarios.
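Two of the components named above, Focal Loss and non-maximum suppression, have well-known generic forms that can be sketched briefly. The code below assumes binary labels, axis-aligned 3D boxes given as `(x1, y1, z1, x2, y2, z2)`, and the commonly used defaults `alpha=0.25`, `gamma=2.0`; the function names and sample boxes are illustrative, and the paper's actual box parameterization and hyperparameters are not stated in the abstract.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction p in (0, 1) with label y in {0, 1}."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor down-weights easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def box_volume(box):
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for lo, hi in ((0, 3), (1, 4), (2, 5)):
        overlap = min(a[hi], b[hi]) - max(a[lo], b[lo])
        if overlap <= 0:
            return 0.0
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)


def nms_3d(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_3d(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical detections: the first two boxes overlap almost completely.
boxes = [(0, 0, 0, 2, 2, 2), (0.1, 0, 0, 2.1, 2, 2), (5, 5, 5, 6, 6, 6)]
scores = [0.9, 0.8, 0.7]
kept = nms_3d(boxes, scores)
print(kept)  # indices of the boxes that survive suppression
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the ordinary cross-entropy, which is a convenient sanity check when tuning the two parameters.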
{"title":"Three-dimensional target detection algorithm for dangerous goods in CT security inspection","authors":"Jingze He, Yao Guo, qing song","doi":"10.1117/12.3014353","DOIUrl":"https://doi.org/10.1117/12.3014353","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"54 5","pages":"1296902 - 1296902-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Chenxuan Song, Jinming Liu, Chunqi Wang, Zhijiang Li
Rice is susceptible to mold and mildew during storage, and metabolites such as aflatoxin produced during mildew are very harmful to consumers. To meet the need for rapid detection of normal rice adulterated with moldy rice, a rapid identification method was established based on data fusion of near-infrared spectroscopy and machine vision. Competitive adaptive reweighted sampling (CARS), a genetic algorithm (GA), and least angle regression (LARS) were used for spectral and image feature extraction, combined with support vector classification (SVC), random forest (RF), and gradient boosting tree (GBT) nonlinear discriminant models, and Bayesian search was used to optimize the modeling parameters. The results show that the GBT fusion-data model built on LARS-optimized spectral and image feature variables has the highest discrimination accuracy, with recognition accuracies of 100.00% and 98.11% on the training and test sets, respectively. Discrimination performance is significantly improved over using near-infrared spectroscopy or machine vision alone. The results indicate that rapid identification of adulterated rice based on near-infrared spectroscopy and machine vision data fusion is feasible, providing theoretical support for the development of online identification equipment for adulterated rice.
{"title":"Rapid identification of adulterated rice using fusion of near-infrared spectroscopy and machine vision data: the combination of feature optimization and nonlinear modeling","authors":"Chenxuan Song, Jinming Liu, Chunqi Wang, Zhijiang Li","doi":"10.1117/12.3014380","DOIUrl":"https://doi.org/10.1117/12.3014380","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"63 2","pages":"129692J - 129692J-16"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bin Zhu, Gaoxiang He, Bo Xie, Yi Chen, Yaoxuan Zhu, Liuying Chen
Although Neural Radiance Fields (NeRF) have been shown to achieve high-quality novel view synthesis, existing models still perform poorly in some scenarios, particularly unbounded scenes. These models either require excessively long training times or produce suboptimal synthesis results. We therefore propose SD-NeRF, which consists of a compact neural radiance field model and self-supervised depth regularization. Experimental results demonstrate that SD-NeRF reduces training time by a factor of more than 20 compared to Mip-NeRF 360 without compromising reconstruction accuracy.
{"title":"Fast and high quality neural radiance fields reconstruction based on depth regularization","authors":"Bin Zhu, Gaoxiang He, Bo Xie, Yi Chen, Yaoxuan Zhu, Liuying Chen","doi":"10.1117/12.3014528","DOIUrl":"https://doi.org/10.1117/12.3014528","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"43 3","pages":"129692F - 129692F-9"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Because vehicle position information is collected with low accuracy, the error in the positioning stage is relatively large. To address this, a collaborative positioning method for intelligent vehicle aided navigation based on computer vision technology is proposed. VOF/VOF-S smart cameras are used as the data acquisition devices, and the acquisition parameters are set differently according to the running state of the vehicle, so that vehicle position information is acquired accurately. In the positioning stage, the plane on which the wheels rest is taken as the road plane, and the coordinate parameters of several road ground points captured by the VOF/VOF-S devices are integrated to transform the vehicle position information into real space. In the tests, the positioning error under different driving conditions remains within 1.50 m, indicating high accuracy.
{"title":"Research on collaborative positioning of intelligent vehicle aided navigation based on computer vision technology","authors":"Shun Zhang","doi":"10.1117/12.3014415","DOIUrl":"https://doi.org/10.1117/12.3014415","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"4 1","pages":"129692P - 129692P-5"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Na Geng, Hu Sheng, Weizhi Sun, Yifeng Wang, Tan Yu, Zihan Liu
Under high-density operation and the influence of the natural environment, rail surfaces develop abrasion damage that affects the safety and comfort of trains. Rail surface defect detection is an important part of ensuring the safe and efficient operation of the railway system. To determine whether there are defects on the rail surface, a rail surface defect image segmentation method based on the fractional-order particle swarm optimization (FPSO) 2D-Otsu algorithm is proposed. The rail image is denoised and enhanced by adaptive fractional calculus and then segmented by the FPSO 2D-Otsu algorithm. To verify its accuracy, the proposed algorithm is compared with the PSO 2D-Otsu image segmentation algorithm. The experimental results show that the FPSO 2D-Otsu algorithm raises rail image segmentation accuracy to 83.59%, compared with 48.76% for the PSO 2D-Otsu algorithm.
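The criterion that Otsu-type methods optimize can be shown with the classic one-dimensional version. The sketch below is a deliberate simplification: it searches all gray levels exhaustively for the threshold that maximizes between-class variance, whereas the paper uses the 2D variant (which also considers neighborhood means) and a fractional-order particle swarm to search for the threshold. The function name `otsu_threshold` and the synthetic pixel list are illustrative assumptions.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t maximizing between-class variance.

    Pixels with value <= t form one class, the rest the other.
    pixels is a flat iterable of integers in [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0          # number of pixels in the lower class
    sum0 = 0.0      # sum of gray levels in the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2   # unnormalized variance
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Synthetic bimodal "image": dark background and bright defect pixels.
pixels = [10] * 50 + [200] * 50
t = otsu_threshold(pixels)
mask = [p > t for p in pixels]   # True marks the brighter class
print(t, sum(mask))
```

Swarm-based variants replace the exhaustive loop with a guided search, which matters for the 2D case where the search space is the square of the number of gray levels.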
{"title":"Image segmentation of rail surface defects based on fractional order particle swarm optimization 2D-Otsu algorithm","authors":"Na Geng, Hu Sheng, Weizhi Sun, Yifeng Wang, Tan Yu, Zihan Liu","doi":"10.1117/12.3014444","DOIUrl":"https://doi.org/10.1117/12.3014444","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"226 1","pages":"129690A - 129690A-4"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BaiYang Xiang, BoKai Li, Huaijuan Zang, Zeliang Zhao, Shu Zhan
Features are difficult to extract in video-based facial micro-expression recognition because micro-expressions are short in duration and small in amplitude. To better combine the temporal and spatial information of the video, the model is divided into a local attention module, a global attention module, and a temporal module. First, the local attention module crops the key regions and, after processing, sends them to a network with channel attention. Next, the global attention module applies random erasing that avoids the key regions and sends the data to a network with spatial attention. Then, the temporal module processes the frames in which the micro-expression occurs and sends them to a network with a temporal shift module and spatial attention. Finally, after feature fusion, the classification results are obtained through three fully connected layers. The method is evaluated on the CASME II dataset: after five-fold cross-validation, the average accuracy is 76.15% and the unweighted F1 score is 0.691, an improvement over mainstream algorithms.
{"title":"Microexpression recognition algorithm based on multi feature fusion","authors":"BaiYang Xiang, BoKai Li, Huaijuan Zang, Zeliang Zhao, Shu Zhan","doi":"10.1117/12.3014469","DOIUrl":"https://doi.org/10.1117/12.3014469","url":null,"PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"12 6","pages":"1296908 - 1296908-10"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140512112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}