Human motion prediction forecasts future human poses from historical observations, which is essential for tasks involving human-robot interaction. Currently, almost all existing approaches make predictions from visual observations, yet vision-based motion capture (Mocap) systems suffer from a significant limitation: occlusion. This is unavoidable for two reasons: first, mapping single-view observations to 3D human poses is deeply ambiguous; second, in complex in-the-wild environments, other objects occlude the subject and cause missing observations. Considering these factors, some researchers have turned to non-visual systems as alternatives. We propose to utilize inertial measurement units (IMUs) to capture human poses and make predictions. To improve accuracy, we propose a novel model based on MetaFormer with a spatial MLP and temporal pooling (SMTPFormer) to learn structural and temporal relationships. Extensive experiments on both TotalCapture and DIP-IMU show that the proposed SMTPFormer achieves superior accuracy compared with existing baselines.
{"title":"Human Motion Prediction based on IMUs and MetaFormer","authors":"Tian Xu, Chunyu Zhi, Qiongjie Cui","doi":"10.1145/3582177.3582179","DOIUrl":"https://doi.org/10.1145/3582177.3582179","url":null,"abstract":"Human motion prediction forecasts future human poses from the histories, which is necessary for all tasks that need human-robot interactions. Currently, almost existing approaches make predictions based on visual observations, while vision-based motion capture (Mocap) systems have a significant limitation, e.g. occlusions. The vision-based Mocap systems will inevitably suffer from the occlusions. The first reason is the deep ambiguity of mapping the single-view observations to the 3D human pose; and then considering the complex environments in the wild, other objects will lead to the missing observations of the subject. Considering these factors, some researchers utilize non-visual systems as alternatives. We propose to utilize inertial measurement units (IMUs) to capture human poses and make predictions. To bump up the accuracy, we propose a novel model based on MetaFormer with spatial MLP and Temporal pooling (SMTPFormer) to learn the structural and temporal relationships. With extensive experiments on both TotalCapture and DIP-IMU, the proposed SMTPFormer has achieved superior accuracy compared with the existing baselines.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121514915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several (k, n) Secret Image Sharing (SIS) schemes supporting authentication of the reconstructed secret image and of the individual shares have been proposed in the past. Among these, two similar state-of-the-art SIS schemes performing visual-based share authentication have the following merits over other schemes: k and n are not restricted to 2, share authentication runs in linear time, secret image reconstruction is lossless, there is no pixel expansion, and share authentication is supported in both dealer-participatory and non-participatory environments. In this paper, we show that share authentication in these two state-of-the-art SIS schemes is computationally insecure in the malicious model. We first identify the vulnerabilities in their respective share authentication through security analysis. We then propose two linear-time algorithms that exploit the identified vulnerabilities to generate invalid shares from original shares; the generated invalid shares pass the respective authentication of the two analyzed schemes. In addition, using a generated invalid share in place of an original share during secret image reconstruction yields a distorted secret image. Finally, we provide experimental results that accord with the inferences of the security analysis and confirm the linear time complexity of the proposed algorithms for invalid share generation.
{"title":"Security Analysis of Visual based Share Authentication and Algorithms for Invalid Shares Generation in Malicious Model","authors":"K. Bhat, D. Jinwala, Y. Prasad, M. Zaveri","doi":"10.1145/3582177.3582192","DOIUrl":"https://doi.org/10.1145/3582177.3582192","url":null,"abstract":"Several (k, n) Secret Image Sharing (SIS) schemes supporting authentication of reconstructed secret image and that of shares are proposed in the past. In these existing schemes, two similar state-of-the-art SIS schemes performing visual based share authentication have following merits compared to other schemes: no restriction on values of k and n to be 2, linear time complexity of share authentication, lossless secret image reconstruction, no pixel expansion, and support for share authentication in both dealer participatory and non-participatory environments. In this paper, we show that respective share authentication in these two similar state-of-the-art SIS schemes is computationally insecure in malicious model. We first identify the vulnerabilities in their respective share authentication through security analysis. Then, we propose two linear time algorithms for generating invalid shares from original shares by exploiting the identified vulnerabilities. These generated invalid shares are capable of passing respective authentication in the two analyzed SIS schemes. In addition, usage of a generated invalid share in place of original share during secret image reconstruction results in distorted secret image. Finally, we provide experimental results that accord with inferences of security analysis and linear time complexity of the proposed algorithms for invalid shares generation.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127416215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There has been considerable research on deep neural networks for forecasting time series, including chaotic time series. However, very few works address multi-step-ahead forecasting of chaotic time series with deep neural networks. Several strategies for the multi-step-ahead forecasting problem have been proposed in the literature: the recursive (or iterated) strategy, the direct strategy, DirRec (a combination of the recursive and direct strategies), the Multiple-Input Multiple-Output (MIMO) strategy, and DirMO, which combines the direct and MIMO strategies. This paper proposes a new deep learning model for chaotic time series forecasting, an LSTM-based stacked autoencoder, and answers the research question: which multi-step-ahead forecasting strategy using the LSTM-based stacked autoencoder yields the best performance on chaotic time series? We evaluated and compared the strategies in terms of two performance criteria: Root-Mean-Square Error (RMSE) and Mean-Absolute-Percentage Error (MAPE). Experimental results on synthetic and real-world chaotic time series datasets reveal that the MIMO strategy provides the best predictive accuracy for chaotic time series forecasting with the LSTM-based stacked autoencoder.
{"title":"Strategies of Multi-Step-ahead Forecasting for Chaotic Time Series using Autoencoder and LSTM Neural Networks: A Comparative Study","authors":"Ngoc Phien Nguyen, T. Duong, Platos Jan","doi":"10.1145/3582177.3582187","DOIUrl":"https://doi.org/10.1145/3582177.3582187","url":null,"abstract":"There has been a lot of research on the use of deep neural networks in forecasting time series and chaotic time series data. However, there exist very few works on multi-step ahead forecasting in chaotic time series using deep neural networks. Several strategies that deal with multi-step-ahead forecasting problem have been proposed in literature: recursive (or iterated) strategy, direct strategy, a combination of both the recursive and direct strategies, called DirRec, the Multiple-Input Multiple-Output (MIMO) strategy, and the fifth strategy, called DirMO which combines Direct and MIMO strategies. This paper aims to propose a new deep learning model for chaotic time series forecasting: LSTM-based stacked autoencoder and answer the research question: which strategy for multi-step ahead forecasting using LSTM-based stacked autoencoder yields the best performance for chaotic time series. We evaluated and compared in terms of two performance criteria: Root-Mean-Square Error (RMSE) and Mean-Absolute-Percentage Error (MAPE). The experimental results on synthetic and real-world chaotic time series datasets reveal that MIMO strategy provides the best predictive accuracy for chaotic time series forecasting using LSTM-based stacked autoencoder.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131485503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional neural networks (CNNs), among the most successful models for visual recognition, have shown excellent performance in a variety of visual recognition challenges and have attracted much interest in recent years. However, applying CNN models to hyperspectral imaging (HSI) data remains a challenge because of strongly correlated bands and insufficient training data. Furthermore, HSI classification depends heavily on spectral-spatial information. A 2D CNN is one possible technique to analyze these features; a 3D CNN, which handles the volumetric spectral dimension, is another option but is more computationally expensive. Moreover, such models underperform in regions with similar spectra because they cannot extract high-quality feature maps. This work therefore proposes a 3D/2D CNN combined with the MobileNetV2 model that uses spectral-spatial feature maps to achieve competitive performance. First, the HSI data cube is reduced to the desired dimensionality with principal component analysis (PCA) and split into small overlapping 3-D patches. These patches are then processed with a 3D convolutional kernel to build 3-D feature maps over many contiguous bands, retaining the spectral properties. The performance of our model is validated on three benchmark HSI datasets (Pavia University, Indian Pines, and Salinas Scene), and the results are compared with different state-of-the-art (SOTA) methods.
{"title":"Deep 3D-2D Convolutional Neural Networks Combined With Mobinenetv2 For Hyperspectral Image Classification","authors":"DouglasOmwenga Nyabuga","doi":"10.1145/3582177.3582185","DOIUrl":"https://doi.org/10.1145/3582177.3582185","url":null,"abstract":"Convolutional neural networks (CNNs), one of the most successful models for visual identification, have shown excellent performance outcomes in different visual recognition challenges, attracting much interest in recent years. However, deploying CNN models to hyperspectral imaging (HSI) data continues to be a challenge due to the strongly correlated bands and insufficient training sets. Furthermore, HSI categorization is hugely dependent on spectral-spatial information. Hence, a 2D-CNN is a possible technique to analyze these features. However, because of the volume and spectral dimensions, a 3D CNN can be an option but is more computationally expensive. Furthermore, the models underperform in areas with comparable spectrums due to their inability to extract feature maps of high quality. This work, therefore, proposes a 3D/2D CNN combined with the MobineNetV2 model that uses both spectral-spatial feature maps to achieve competitive performance. First, the HSI data cube is split into small overlapping 3-D batches using the principal component analysis (PCA) to get the desired dimensions. These batches are then processed to build 3-D feature maps over many contiguous bands using a 3D convolutional kernel function, which retains the spectral properties. The performance of our model is validated using three benchmark HSI data sets (i.e., Pavia University, Indian Pines, and Salinas Scene). The results are then compared with different state-of-the-art (SOTA) methods.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130504800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Masum, Shad Reza, N. Sharmin, Md. Mahbubur Rahman
The effectiveness of an artillery shell is measured by its precision in hitting the target. While travelling to the target, the intended firing path can be affected by inherent error sources as well as external factors such as wind and temperature changes, so tracking the shell's path becomes crucial. In this paper, we address the problem of artillery shell tracking with optical flow and adopt solutions based on both sparse and dense optical flow. The outcome of this work can be used in test firing to determine the accuracy of the artillery target. Experimental findings show that the proposed methodology is more effective than current military practice for locating the impact point of unguided artillery shells, even when the shell turns out to be an unexploded (blind) shell.
{"title":"Tracking of Artillery Shell using Optical Flow","authors":"A. Masum, Shad Reza, N. Sharmin, Md. Mahbubur Rahman","doi":"10.1145/3582177.3582180","DOIUrl":"https://doi.org/10.1145/3582177.3582180","url":null,"abstract":"The effectiveness of the Artillery shell is measured by its precision of hitting the target. During hitting the target, the intended firing path can be affected by the inherited source as well as external factors like wind, temperature changes, etc. Thus tracking such a path becomes crucial. In this paper, we address the problem of artillery shell tracking with optical flow and adopted the solution for it with sparse and dense optical flow. The outcome of this work can be used in test firing to determine the accuracy of the artillery target. Experimental findings show that the proposed methodology is more effective than the current military artillery unguided shell’s impact point locating, even if the shell ended up being an unexplored or blind shell.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131468053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual Reality and Augmented Reality technologies have improved greatly in recent years, and developers are trying to make the experience as realistic as possible and close the gap between the physical and virtual worlds. In this paper, we propose an efficient and intuitive method to create an immersive Mixed Reality environment by automatically mapping your room. Our method is view-direction driven: users simply "look at" any indoor space to create a 3-dimensional model of the area they are located in. This approach is easier and more intuitive to use and reduces time and effort compared to other MR environment generation methods. We use the Meta Quest 2's cameras and gyroscope sensor together with the Unity engine for ray casting and the passthrough API. We present the mathematical details of our method and show through a user study that it achieves better results than previous methods.
{"title":"A View Direction-Driven Approach for Automatic Room Mapping in Mixed Reality","authors":"Dong Jun Kim, Wanwan Li","doi":"10.1145/3582177.3582183","DOIUrl":"https://doi.org/10.1145/3582177.3582183","url":null,"abstract":"Virtual Reality and Augmented Reality technologies have greatly improved recently, and developers are trying to make the experience as realistic as possible and close the gap between the physical world and the virtual world. In this paper, we propose an efficient and intuitive method to create an immersive Mixed Reality environment by automatically mapping your room. Our method is view direction driven, which allows the users to simply “look at” any indoor space to create a 3-dimensional model of the area the user is located in. This approach is easier and more intuitive for the users to use and reduces the time and effort compared to other MR environment generating methods. We use the Meta Quest 2’s cameras and gyroscope sensor and the Unity engine for the ray casting and the passthrough API. We will present the mathematical details of our method and show that the proposed method achieves better results than previous methods through the user study results.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131732654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. K. Agbesi, Wenyu Chen, Sintayehu M. Gizaw, C. Ukwuoma, Abush S. Ameneshewa, C. Ejiyi
Feature extraction with deep learning models has achieved considerable performance in classifying various sentiments. However, modeling lengthy sentences from low-resource document-level datasets and examining the semantic relationships between these sentences remain challenging. In addition, recently deployed document representation modules rarely distinguish significant sentiment features from general features. To address these challenges, we exploit a neural network-based feature extraction technique: an attention-based two-layer bidirectional gated recurrent unit with a two-dimensional convolutional neural network (AttnBiGRU-2DCNN). Specifically, the Bi-GRU layers extract the compositional semantics of the low-resource document; they generate the semantic feature vector of each sentence and represent the document as a matrix with a time-step dimension and a feature dimension. Because many high-dimensional features are generated, we employ a Hunger Games Search algorithm to select essential features and suppress unnecessary ones, increasing classification accuracy. Extensive experiments on two low-resource datasets indicate that the proposed method captures sentiment relations in low-resource documents and outperforms known state-of-the-art approaches.
{"title":"Attention Based BiGRU-2DCNN with Hunger Game Search Technique for Low-Resource Document-Level Sentiment Classification","authors":"V. K. Agbesi, Wenyu Chen, Sintayehu M. Gizaw, C. Ukwuoma, Abush S. Ameneshewa, C. Ejiyi","doi":"10.1145/3582177.3582186","DOIUrl":"https://doi.org/10.1145/3582177.3582186","url":null,"abstract":"Extracting features with deep learning models recorded considerable performance in classifying various sentiments. However, modeling lengthy sentences from low-resource document-level datasets, and examining the semantic relationships between these sentences is challenging. Also, recent document representation modules deployed rarely distinguish between the significance of various sentiment features from general features. To solve these challenges, we exploit a neural network-based feature extraction technique. The technique exploits an attention-based two-layer bidirectional gated recurrent unit with a two-dimension convolutional neural network (AttnBiGRU-2DCNN). Specifically, the Bi-GRU layers extract compositional semantics of the low resource document. These layers generate the semantic feature vector of the sentence and present the documents in a matrix form with a time-step dimension and a feature dimension. Due to the high dimensional multiple features generated, we propose a Hunger Games Search Algorithm to select essential features and subdue unnecessary features to increase classification accuracy. Extensive experiments on two low-resourced datasets indicate that the proposed method captures low-resource sentimental relations and also outperformed known state-of-the-art approaches.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"30 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124682997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leiming Duan, Xiantao Jiang, W. Li, Jiayuan Jin, Tian Song, F. Yu
Versatile Video Coding (VVC) is the latest video coding standard and uses a hybrid coding model. VVC achieves 50% bitrate savings compared with the High Efficiency Video Coding (HEVC) standard, but its encoding complexity is higher. In this work, a fast partition decision algorithm is proposed to reduce the encoding complexity of VVC: the decision to split or not split a coding unit (CU) is modeled as a binary classification problem based on Naive Bayes theory. The method performs well and balances encoding efficiency against encoding complexity. Experimental results show that, compared with the VVC reference software model, the proposed algorithm reduces encoding time by 48.00% while the BD-rate loss is only 1.69%.
{"title":"VVC Coding Unit Partitioning Decision based on Naive Bayes Theory","authors":"Leiming Duan, Xiantao Jiang, W. Li, Jiayuan Jin, Tian Song, F. Yu","doi":"10.1145/3582177.3582188","DOIUrl":"https://doi.org/10.1145/3582177.3582188","url":null,"abstract":"Versatile Video Coding (VVC) is the latest video coding standard, which uses a hybrid coding model. VVC achieves 50% bitrate saving compared with High Efficiency Video Coding (HEVC) standard. However, the encoding complexity of VVC is higher. In this work, a fast partition decision algorithm is proposed to reduce the encoding complexity of VVC, and the CU splitting or no splitting is modeled as a binary classification problem based on Naive Bayes theory. This method has good performance and balances encoding efficiency and encoding complexity. Experimental results show that, compared with the VVC reference software model, the proposed algorithm can reduce encoding time by 48.00%, while the loss of the BD-rate is only 1.69%.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115380905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Li, Xiantao Jiang, Jiayuan Jin, Tian Song, F. Yu
Different from the traditional quaternary tree (QT) structure used in the previous-generation video coding standard H.265/HEVC, a new partition structure, the quadtree with nested multi-type tree (QTMT), is applied in the latest codec H.266/VVC. The introduction of QTMT brings superior encoding performance at the cost of a large increase in encoding time. This paper therefore proposes a fast coding unit (CU) partitioning algorithm based on CU texture complexity and texture direction. First, we terminate further splitting of a CU when its texture is judged to be simple. Then, we use the gray-level co-occurrence matrix (GLCM) to extract the texture direction of the block and decide whether to partition the CU by QT, thus terminating further MT partitions. Finally, a final partition type is selected from the four MT partitions by combining the multi-level texture complexity and texture direction of the block. Simulation results show that the overall algorithm significantly reduces encoding time while keeping the loss of coding efficiency reasonably low: compared with the reference model, encoding time is reduced by up to 44.71%, while the BDBR increases by only 0.84% on average.
{"title":"A Fast CU Partitioning Algorithm Based on Texture Characteristics for VVC","authors":"W. Li, Xiantao Jiang, Jiayuan Jin, Tian Song, F. Yu","doi":"10.1145/3582177.3582193","DOIUrl":"https://doi.org/10.1145/3582177.3582193","url":null,"abstract":"Abstract: Different from the traditional quaternary tree (QT) structure utilized in the previous generation video coding standard H.265/HEVC, a new partition structure named quadtree with nested multi-type tree (QTMT) is applied in the latest codec H.266/VVC. The introduction of QTMT brings in superior encoding performance at the cost of great time-consuming. Therefore, this paper proposes a fast coding unit (CU) partitioning algorithm based on CU texture complexity and texture direction. First, we terminate further splitting of a CU when its texture is judged as simple. Then, we use the gray level co-occurrence matrix (GLCM) to extract the texture direction of the block to decide whether to partition this CU by QT, thus terminating further MT partitions. Finally, a final partition type is selected from the four MT partitions in combination with the multi-level texture complexity and texture direction of the block. The simulation results show that the overall algorithm can significantly reduce the encoding time, while the loss of coding efficiency is reasonably low. In comparison with the reference model, the encoding time is reduced by up to 44.71%, while the BDBR is increased by only 0.84% on average.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124442472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, automatic gait gender classification using convolutional neural networks comprises three phases: i) human gait signature generation, ii) feature extraction by convolving gait energy images with filters, and iii) classification using feed-forward convolutional neural networks. We analysed the performance of Gabor and Log Gabor features in terms of classification accuracy. The Log Gabor filter's accuracy was 92.11% on the Normal vs Normal dataset, 74.14% on Normal vs Bag, 46.55% on Normal vs Coat, and 72.41% on Normal vs Case, while the Gabor filter's accuracy was 75% on Normal vs Normal, 60.34% on Normal vs Bag, 65.52% on Normal vs Coat, and 55.17% on Normal vs Case.
{"title":"Automatic Gait Gender Classification Using Convolutional Neural Networks","authors":"L. Srinivasan","doi":"10.1145/3582177.3582184","DOIUrl":"https://doi.org/10.1145/3582177.3582184","url":null,"abstract":"In this study, automatic gait gender classification using convolutional neural networks includes three phases: i) human gait signature generation, ii) which convolves the gait energy images with filters for feature extraction and iii) classified using feed-forward convolutional neural networks. Analysed performance of Gabor and Log Gabor features using classification accuracy. The Log Gabor filter's accuracy was 92.11% for the Normal vs Normal dataset, 74.14% for the Normal vs Bag dataset, 46.55% for the Normal vs Coat dataset, 72.41% for the Normal vs Case dataset and whiles Gabor filter's accuracy was 75% for the Normal vs Normal dataset, 60.34% for the Normal vs Bag dataset 65.52% for the Normal vs Coat dataset and 55.17% for the Normal vs Case dataset.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131273482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}