Human motion prediction forecasts future human poses from historical observations, which is essential for tasks involving human-robot interaction. Currently, almost all existing approaches make predictions from visual observations, yet vision-based motion capture (Mocap) systems suffer from a significant limitation: occlusion. This is unavoidable for two reasons: first, mapping single-view observations to 3D human poses is deeply ambiguous; second, in complex in-the-wild environments, other objects occlude the subject and cause missing observations. Considering these factors, some researchers have turned to non-visual systems as alternatives. We propose to utilize inertial measurement units (IMUs) to capture human poses and make predictions. To improve accuracy, we propose a novel model based on MetaFormer with a spatial MLP and temporal pooling (SMTPFormer) to learn structural and temporal relationships. Extensive experiments on both TotalCapture and DIP-IMU show that the proposed SMTPFormer achieves superior accuracy compared with existing baselines.
{"title":"Human Motion Prediction based on IMUs and MetaFormer","authors":"Tian Xu, Chunyu Zhi, Qiongjie Cui","doi":"10.1145/3582177.3582179","DOIUrl":"https://doi.org/10.1145/3582177.3582179","url":null,"abstract":"Human motion prediction forecasts future human poses from the histories, which is necessary for all tasks that need human-robot interactions. Currently, almost existing approaches make predictions based on visual observations, while vision-based motion capture (Mocap) systems have a significant limitation, e.g. occlusions. The vision-based Mocap systems will inevitably suffer from the occlusions. The first reason is the deep ambiguity of mapping the single-view observations to the 3D human pose; and then considering the complex environments in the wild, other objects will lead to the missing observations of the subject. Considering these factors, some researchers utilize non-visual systems as alternatives. We propose to utilize inertial measurement units (IMUs) to capture human poses and make predictions. To bump up the accuracy, we propose a novel model based on MetaFormer with spatial MLP and Temporal pooling (SMTPFormer) to learn the structural and temporal relationships. With extensive experiments on both TotalCapture and DIP-IMU, the proposed SMTPFormer has achieved superior accuracy compared with the existing baselines.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121514915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Several (k, n) Secret Image Sharing (SIS) schemes supporting authentication of the reconstructed secret image and of the individual shares have been proposed in the past. Among these, two similar state-of-the-art SIS schemes performing visual-based share authentication have the following merits over other schemes: k and n are not restricted to 2, share authentication runs in linear time, secret image reconstruction is lossless, there is no pixel expansion, and share authentication is supported in both dealer-participatory and non-participatory environments. In this paper, we show that share authentication in these two state-of-the-art SIS schemes is computationally insecure in the malicious model. We first identify the vulnerabilities in their respective share authentication through security analysis. We then propose two linear-time algorithms that exploit the identified vulnerabilities to generate invalid shares from original shares; the generated invalid shares pass the respective authentication of the two analyzed schemes. In addition, using a generated invalid share in place of an original share during secret image reconstruction yields a distorted secret image. Finally, we provide experimental results that accord with the inferences of the security analysis and confirm the linear time complexity of the proposed algorithms for invalid share generation.
{"title":"Security Analysis of Visual based Share Authentication and Algorithms for Invalid Shares Generation in Malicious Model","authors":"K. Bhat, D. Jinwala, Y. Prasad, M. Zaveri","doi":"10.1145/3582177.3582192","DOIUrl":"https://doi.org/10.1145/3582177.3582192","url":null,"abstract":"Several (k, n) Secret Image Sharing (SIS) schemes supporting authentication of reconstructed secret image and that of shares are proposed in the past. In these existing schemes, two similar state-of-the-art SIS schemes performing visual based share authentication have following merits compared to other schemes: no restriction on values of k and n to be 2, linear time complexity of share authentication, lossless secret image reconstruction, no pixel expansion, and support for share authentication in both dealer participatory and non-participatory environments. In this paper, we show that respective share authentication in these two similar state-of-the-art SIS schemes is computationally insecure in malicious model. We first identify the vulnerabilities in their respective share authentication through security analysis. Then, we propose two linear time algorithms for generating invalid shares from original shares by exploiting the identified vulnerabilities. These generated invalid shares are capable of passing respective authentication in the two analyzed SIS schemes. In addition, usage of a generated invalid share in place of original share during secret image reconstruction results in distorted secret image. Finally, we provide experimental results that accord with inferences of security analysis and linear time complexity of the proposed algorithms for invalid shares generation.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127416215","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There has been considerable research on deep neural networks for forecasting time series, including chaotic time series. However, very few works address multi-step-ahead forecasting of chaotic time series with deep neural networks. Several strategies for the multi-step-ahead forecasting problem have been proposed in the literature: the recursive (or iterated) strategy, the direct strategy, DirRec (a combination of the recursive and direct strategies), the Multiple-Input Multiple-Output (MIMO) strategy, and DirMO, which combines the direct and MIMO strategies. This paper proposes a new deep learning model for chaotic time series forecasting, an LSTM-based stacked autoencoder, and answers the research question: which multi-step-ahead forecasting strategy using the LSTM-based stacked autoencoder yields the best performance on chaotic time series? We evaluated and compared the strategies in terms of two performance criteria: Root-Mean-Square Error (RMSE) and Mean-Absolute-Percentage Error (MAPE). Experimental results on synthetic and real-world chaotic time series datasets reveal that the MIMO strategy provides the best predictive accuracy for chaotic time series forecasting with the LSTM-based stacked autoencoder.
{"title":"Strategies of Multi-Step-ahead Forecasting for Chaotic Time Series using Autoencoder and LSTM Neural Networks: A Comparative Study","authors":"Ngoc Phien Nguyen, T. Duong, Platos Jan","doi":"10.1145/3582177.3582187","DOIUrl":"https://doi.org/10.1145/3582177.3582187","url":null,"abstract":"There has been a lot of research on the use of deep neural networks in forecasting time series and chaotic time series data. However, there exist very few works on multi-step ahead forecasting in chaotic time series using deep neural networks. Several strategies that deal with multi-step-ahead forecasting problem have been proposed in literature: recursive (or iterated) strategy, direct strategy, a combination of both the recursive and direct strategies, called DirRec, the Multiple-Input Multiple-Output (MIMO) strategy, and the fifth strategy, called DirMO which combines Direct and MIMO strategies. This paper aims to propose a new deep learning model for chaotic time series forecasting: LSTM-based stacked autoencoder and answer the research question: which strategy for multi-step ahead forecasting using LSTM-based stacked autoencoder yields the best performance for chaotic time series. We evaluated and compared in terms of two performance criteria: Root-Mean-Square Error (RMSE) and Mean-Absolute-Percentage Error (MAPE). The experimental results on synthetic and real-world chaotic time series datasets reveal that MIMO strategy provides the best predictive accuracy for chaotic time series forecasting using LSTM-based stacked autoencoder.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131485503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convolutional neural networks (CNNs), among the most successful models for visual recognition, have shown excellent performance in a variety of visual recognition challenges and have attracted much interest in recent years. However, applying CNN models to hyperspectral imaging (HSI) data remains a challenge because of strongly correlated bands and insufficient training data. Furthermore, HSI classification depends heavily on spectral-spatial information. A 2D CNN is one possible technique to analyze these features; a 3D CNN, which handles the volumetric spectral dimension, is another option but is more computationally expensive. Moreover, such models underperform in regions with similar spectra because they cannot extract high-quality feature maps. This work therefore proposes a 3D/2D CNN combined with the MobileNetV2 model that uses spectral-spatial feature maps to achieve competitive performance. First, the HSI data cube is reduced to the desired dimensionality with principal component analysis (PCA) and split into small overlapping 3-D patches. These patches are then processed with a 3D convolutional kernel to build 3-D feature maps over many contiguous bands, retaining the spectral properties. The performance of our model is validated on three benchmark HSI datasets (Pavia University, Indian Pines, and Salinas Scene), and the results are compared with different state-of-the-art (SOTA) methods.
{"title":"Deep 3D-2D Convolutional Neural Networks Combined With Mobinenetv2 For Hyperspectral Image Classification","authors":"DouglasOmwenga Nyabuga","doi":"10.1145/3582177.3582185","DOIUrl":"https://doi.org/10.1145/3582177.3582185","url":null,"abstract":"Convolutional neural networks (CNNs), one of the most successful models for visual identification, have shown excellent performance outcomes in different visual recognition challenges, attracting much interest in recent years. However, deploying CNN models to hyperspectral imaging (HSI) data continues to be a challenge due to the strongly correlated bands and insufficient training sets. Furthermore, HSI categorization is hugely dependent on spectral-spatial information. Hence, a 2D-CNN is a possible technique to analyze these features. However, because of the volume and spectral dimensions, a 3D CNN can be an option but is more computationally expensive. Furthermore, the models underperform in areas with comparable spectrums due to their inability to extract feature maps of high quality. This work, therefore, proposes a 3D/2D CNN combined with the MobineNetV2 model that uses both spectral-spatial feature maps to achieve competitive performance. First, the HSI data cube is split into small overlapping 3-D batches using the principal component analysis (PCA) to get the desired dimensions. These batches are then processed to build 3-D feature maps over many contiguous bands using a 3D convolutional kernel function, which retains the spectral properties. The performance of our model is validated using three benchmark HSI data sets (i.e., Pavia University, Indian Pines, and Salinas Scene). The results are then compared with different state-of-the-art (SOTA) methods.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130504800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Masum, Shad Reza, N. Sharmin, Md. Mahbubur Rahman
The effectiveness of an artillery shell is measured by its precision in hitting the target. While travelling to the target, the intended firing path can be affected by inherent error sources as well as external factors such as wind and temperature changes, so tracking the shell's path becomes crucial. In this paper, we address the problem of artillery shell tracking with optical flow and adopt solutions based on both sparse and dense optical flow. The outcome of this work can be used in test firing to determine the accuracy of the artillery target. Experimental findings show that the proposed methodology is more effective than current military practice for locating the impact point of unguided artillery shells, even when the shell turns out to be an unexploded (blind) shell.
{"title":"Tracking of Artillery Shell using Optical Flow","authors":"A. Masum, Shad Reza, N. Sharmin, Md. Mahbubur Rahman","doi":"10.1145/3582177.3582180","DOIUrl":"https://doi.org/10.1145/3582177.3582180","url":null,"abstract":"The effectiveness of the Artillery shell is measured by its precision of hitting the target. During hitting the target, the intended firing path can be affected by the inherited source as well as external factors like wind, temperature changes, etc. Thus tracking such a path becomes crucial. In this paper, we address the problem of artillery shell tracking with optical flow and adopted the solution for it with sparse and dense optical flow. The outcome of this work can be used in test firing to determine the accuracy of the artillery target. Experimental findings show that the proposed methodology is more effective than the current military artillery unguided shell’s impact point locating, even if the shell ended up being an unexplored or blind shell.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131468053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Virtual Reality and Augmented Reality technologies have improved greatly in recent years, and developers are trying to make the experience as realistic as possible and close the gap between the physical and virtual worlds. In this paper, we propose an efficient and intuitive method to create an immersive Mixed Reality environment by automatically mapping your room. Our method is view-direction driven: users simply "look at" any indoor space to create a 3-dimensional model of the area they are located in. This approach is easier and more intuitive to use and reduces time and effort compared to other MR environment generation methods. We use the Meta Quest 2's cameras and gyroscope sensor together with the Unity engine for ray casting and the passthrough API. We present the mathematical details of our method and show through a user study that it achieves better results than previous methods.
{"title":"A View Direction-Driven Approach for Automatic Room Mapping in Mixed Reality","authors":"Dong Jun Kim, Wanwan Li","doi":"10.1145/3582177.3582183","DOIUrl":"https://doi.org/10.1145/3582177.3582183","url":null,"abstract":"Virtual Reality and Augmented Reality technologies have greatly improved recently, and developers are trying to make the experience as realistic as possible and close the gap between the physical world and the virtual world. In this paper, we propose an efficient and intuitive method to create an immersive Mixed Reality environment by automatically mapping your room. Our method is view direction driven, which allows the users to simply “look at” any indoor space to create a 3-dimensional model of the area the user is located in. This approach is easier and more intuitive for the users to use and reduces the time and effort compared to other MR environment generating methods. We use the Meta Quest 2’s cameras and gyroscope sensor and the Unity engine for the ray casting and the passthrough API. We will present the mathematical details of our method and show that the proposed method achieves better results than previous methods through the user study results.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131732654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. K. Agbesi, Wenyu Chen, Sintayehu M. Gizaw, C. Ukwuoma, Abush S. Ameneshewa, C. Ejiyi
Feature extraction with deep learning models has achieved considerable performance in classifying various sentiments. However, modeling lengthy sentences from low-resource document-level datasets and examining the semantic relationships between these sentences remain challenging. In addition, recently deployed document representation modules rarely distinguish significant sentiment features from general features. To address these challenges, we exploit a neural network-based feature extraction technique: an attention-based two-layer bidirectional gated recurrent unit with a two-dimensional convolutional neural network (AttnBiGRU-2DCNN). Specifically, the Bi-GRU layers extract the compositional semantics of the low-resource document; they generate the semantic feature vector of each sentence and represent the document as a matrix with a time-step dimension and a feature dimension. Because many high-dimensional features are generated, we employ a Hunger Games Search algorithm to select essential features and suppress unnecessary ones, increasing classification accuracy. Extensive experiments on two low-resource datasets indicate that the proposed method captures sentiment relations in low-resource documents and outperforms known state-of-the-art approaches.
{"title":"Attention Based BiGRU-2DCNN with Hunger Game Search Technique for Low-Resource Document-Level Sentiment Classification","authors":"V. K. Agbesi, Wenyu Chen, Sintayehu M. Gizaw, C. Ukwuoma, Abush S. Ameneshewa, C. Ejiyi","doi":"10.1145/3582177.3582186","DOIUrl":"https://doi.org/10.1145/3582177.3582186","url":null,"abstract":"Extracting features with deep learning models recorded considerable performance in classifying various sentiments. However, modeling lengthy sentences from low-resource document-level datasets, and examining the semantic relationships between these sentences is challenging. Also, recent document representation modules deployed rarely distinguish between the significance of various sentiment features from general features. To solve these challenges, we exploit a neural network-based feature extraction technique. The technique exploits an attention-based two-layer bidirectional gated recurrent unit with a two-dimension convolutional neural network (AttnBiGRU-2DCNN). Specifically, the Bi-GRU layers extract compositional semantics of the low resource document. These layers generate the semantic feature vector of the sentence and present the documents in a matrix form with a time-step dimension and a feature dimension. Due to the high dimensional multiple features generated, we propose a Hunger Games Search Algorithm to select essential features and subdue unnecessary features to increase classification accuracy. Extensive experiments on two low-resourced datasets indicate that the proposed method captures low-resource sentimental relations and also outperformed known state-of-the-art approaches.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"30 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124682997","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leiming Duan, Xiantao Jiang, W. Li, Jiayuan Jin, Tian Song, F. Yu
Versatile Video Coding (VVC) is the latest video coding standard and uses a hybrid coding model. VVC achieves 50% bitrate savings compared with the High Efficiency Video Coding (HEVC) standard, but its encoding complexity is higher. In this work, a fast partition decision algorithm is proposed to reduce the encoding complexity of VVC: the decision to split or not split a coding unit (CU) is modeled as a binary classification problem based on Naive Bayes theory. The method performs well and balances encoding efficiency against encoding complexity. Experimental results show that, compared with the VVC reference software model, the proposed algorithm reduces encoding time by 48.00% while the BD-rate loss is only 1.69%.
{"title":"VVC Coding Unit Partitioning Decision based on Naive Bayes Theory","authors":"Leiming Duan, Xiantao Jiang, W. Li, Jiayuan Jin, Tian Song, F. Yu","doi":"10.1145/3582177.3582188","DOIUrl":"https://doi.org/10.1145/3582177.3582188","url":null,"abstract":"Versatile Video Coding (VVC) is the latest video coding standard, which uses a hybrid coding model. VVC achieves 50% bitrate saving compared with High Efficiency Video Coding (HEVC) standard. However, the encoding complexity of VVC is higher. In this work, a fast partition decision algorithm is proposed to reduce the encoding complexity of VVC, and the CU splitting or no splitting is modeled as a binary classification problem based on Naive Bayes theory. This method has good performance and balances encoding efficiency and encoding complexity. Experimental results show that, compared with the VVC reference software model, the proposed algorithm can reduce encoding time by 48.00%, while the loss of the BD-rate is only 1.69%.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115380905","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
W. Li, Xiantao Jiang, Jiayuan Jin, Tian Song, F. Yu
Different from the traditional quaternary tree (QT) structure used in the previous-generation video coding standard H.265/HEVC, a new partition structure, the quadtree with nested multi-type tree (QTMT), is applied in the latest codec H.266/VVC. The introduction of QTMT brings superior encoding performance at the cost of a large increase in encoding time. This paper therefore proposes a fast coding unit (CU) partitioning algorithm based on CU texture complexity and texture direction. First, we terminate further splitting of a CU when its texture is judged to be simple. Then, we use the gray-level co-occurrence matrix (GLCM) to extract the texture direction of the block and decide whether to partition the CU by QT, thus terminating further MT partitions. Finally, a final partition type is selected from the four MT partitions by combining the multi-level texture complexity and texture direction of the block. Simulation results show that the overall algorithm significantly reduces encoding time while keeping the loss of coding efficiency reasonably low: compared with the reference model, encoding time is reduced by up to 44.71%, while the BDBR increases by only 0.84% on average.
{"title":"A Fast CU Partitioning Algorithm Based on Texture Characteristics for VVC","authors":"W. Li, Xiantao Jiang, Jiayuan Jin, Tian Song, F. Yu","doi":"10.1145/3582177.3582193","DOIUrl":"https://doi.org/10.1145/3582177.3582193","url":null,"abstract":"Abstract: Different from the traditional quaternary tree (QT) structure utilized in the previous generation video coding standard H.265/HEVC, a new partition structure named quadtree with nested multi-type tree (QTMT) is applied in the latest codec H.266/VVC. The introduction of QTMT brings in superior encoding performance at the cost of great time-consuming. Therefore, this paper proposes a fast coding unit (CU) partitioning algorithm based on CU texture complexity and texture direction. First, we terminate further splitting of a CU when its texture is judged as simple. Then, we use the gray level co-occurrence matrix (GLCM) to extract the texture direction of the block to decide whether to partition this CU by QT, thus terminating further MT partitions. Finally, a final partition type is selected from the four MT partitions in combination with the multi-level texture complexity and texture direction of the block. The simulation results show that the overall algorithm can significantly reduce the encoding time, while the loss of coding efficiency is reasonably low. In comparison with the reference model, the encoding time is reduced by up to 44.71%, while the BDBR is increased by only 0.84% on average.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124442472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this study, automatic gait gender classification using convolutional neural networks comprises three phases: i) human gait signature generation, ii) feature extraction by convolving gait energy images with filters, and iii) classification using feed-forward convolutional neural networks. We analysed the performance of Gabor and Log Gabor features in terms of classification accuracy. The Log Gabor filter's accuracy was 92.11% on the Normal vs Normal dataset, 74.14% on Normal vs Bag, 46.55% on Normal vs Coat, and 72.41% on Normal vs Case, while the Gabor filter's accuracy was 75% on Normal vs Normal, 60.34% on Normal vs Bag, 65.52% on Normal vs Coat, and 55.17% on Normal vs Case.
{"title":"Automatic Gait Gender Classification Using Convolutional Neural Networks","authors":"L. Srinivasan","doi":"10.1145/3582177.3582184","DOIUrl":"https://doi.org/10.1145/3582177.3582184","url":null,"abstract":"In this study, automatic gait gender classification using convolutional neural networks includes three phases: i) human gait signature generation, ii) which convolves the gait energy images with filters for feature extraction and iii) classified using feed-forward convolutional neural networks. Analysed performance of Gabor and Log Gabor features using classification accuracy. The Log Gabor filter's accuracy was 92.11% for the Normal vs Normal dataset, 74.14% for the Normal vs Bag dataset, 46.55% for the Normal vs Coat dataset, 72.41% for the Normal vs Case dataset and whiles Gabor filter's accuracy was 75% for the Normal vs Normal dataset, 60.34% for the Normal vs Bag dataset 65.52% for the Normal vs Coat dataset and 55.17% for the Normal vs Case dataset.","PeriodicalId":170327,"journal":{"name":"Proceedings of the 2023 5th International Conference on Image Processing and Machine Vision","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131273482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}