Skeleton-based Recognition of Pedestrian Crossing Intention using Attention Graph Neural Networks
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920850
M. Le, Truong-Dong Do, Minh-Thien Duong, Tran-Nhat-Minh Ta, Van-Binh Nguyen, M. Le
Besides the ability to automatically detect and localize objects on the road, self-driving cars need to observe and understand pedestrian attention to ensure safe operation. In this study, a compact skeleton-based method to predict pedestrian crossing intention is presented. The skeleton data is first extracted using a state-of-the-art pose estimation method. The proposed approach then combines graph neural networks, self-attention mechanisms, and temporal convolutions to create distinctive representations of moving pedestrian skeleton sequences. The crossing intention is classified from the extracted features. Experiments on the public JAAD dataset demonstrate results competitive with previous methods.
{"title":"Skeleton-based Recognition of Pedestrian Crossing Intention using Attention Graph Neural Networks","authors":"M. Le, Truong-Dong Do, Minh-Thien Duong, Tran-Nhat-Minh Ta, Van-Binh Nguyen, M. Le","doi":"10.1109/IWIS56333.2022.9920850","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920850","url":null,"abstract":"Besides the ability to automatically detect and localize on the road, self-driving cars need to observe and understand pedestrian attention to ensure safe operations. In this study, a compact skeleton-based method to predict pedestrian crossing intention is presented. The skeleton data is first extracted using a state-of-the-art pose estimation method. Then, the proposed approach combines graph neural networks, self-attention mechanisms, and temporal convolutions to create distinctive representations of pedestrian moving skeleton sequences. The crossing intention of people is classified based on the extracted features. The experiments demonstrate competitive results with previous methods on the public JAAD dataset.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130809517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Residual Bottleneck for Object Detection on CPU
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920946
Jinsu An, M. D. Putro, K. Jo
Object detection is one of the most fundamental and important tasks in computer vision. With the development of hardware such as GPUs and cameras, object detection technology is steadily improving. However, GPUs are often difficult to deploy in industrial settings, so efficient deep learning techniques for the CPU environment are very important. In this paper, we propose a deep learning model that detects objects in real time from images and videos on a CPU. By modifying the CSP [1] bottleneck in the backbone of YOLOv5 [2], we reduce the amount of computation and improve the FPS. The model was trained on the MS COCO dataset; compared with the original YOLOv5, the number of parameters was reduced by about 2.4%, and the model reached 0.367 mAP, 0.071 higher than RefineDetLite. At 23.010 FPS, it is fast enough for real-time object detection.
{"title":"Efficient Residual Bottleneck for Object Detection on CPU","authors":"Jinsu An, M. D. Putro, K. Jo","doi":"10.1109/IWIS56333.2022.9920946","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920946","url":null,"abstract":"Object detection is the most fundamental and important task in computer vision. With the development of hardware such as computing power of GPUs and cameras, object detection technology is gradually improving. However, there are many difficulties in using GPUs in industrial fields. Therefore, it is very important to use efficient deep learning technology in the CPU environment. In this paper, we propose a deep learning model that can detect objects in real-time from images and videos using CPU. By modifying the CSP [1] bottleneck, which corresponds to the backbone of YOLOv5 [2], an experiment was conducted to reduce the amount of computation and improve the FPS. The model was trained using the MS COCO dataset, and compared with the original YOLOv5, the number of parameters was reduced by about 2.4%, and compared with RefineDetLite, the mAP value was measured to be 0.367 mAP, which is 0.071 higher than that of RefineDetLite. The FPS was 23.010, which was sufficient for real-time object detection.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129578257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of Web-based Metaverse Platform
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920930
Junmyeong Kim, Changhyeon Jeong, Kanghyun Jo
Most metaverses require users to install programs or buy equipment to enter the world, which reduces their accessibility. To improve accessibility, this work develops a metaverse served in the web environment using Unity and web technologies. The developed web-based metaverse has two advantages. The first is increased accessibility: when the metaverse is served on the web, users can connect to the world by typing a specific Uniform Resource Locator (URL) into the address bar. The second is that it combines the strengths of Unity and web development: Unity provides many methods and assets for generating and controlling the metaverse, while web technologies are well suited to communicating between browser and server, managing information in the database, and so on. To bridge Unity and the web, this work uses React, which provides an Application Programming Interface (API) for interaction between the two, allowing the metaverse to call web functions from within Unity. In summary, the goal of this paper is to build a web-based metaverse platform that improves the accessibility of the metaverse. To achieve this goal, this work uses Unity, Photon, Socket.IO, React, Node.js, MongoDB, and Express. The built metaverse can be accessed at https://busanmayor.org/.
{"title":"Development of Web-based Metaverse Platform","authors":"Junmyeong Kim, Changhyeon Jeong, Kanghyun Jo","doi":"10.1109/IWIS56333.2022.9920930","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920930","url":null,"abstract":"Most of the metaverses need to install programs or buy some equipment for entering the world. These processes reduce the accessibility of metaverse. To improve accessibility, this work develops a metaverse for service on the web environment using Unity and web development. The developed web-based metaverse has two advantages. The first advantage is increasing accessibility, when the metaverse is servicing on a web environment, users can connect to the world by typing the specific Uniform Resource Locator (URL) in an address bar. The second advantage is to mix the advantages of Unity and web development. Unity has many methods and assets for generating and controlling the metaverse, and web development has the good ability for communicating between browser and server, controlling information in the database, etc. To interact between Unity and the web, React was used in this work. React provides an Application Programming Interface (API) for interacting between Unity and the web. API makes the metaverse can use functions of the web in Unity. In summary, the goal of this paper is to build a web-based metaverse platform for improving the accessibility of metaverse. To achieve this goal, this work used Unity, Photon, Socket.IO, React, Node.js, MongoDB, and Express for building a more accessible metaverse. For accessing, this work built metaverse, the URL is provided as https://busanmayor.org/.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126473718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implementation of P&O MPPT Control for Standalone PV System and Its Efficiency Analysis
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920779
Yicheng Zhou, Jin-Hua She, R. Yokoyama, Y. Nakanishi
MPPT control is significantly affected by the switching frequency, which is mainly set by the PWM of the DC/DC converter. In this paper, P&O MPPT control is implemented directly in MATLAB/Simulink with the Simscape library. To examine the relationship between efficiency and switching frequency, the paper also gives a simple test case that verifies the approach's usability.
{"title":"Implementation of P&O MPPT Control for Standalone PV System and Its Efficiency Analysis","authors":"Yicheng Zhou, Jin-Hua She, R. Yokoyama, Y. Nakanishi","doi":"10.1109/IWIS56333.2022.9920779","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920779","url":null,"abstract":"MPPT Control is significantly affected by the switching frequency which is mainly controlled by the PWM for the DC/DC converter. In this paper, the P&O MPPT control is directly implemented using MATLAB/Simulink tool with Simscape library. For understanding the relationship between the efficiency and switching frequency, this paper also gives a simple test case for verification of its useable.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123057034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiSeNet with Depthwise Attention Spatial Path for Semantic Segmentation
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920717
S. Kim, Kanghyun Jo
This paper proposes a new structure that obtains similar results while reducing the computational cost of BiSeNet for real-time semantic segmentation. Of BiSeNet's Spatial Path and Context Path, the study focuses on the large kernels in the Spatial Path. The Spatial Path preserves rich spatial information by creating a feature map 1/8 the size of the original image through three convolutions, performed in the order 7×7, 3×3, and 3×3. When general convolutions are used with such large kernels, the computational cost grows with the large number of parameters. To solve this problem, this paper uses depthwise separable convolution. However, depthwise separable convolution loses some spatial information. To compensate for this loss, an attention mechanism [1] is applied by element-wise summing the input and output feature maps of the depthwise separable convolution. To resolve the dimensional difference between input and output, a Pooling Pointwise Module (PPM) is used: max pooling changes the spatial dimension of the input features, and pointwise (1×1) convolution changes the channel dimension [2]. This paper proposes a Depthwise Attention Spatial Path for BiSeNet using these methods. With the proposed methods, mIoU under SS, SSC, MSF, and MSCF is 72.7%, 74.1%, 74.3%, and 76.1%, respectively. The proposed network can segment regions that the original cannot when using the Depthwise Attention Spatial Path.
{"title":"BiSeNet with Depthwise Attention Spatial Path for Semantic Segmentation","authors":"S. Kim, Kanghyun Jo","doi":"10.1109/IWIS56333.2022.9920717","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920717","url":null,"abstract":"This paper proposes a new structure to obtain similar results while reducing the computational amount of BiSeNet for Real-Time Semantic Segmentation. Among the Spatial Path and Context Path of BiSeNet, the study was conducted focusing on the large size kernel of the Spatial Path. Spatial Path has rich spatial information by creating a feature map 1/8 times the size of the original image through three convolution operations. The convolution operation used at this time is performed in the order of 7×7, 3×3, and 3×3. When a general convolution is used for a kernel of such a large size, the calculated cost increases due to a large number of parameters. To solve this problem, this paper uses Depthwise Separable Convolution. At this time, in Depthwise Separable Convolution, loss occurs in Spatial Information. To solve this information loss, an attention mechanism [1] was applied by elementwise summing between the input and output feature maps of depthwise separable convolution. To solve the dimensional difference between input and output, PPM: Pooling Pointwise Module is used. PPM uses Maxpooling to change the Spatial Dimension of input features and Channel Dimension through Pointwise Convolution (lx1 Convolution) [2]. This paper propose to use Depthwise Attention Spatial Path for BiSeNet using these methods. Through our proposed methods, mIoU in SS, SSC, MSF, and MSCF were 72.7%, 74.1 %, 74.3%, and 76.1 %. Proposed network can segment the part that the original one can't when using our Depthwise Attention Spatial Path.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126413648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low Computational Vehicle Lane Changing Prediction Using Drone Traffic Dataset
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920801
Youlkyeong Lee, Qing Tang, Jehwan Choi, Kanghyun Jo
Safe autonomous-driving assistance systems are actively being developed based on Convolutional Neural Networks (CNNs). Unlike understanding the road environment through images viewed from the vehicle itself, a drone image has the advantage of covering a large area at once. By understanding the movements of various vehicles and predicting their motion over time, it can serve as safe-driving assistance information. In this paper, vehicle movement is predicted with an LSTM by extracting vehicle time-series information. YOLOv5 is used to detect vehicles on the road: road areas are collected as drone flight images, and YOLOv5 is trained on vehicles labeled in the collected images. Time-series movement information is extracted from the detected vehicles, and the movement of each vehicle is predicted using the LSTM model. Prediction error is reported using the MSE.
{"title":"Low Computational Vehicle Lane Changing Prediction Using Drone Traffic Dataset","authors":"Youlkyeong Lee, Qing Tang, Jehwan Choi, Kanghyun Jo","doi":"10.1109/IWIS56333.2022.9920801","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920801","url":null,"abstract":"Safe autonomous driving assistance systems are actively being developed based on Convolutional Neural Network (CNN). Unlike understanding the road environment through the image viewed from the existing vehicle, it has the advantage of a drone image that can see a large area at once. It is used as safe driving assistance information by understanding the movements of various vehicles and predicting movement information according to time. In this paper, vehicle movement is predicted using LSTM by extracting vehicle time series information. Use YOLOv5 to detect the vehicle on the road. Road areas are collected as drone flight images. YOLOv5 is learned by labeling the vehicle through the collected image. Time-series vehicle movement information is extracted from the detected vehicle and the movement of each vehicle is predicted using the LSTM model. The predicted vehicle information is represented by an error through the MSE.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122880295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time Train Wagon Counting and Number Recognition Algorithm
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920835
A. Vavilin, A. Lomov, Titkov Roman
In this work we present an efficient solution for counting train wagons and recognizing their numbers using deep learning computer vision models. The proposed method is a good alternative to radio-frequency identification (RFID) in terms of low cost and ease of use. Our system shows 99% accuracy in real-world scenarios, including corrupted wagon numbers and night shooting conditions. At the same time, the proposed method can process a video stream at real-time speed without GPU acceleration.
{"title":"Real-time Train Wagon Counting and Number Recognition Algorithm","authors":"A. Vavilin, A. Lomov, Titkov Roman","doi":"10.1109/IWIS56333.2022.9920835","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920835","url":null,"abstract":"In this work we present an efficient solution for counting train wagons and recognizing their numbers using deep learning computer vision models. The proposed method is a good alternative for radio-frequency identification (RFID) method in terms of low cost and ease of use. Our system shows 99% accuracy in real-world scenarios, including corrupted wagon numbers and night shooting conditions. At the same time, the proposed method is capable to process video-stream in real-time speed without GPU-acceleration.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133251879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Vision-based Hand-sign Language Teaching System using Deep Neural Network: Methodology and Experiments
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920883
Nghe-Nhan Truong, Truong-Dong Do, Thien Nguyen, Minh-Thien Duong, Thanh-Hai Nguyen, M. Le
In this paper, a real-time hand-sign language teaching system using a deep neural network is proposed. Communication presents a significant barrier for persons who are impaired in hearing and speaking, and various projects and studies have been conducted to create or improve smart systems for this rapidly growing population. Deep learning approaches are widely used to improve the accuracy of sign language recognition models; however, most research has concentrated on hand gestures for translation, not on language self-learning for long-term development. This work aims to construct a complete system that assists deaf and mute people in studying and examining their performance. First, we designed and built a prosthetic arm equipped with a monocular camera using 3D printing. Second, the MediaPipe library was used to extract key points from collected videos of hand gestures. Then, a Gated Recurrent Units (GRU) model is trained to recognize words from the data. Real-time experimental results demonstrate the system's effectiveness and potential with 97 percent accuracy.
{"title":"A Vision-based Hand-sign Language Teaching System using Deep Neural Network: Methodology and Experiments","authors":"Nghe-Nhan Truong, Truong-Dong Do, Thien Nguyen, Minh-Thien Duong, Thanh-Hai Nguyen, M. Le","doi":"10.1109/IWIS56333.2022.9920883","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920883","url":null,"abstract":"In this paper, a real-time hand-sign language teaching system using deep neural network is proposed. Communication presents a significant barrier for persons who are impaired in hearing and speaking. There are various projects and studies have been conducted to create or improve smart systems for this rapid-growth population. Deep learning approaches became widely used to enhance the accuracy of sign language recognition models. However, most research has primarily concentrated on hand gestures for translation, not language self-learning for long-term development. This work aims to construct a complete system to assist deaf and mute people in studying and examining their performance. First, we designed and built a prosthetic arm equipped with a monocular camera using 3D printing. Second, the MediaPipe library was used to extract key points from collected videos of the hand gestures. Then, the Gated Recurrent Units model is trained to recognize words based on the data. The real-time experimental results demonstrate the system's effectiveness and potential with 97 percent accuracy.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129645893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Real-time Face Detector on CPU Using Efficient Transformer
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920713
M. D. Putro, Adri Priadana, Duy-Linh Nguyen, K. Jo
Face detection is a basic vision method for finding facial locations, usually used as the initial step of advanced facial analysis. This approach is therefore required to work quickly, especially on low-cost devices, to support practical applications. A deep learning architecture can robustly extract distinctive features by employing many weighted filters, but such models produce heavy parameters and computational complexity. A transformer is a deep learning architecture that can capture feature position relationships, which increases detector performance. This paper proposes a new efficient transformer architecture applied to face detection. It highlights spatial information from a similarity map by utilizing a 2D convolutional filter. The architecture requires little computation and few trainable parameters, which lets the proposed face detector run fast on an inexpensive device. As a result, the proposed network achieves high performance and competitive precision as a low-cost model. Additionally, the proposed transformer module does not significantly add computation or parameters and runs at 95 frames per second on a Core i5 CPU.
{"title":"A Real-time Face Detector on CPU Using Efficient Transformer","authors":"M. D. Putro, Adri Priadana, Duy-Linh Nguyen, K. Jo","doi":"10.1109/IWIS56333.2022.9920713","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920713","url":null,"abstract":"Face detection is a basic vision method to find the fa-ciallocation. It is usually used in the initial step of advanced facial analysis. Therefore, this approach is required to work quickly, especially on low-cost devices to support practical applications. A deep learning architecture can robustly extract the distinctive feature by employing a lot of weighted filters. However, the model produces heavy parameters and computational complexity. A transformer is a deep learning architecture that can capture the feature position relationship, which increases the detector performance. This work in this paper proposes a new efficient transformer architecture that is implemented to face detection. It can highlight the spatial information from a similarity map by utilizing a 2D-convolutional filter. This architecture generates low computation and lightweight trainable parameters that serve the proposed face detector to run fast on an inexpensive device. As a result, this proposed network achieves high performance and competitive precision with the low-cost model. Additionally, the proposed transformer module does not significantly add computation and parameters that can run fast at 95 frames per second on a Core is CPU.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125376243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient High-Resolution Network for Human Pose Estimation
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920796
T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo
Convolutional neural networks (CNNs) now achieve the best performance not only for 2D and 3D pose estimation but also for many machine vision applications (e.g., image classification, semantic segmentation, object detection, and so on). Besides, attention modules (AMs) have also shown strong results in improving the accuracy of neural networks. Hence, the proposed research focuses on creating a suitable feed-forward AM for CNNs that saves computational cost while improving accuracy. First, the input tensor enters the attention mechanism, which is divided into two main parts: a channel attention module and a spatial attention module. After that, the tensor passes through a stage of the backbone network. The mechanism then multiplies these two feature maps and sends the result to the next stage of the backbone. The network enhances the features in terms of long-distance dependencies (channels) and spatial information. The proposed research also highlights how this use of the attention mechanism differs from current approaches. The proposed method improves on the baseline HRNet by 1.3 AP points while keeping the number of parameters almost unchanged. The architecture was trained on the COCO 2017 dataset, which is available as an open benchmark.
{"title":"Efficient High-Resolution Network for Human Pose Estimation","authors":"T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo","doi":"10.1109/IWIS56333.2022.9920796","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920796","url":null,"abstract":"Convolution neural networks (CNNs) have achieved the best performance nowadays not just for 2D or 3D pose estimation but also for many machine vision applications (e.g., image classification, semantic segmentation, object detection and so on). Beside, The Attention Module also show their leader for improve the accuracy in neural network. Hence, the proposed research is focus on creating a suitable feed-forward AM for CNNs which can save the computational cost also improve the accuracy. First, input the tensor into the attention mechanism, which is divided into two main part: channel attention module and spatial attention module. After that, the tensor passing through a stage in the backbone network. The main mechanism then multiplies these two feature maps and sends them to the next stage of backbone. The network enhance the data in terms of long-distance dependencies (channels) and geographic data. Our proposed research would also reveal a distinction between the use of the attention mechanism and nowadays approaches. The proposed research got better result when compare with the baseline-HRNet by 1.3 points in terms of AP but maintain the number of parameter not change much. Our architecture was trained on the COCO 2017 dataset, which are now available as an open benchmark.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132473269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}