Skeleton-based Recognition of Pedestrian Crossing Intention using Attention Graph Neural Networks
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920850
M. Le, Truong-Dong Do, Minh-Thien Duong, Tran-Nhat-Minh Ta, Van-Binh Nguyen, M. Le
Besides the ability to automatically detect and localize objects on the road, self-driving cars need to observe and understand pedestrian attention to ensure safe operation. In this study, a compact skeleton-based method to predict pedestrian crossing intention is presented. The skeleton data is first extracted using a state-of-the-art pose estimation method. The proposed approach then combines graph neural networks, self-attention mechanisms, and temporal convolutions to create distinctive representations of moving pedestrian skeleton sequences. The crossing intention is classified from the extracted features. Experiments on the public JAAD dataset demonstrate results competitive with previous methods.
{"title":"Skeleton-based Recognition of Pedestrian Crossing Intention using Attention Graph Neural Networks","authors":"M. Le, Truong-Dong Do, Minh-Thien Duong, Tran-Nhat-Minh Ta, Van-Binh Nguyen, M. Le","doi":"10.1109/IWIS56333.2022.9920850","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920850","url":null,"abstract":"Besides the ability to automatically detect and localize on the road, self-driving cars need to observe and understand pedestrian attention to ensure safe operations. In this study, a compact skeleton-based method to predict pedestrian crossing intention is presented. The skeleton data is first extracted using a state-of-the-art pose estimation method. Then, the proposed approach combines graph neural networks, self-attention mechanisms, and temporal convolutions to create distinctive representations of pedestrian moving skeleton sequences. The crossing intention of people is classified based on the extracted features. The experiments demonstrate competitive results with previous methods on the public JAAD dataset.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130809517","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient Residual Bottleneck for Object Detection on CPU
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920946
Jinsu An, M. D. Putro, K. Jo
Object detection is one of the most fundamental and important tasks in computer vision. With the development of hardware such as GPUs and cameras, object detection technology is steadily improving. However, GPUs are often difficult to deploy in industrial settings, so efficient deep learning techniques for the CPU environment are very important. In this paper, we propose a deep learning model that detects objects in real time from images and videos on a CPU. By modifying the CSP [1] bottleneck in the backbone of YOLOv5 [2], we reduce the amount of computation and improve the FPS. The model was trained on the MS COCO dataset; compared with the original YOLOv5, the number of parameters was reduced by about 2.4%, and the model reached 0.367 mAP, 0.071 higher than RefineDetLite. At 23.010 FPS, it is fast enough for real-time object detection.
{"title":"Efficient Residual Bottleneck for Object Detection on CPU","authors":"Jinsu An, M. D. Putro, K. Jo","doi":"10.1109/IWIS56333.2022.9920946","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920946","url":null,"abstract":"Object detection is the most fundamental and important task in computer vision. With the development of hardware such as computing power of GPUs and cameras, object detection technology is gradually improving. However, there are many difficulties in using GPUs in industrial fields. Therefore, it is very important to use efficient deep learning technology in the CPU environment. In this paper, we propose a deep learning model that can detect objects in real-time from images and videos using CPU. By modifying the CSP [1] bottleneck, which corresponds to the backbone of YOLOv5 [2], an experiment was conducted to reduce the amount of computation and improve the FPS. The model was trained using the MS COCO dataset, and compared with the original YOLOv5, the number of parameters was reduced by about 2.4%, and compared with RefineDetLite, the mAP value was measured to be 0.367 mAP, which is 0.071 higher than that of RefineDetLite. The FPS was 23.010, which was sufficient for real-time object detection.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129578257","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of Web-based Metaverse Platform
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920930
Junmyeong Kim, Changhyeon Jeong, Kanghyun Jo
Most metaverses require users to install programs or buy equipment to enter the world, which reduces their accessibility. To improve accessibility, this work develops a metaverse served in the web environment using Unity and web technologies. The developed web-based metaverse has two advantages. The first is increased accessibility: when the metaverse is served on the web, users can connect to the world by typing a specific Uniform Resource Locator (URL) into the address bar. The second is that it combines the strengths of Unity and web development: Unity provides many methods and assets for generating and controlling the metaverse, while web technologies are well suited to communicating between browser and server, managing information in the database, and so on. To bridge Unity and the web, this work uses React, which provides an Application Programming Interface (API) for interaction between the two, allowing the metaverse to call web functions from within Unity. In summary, the goal of this paper is to build a web-based metaverse platform that improves the accessibility of the metaverse. To achieve this goal, this work uses Unity, Photon, Socket.IO, React, Node.js, MongoDB, and Express. The built metaverse can be accessed at https://busanmayor.org/.
{"title":"Development of Web-based Metaverse Platform","authors":"Junmyeong Kim, Changhyeon Jeong, Kanghyun Jo","doi":"10.1109/IWIS56333.2022.9920930","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920930","url":null,"abstract":"Most of the metaverses need to install programs or buy some equipment for entering the world. These processes reduce the accessibility of metaverse. To improve accessibility, this work develops a metaverse for service on the web environment using Unity and web development. The developed web-based metaverse has two advantages. The first advantage is increasing accessibility, when the metaverse is servicing on a web environment, users can connect to the world by typing the specific Uniform Resource Locator (URL) in an address bar. The second advantage is to mix the advantages of Unity and web development. Unity has many methods and assets for generating and controlling the metaverse, and web development has the good ability for communicating between browser and server, controlling information in the database, etc. To interact between Unity and the web, React was used in this work. React provides an Application Programming Interface (API) for interacting between Unity and the web. API makes the metaverse can use functions of the web in Unity. In summary, the goal of this paper is to build a web-based metaverse platform for improving the accessibility of metaverse. To achieve this goal, this work used Unity, Photon, Socket.IO, React, Node.js, MongoDB, and Express for building a more accessible metaverse. For accessing, this work built metaverse, the URL is provided as https://busanmayor.org/.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126473718","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implementation of P&O MPPT Control for Standalone PV System and Its Efficiency Analysis
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920779
Yicheng Zhou, Jin-Hua She, R. Yokoyama, Y. Nakanishi
MPPT control is significantly affected by the switching frequency, which is mainly set by the PWM of the DC/DC converter. In this paper, P&O MPPT control is implemented directly in MATLAB/Simulink with the Simscape library. To examine the relationship between efficiency and switching frequency, the paper also gives a simple test case that verifies the approach's usability.
{"title":"Implementation of P&O MPPT Control for Standalone PV System and Its Efficiency Analysis","authors":"Yicheng Zhou, Jin-Hua She, R. Yokoyama, Y. Nakanishi","doi":"10.1109/IWIS56333.2022.9920779","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920779","url":null,"abstract":"MPPT Control is significantly affected by the switching frequency which is mainly controlled by the PWM for the DC/DC converter. In this paper, the P&O MPPT control is directly implemented using MATLAB/Simulink tool with Simscape library. For understanding the relationship between the efficiency and switching frequency, this paper also gives a simple test case for verification of its useable.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123057034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
BiSeNet with Depthwise Attention Spatial Path for Semantic Segmentation
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920717
S. Kim, Kanghyun Jo
This paper proposes a new structure that obtains similar results while reducing the computational cost of BiSeNet for real-time semantic segmentation. Of BiSeNet's Spatial Path and Context Path, the study focuses on the large kernels in the Spatial Path. The Spatial Path preserves rich spatial information by creating a feature map 1/8 the size of the original image through three convolutions, performed in the order 7×7, 3×3, and 3×3. When general convolutions are used with such large kernels, the computational cost grows with the large number of parameters. To solve this problem, this paper uses depthwise separable convolution. However, depthwise separable convolution loses some spatial information. To compensate for this loss, an attention mechanism [1] is applied by element-wise summing the input and output feature maps of the depthwise separable convolution. To resolve the dimensional difference between input and output, a Pooling Pointwise Module (PPM) is used: max pooling changes the spatial dimension of the input features, and pointwise (1×1) convolution changes the channel dimension [2]. This paper proposes a Depthwise Attention Spatial Path for BiSeNet using these methods. With the proposed methods, mIoU under SS, SSC, MSF, and MSCF is 72.7%, 74.1%, 74.3%, and 76.1%, respectively. The proposed network can segment regions that the original cannot when using the Depthwise Attention Spatial Path.
{"title":"BiSeNet with Depthwise Attention Spatial Path for Semantic Segmentation","authors":"S. Kim, Kanghyun Jo","doi":"10.1109/IWIS56333.2022.9920717","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920717","url":null,"abstract":"This paper proposes a new structure to obtain similar results while reducing the computational amount of BiSeNet for Real-Time Semantic Segmentation. Among the Spatial Path and Context Path of BiSeNet, the study was conducted focusing on the large size kernel of the Spatial Path. Spatial Path has rich spatial information by creating a feature map 1/8 times the size of the original image through three convolution operations. The convolution operation used at this time is performed in the order of 7×7, 3×3, and 3×3. When a general convolution is used for a kernel of such a large size, the calculated cost increases due to a large number of parameters. To solve this problem, this paper uses Depthwise Separable Convolution. At this time, in Depthwise Separable Convolution, loss occurs in Spatial Information. To solve this information loss, an attention mechanism [1] was applied by elementwise summing between the input and output feature maps of depthwise separable convolution. To solve the dimensional difference between input and output, PPM: Pooling Pointwise Module is used. PPM uses Maxpooling to change the Spatial Dimension of input features and Channel Dimension through Pointwise Convolution (lx1 Convolution) [2]. This paper propose to use Depthwise Attention Spatial Path for BiSeNet using these methods. Through our proposed methods, mIoU in SS, SSC, MSF, and MSCF were 72.7%, 74.1 %, 74.3%, and 76.1 %. Proposed network can segment the part that the original one can't when using our Depthwise Attention Spatial Path.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126413648","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low Computational Vehicle Lane Changing Prediction Using Drone Traffic Dataset
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920801
Youlkyeong Lee, Qing Tang, Jehwan Choi, Kanghyun Jo
Safe autonomous-driving assistance systems are actively being developed based on Convolutional Neural Networks (CNNs). Unlike understanding the road environment through images viewed from the vehicle itself, a drone image has the advantage of covering a large area at once. By understanding the movements of various vehicles and predicting their motion over time, it can serve as safe-driving assistance information. In this paper, vehicle movement is predicted with an LSTM by extracting vehicle time-series information. YOLOv5 is used to detect vehicles on the road: road areas are collected as drone flight images, and YOLOv5 is trained on vehicles labeled in the collected images. Time-series movement information is extracted from the detected vehicles, and the movement of each vehicle is predicted using the LSTM model. Prediction error is reported using the MSE.
{"title":"Low Computational Vehicle Lane Changing Prediction Using Drone Traffic Dataset","authors":"Youlkyeong Lee, Qing Tang, Jehwan Choi, Kanghyun Jo","doi":"10.1109/IWIS56333.2022.9920801","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920801","url":null,"abstract":"Safe autonomous driving assistance systems are actively being developed based on Convolutional Neural Network (CNN). Unlike understanding the road environment through the image viewed from the existing vehicle, it has the advantage of a drone image that can see a large area at once. It is used as safe driving assistance information by understanding the movements of various vehicles and predicting movement information according to time. In this paper, vehicle movement is predicted using LSTM by extracting vehicle time series information. Use YOLOv5 to detect the vehicle on the road. Road areas are collected as drone flight images. YOLOv5 is learned by labeling the vehicle through the collected image. Time-series vehicle movement information is extracted from the detected vehicle and the movement of each vehicle is predicted using the LSTM model. The predicted vehicle information is represented by an error through the MSE.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122880295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time Train Wagon Counting and Number Recognition Algorithm
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920835
A. Vavilin, A. Lomov, Titkov Roman
In this work we present an efficient solution for counting train wagons and recognizing their numbers using deep learning computer vision models. The proposed method is a good alternative to radio-frequency identification (RFID) in terms of low cost and ease of use. Our system shows 99% accuracy in real-world scenarios, including corrupted wagon numbers and night shooting conditions. At the same time, the proposed method can process a video stream at real-time speed without GPU acceleration.
{"title":"Real-time Train Wagon Counting and Number Recognition Algorithm","authors":"A. Vavilin, A. Lomov, Titkov Roman","doi":"10.1109/IWIS56333.2022.9920835","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920835","url":null,"abstract":"In this work we present an efficient solution for counting train wagons and recognizing their numbers using deep learning computer vision models. The proposed method is a good alternative for radio-frequency identification (RFID) method in terms of low cost and ease of use. Our system shows 99% accuracy in real-world scenarios, including corrupted wagon numbers and night shooting conditions. At the same time, the proposed method is capable to process video-stream in real-time speed without GPU-acceleration.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133251879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Vision-based Hand-sign Language Teaching System using Deep Neural Network: Methodology and Experiments
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920883
Nghe-Nhan Truong, Truong-Dong Do, Thien Nguyen, Minh-Thien Duong, Thanh-Hai Nguyen, M. Le
In this paper, a real-time hand-sign language teaching system using a deep neural network is proposed. Communication presents a significant barrier for persons who are impaired in hearing and speaking, and various projects and studies have been conducted to create or improve smart systems for this rapidly growing population. Deep learning approaches are widely used to improve the accuracy of sign language recognition models; however, most research has concentrated on hand gestures for translation, not on language self-learning for long-term development. This work aims to construct a complete system that assists deaf and mute people in studying and examining their performance. First, we designed and built a prosthetic arm equipped with a monocular camera using 3D printing. Second, the MediaPipe library was used to extract key points from collected videos of hand gestures. Then, a Gated Recurrent Units (GRU) model is trained to recognize words from the data. Real-time experimental results demonstrate the system's effectiveness and potential with 97 percent accuracy.
{"title":"A Vision-based Hand-sign Language Teaching System using Deep Neural Network: Methodology and Experiments","authors":"Nghe-Nhan Truong, Truong-Dong Do, Thien Nguyen, Minh-Thien Duong, Thanh-Hai Nguyen, M. Le","doi":"10.1109/IWIS56333.2022.9920883","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920883","url":null,"abstract":"In this paper, a real-time hand-sign language teaching system using deep neural network is proposed. Communication presents a significant barrier for persons who are impaired in hearing and speaking. There are various projects and studies have been conducted to create or improve smart systems for this rapid-growth population. Deep learning approaches became widely used to enhance the accuracy of sign language recognition models. However, most research has primarily concentrated on hand gestures for translation, not language self-learning for long-term development. This work aims to construct a complete system to assist deaf and mute people in studying and examining their performance. First, we designed and built a prosthetic arm equipped with a monocular camera using 3D printing. Second, the MediaPipe library was used to extract key points from collected videos of the hand gestures. Then, the Gated Recurrent Units model is trained to recognize words based on the data. The real-time experimental results demonstrate the system's effectiveness and potential with 97 percent accuracy.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129645893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Real-time Face Detector on CPU Using Efficient Transformer
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920713
M. D. Putro, Adri Priadana, Duy-Linh Nguyen, K. Jo
Face detection is a basic vision method for finding facial locations, usually used as the initial step of advanced facial analysis. This approach is therefore required to work quickly, especially on low-cost devices, to support practical applications. A deep learning architecture can robustly extract distinctive features by employing many weighted filters, but such models produce heavy parameters and computational complexity. A transformer is a deep learning architecture that can capture feature position relationships, which increases detector performance. This paper proposes a new efficient transformer architecture applied to face detection. It highlights spatial information from a similarity map by utilizing a 2D convolutional filter. The architecture requires little computation and few trainable parameters, which lets the proposed face detector run fast on an inexpensive device. As a result, the proposed network achieves high performance and competitive precision as a low-cost model. Additionally, the proposed transformer module does not significantly add computation or parameters and runs at 95 frames per second on a Core i5 CPU.
{"title":"A Real-time Face Detector on CPU Using Efficient Transformer","authors":"M. D. Putro, Adri Priadana, Duy-Linh Nguyen, K. Jo","doi":"10.1109/IWIS56333.2022.9920713","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920713","url":null,"abstract":"Face detection is a basic vision method to find the fa-ciallocation. It is usually used in the initial step of advanced facial analysis. Therefore, this approach is required to work quickly, especially on low-cost devices to support practical applications. A deep learning architecture can robustly extract the distinctive feature by employing a lot of weighted filters. However, the model produces heavy parameters and computational complexity. A transformer is a deep learning architecture that can capture the feature position relationship, which increases the detector performance. This work in this paper proposes a new efficient transformer architecture that is implemented to face detection. It can highlight the spatial information from a similarity map by utilizing a 2D-convolutional filter. This architecture generates low computation and lightweight trainable parameters that serve the proposed face detector to run fast on an inexpensive device. As a result, this proposed network achieves high performance and competitive precision with the low-cost model. Additionally, the proposed transformer module does not significantly add computation and parameters that can run fast at 95 frames per second on a Core is CPU.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125376243","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient High-Resolution Network for Human Pose Estimation
Pub Date: 2022-08-17 | DOI: 10.1109/IWIS56333.2022.9920796
T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo
Convolutional neural networks (CNNs) now achieve the best performance not only for 2D and 3D pose estimation but also for many machine vision applications (e.g., image classification, semantic segmentation, object detection, and so on). Besides, attention modules (AMs) have also shown strong results in improving the accuracy of neural networks. Hence, the proposed research focuses on creating a suitable feed-forward AM for CNNs that saves computational cost while improving accuracy. First, the input tensor enters the attention mechanism, which is divided into two main parts: a channel attention module and a spatial attention module. After that, the tensor passes through a stage of the backbone network. The mechanism then multiplies these two feature maps and sends the result to the next stage of the backbone. The network enhances the features in terms of long-distance dependencies (channels) and spatial information. The proposed research also highlights how this use of the attention mechanism differs from current approaches. The proposed method improves on the baseline HRNet by 1.3 AP points while keeping the number of parameters almost unchanged. The architecture was trained on the COCO 2017 dataset, which is available as an open benchmark.
{"title":"Efficient High-Resolution Network for Human Pose Estimation","authors":"T. Tran, Xuan-Thuy Vo, Duy-Linh Nguyen, K. Jo","doi":"10.1109/IWIS56333.2022.9920796","DOIUrl":"https://doi.org/10.1109/IWIS56333.2022.9920796","url":null,"abstract":"Convolution neural networks (CNNs) have achieved the best performance nowadays not just for 2D or 3D pose estimation but also for many machine vision applications (e.g., image classification, semantic segmentation, object detection and so on). Beside, The Attention Module also show their leader for improve the accuracy in neural network. Hence, the proposed research is focus on creating a suitable feed-forward AM for CNNs which can save the computational cost also improve the accuracy. First, input the tensor into the attention mechanism, which is divided into two main part: channel attention module and spatial attention module. After that, the tensor passing through a stage in the backbone network. The main mechanism then multiplies these two feature maps and sends them to the next stage of backbone. The network enhance the data in terms of long-distance dependencies (channels) and geographic data. Our proposed research would also reveal a distinction between the use of the attention mechanism and nowadays approaches. The proposed research got better result when compare with the baseline-HRNet by 1.3 points in terms of AP but maintain the number of parameter not change much. Our architecture was trained on the COCO 2017 dataset, which are now available as an open benchmark.","PeriodicalId":340399,"journal":{"name":"2022 International Workshop on Intelligent Systems (IWIS)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132473269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}