
2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW): Latest Publications

Summary of the 2022 Low-Power Deep Learning Semantic Segmentation Model Compression Competition for Traffic Scene In Asian Countries
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859367
Yu-Shu Ni, Chia-Chi Tsai, Chih-Cheng Chen, Po-Yu Chen, Hsien-Kai Kuo, Man-Yu Lee, Kuo Chin-Chuan, Zhe-Ln Hu, Po-Chi Hu, Ted T. Kuo, Jenq-Neng Hwang, Jiun-In Guo
The 2022 low-power deep learning semantic segmentation model compression competition for traffic scenes in Asian countries, held as part of the IEEE ICME 2022 Grand Challenges, focuses on semantic segmentation technologies for autonomous driving scenarios. The competition aims to semantically segment objects in traffic with low power consumption and high mean intersection over union (mIOU) in Asian countries (e.g., Taiwan), which contain several harsh driving environments. The target segmented objects include dashed white line, dashed yellow line, single white line, single yellow line, double dashed white line, double white line, double yellow line, main lane, and alter lane. A total of 35,500 annotated images, revised from Berkeley Deep Drive 100K, are provided for model training, along with 130 annotated example images of Asian road conditions. An additional 2,012 testing images are used in the contest evaluation process: 1,200 of them in the qualification stage and the rest in the final stage. In total, 203 registered teams joined the competition, and the top 15 teams with the highest mIOU entered the final stage, from which 8 teams submitted final results. The overall best model belongs to team "okt2077", followed by team "asdggg" and team "AVCLab". No special award for best INT8 model development was given.
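The ranking metric is mean intersection over union. As a minimal reference sketch only (the organizers' exact per-class and per-image averaging rules are not specified in this summary), per-class IoU can be computed from integer label maps and averaged over the classes that actually occur:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Per-class IoU averaged over classes present in prediction or ground truth.

    pred, gt: H x W arrays of class indices (lane markings, lanes, background, ...).
    """
    ious = []
    for c in range(num_classes):
        pred_c = (pred == c)
        gt_c = (gt == c)
        union = np.logical_or(pred_c, gt_c).sum()
        if union == 0:        # class absent in both maps: skip so it does not skew the mean
            continue
        inter = np.logical_and(pred_c, gt_c).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```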
Citations: 2
Local to Global Transformer for Video Based 3d Human Pose Estimation
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859482
Haifeng Ma, Ke Lu, Jian Xue, Zehai Niu, Pengcheng Gao
Transformer-based architectures have achieved strong results in sequence-to-sequence tasks and in vision tasks, including 3D human pose estimation. However, transformer-based 3D human pose estimation methods are not as strong as RNNs and CNNs at acquiring local information, and local information plays a major role in obtaining 3D positional relationships. In this paper, we propose a method that combines local human body parts and global skeleton joints using a temporal transformer to finely track the temporal motion of human body parts. First, we encode positional and temporal information; then we use a local-to-global temporal transformer to obtain local and global information; finally, we obtain the target 3D human pose. To evaluate the effectiveness of our method, we evaluated it quantitatively and qualitatively on two popular standard benchmark datasets, Human3.6M and HumanEva-I. Extensive experiments demonstrate that we achieve state-of-the-art performance on Human3.6M with 2D ground truth as input.
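The abstract does not give the exact architecture, so the sketch below only illustrates the local-to-global idea under stated assumptions: 2D keypoint sequences as input, a hypothetical grouping of 17 joints into five body parts, per-part temporal transformer encoders (local), and a whole-skeleton temporal encoder (global) that regresses the centre frame's 3D pose.

```python
import torch
import torch.nn as nn

class LocalToGlobalTemporalTransformer(nn.Module):
    """Toy local-to-global temporal transformer; joint grouping and sizes are
    assumptions, not the authors' configuration."""

    def __init__(self, num_joints=17, parts=(5, 4, 4, 2, 2), dim=64, frames=27):
        super().__init__()
        assert sum(parts) == num_joints
        self.parts = parts
        self.embed = nn.ModuleList(nn.Linear(p * 2, dim) for p in parts)
        enc = lambda d: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
        self.local = nn.ModuleList(enc(dim) for _ in parts)
        self.pos = nn.Parameter(torch.zeros(1, frames, dim))   # T must equal `frames`
        self.global_enc = enc(dim * len(parts))
        self.head = nn.Linear(dim * len(parts), num_joints * 3)

    def forward(self, x):                                   # x: (B, T, J, 2) 2D keypoints
        chunks, start = [], 0
        for i, p in enumerate(self.parts):                  # local: one stream per body part
            part = x[:, :, start:start + p, :].flatten(2)   # (B, T, p*2)
            chunks.append(self.local[i](self.embed[i](part) + self.pos))
            start += p
        g = self.global_enc(torch.cat(chunks, dim=-1))      # global refinement over all parts
        centre = g[:, g.shape[1] // 2]                      # centre frame of the window
        return self.head(centre).view(-1, x.shape[2], 3)    # (B, J, 3)

# model = LocalToGlobalTemporalTransformer()
# out = model(torch.randn(2, 27, 17, 2))                   # -> torch.Size([2, 17, 3])
```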
Citations: 0
Watermarking Protocol for Deep Neural Network Ownership Regulation in Federated Learning
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859395
Fangqi Li, Shilin Wang, Alan Wee-Chung Liew
With the wide application of deep learning models, it is important to verify an author's ownership of a deep neural network model via watermarks and thereby protect the model. The development of distributed learning paradigms such as federated learning raises new challenges for model protection: each author should be able to conduct independent verification and trace traitors. To meet these requirements, we propose a watermarking protocol, Merkle-Sign, that meets the prerequisites for ownership verification in federated learning. Our work paves the way for generalizing watermarking as a practical security mechanism for protecting deep learning models in distributed learning platforms.
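The protocol itself is only named here, so the following is not Merkle-Sign but a sketch of the Merkle-tree primitive such a protocol can build on: each author's hypothetical commitment (its identity bound to its watermark key) becomes a leaf, and publishing the single root lets any author later verify its own membership independently of the others.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Reduce a list of byte-string commitments to a single Merkle root."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                          # duplicate the last node on odd levels
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Hypothetical commitments: each federated author binds its identity to the key
# it uses to embed/verify its own watermark in the jointly trained model.
commitments = [b"author-A|wm-key-A", b"author-B|wm-key-B", b"author-C|wm-key-C"]
print(merkle_root(commitments).hex())
```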
Citations: 4
Exploring Multisensory Feedback for Virtual Reality Relaxation
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859362
Jing-Yuan Huang, Grace Theodore, You-Shin Tsai, Jerry Chin-Han Goh, Mu-Hang Lin, Kuan-Wei Tseng, Y. Hung
Multisensory experience gives Virtual Reality (VR) great potential to reduce stress. We explore four different senses, including sight, hearing, smell, and touch, that can promote relaxation in VR. In particular, we construct an immersive virtual scene, combined with self-familiar vocal guidance, precisely delivered scent, and a haptic breathing stuffed animal, to provide visual, auditory, olfactory, and tactile feedback in VR. Each component in our system achieves high fidelity so that, when integrated, the user can enjoy an effective relaxation experience.
Citations: 0
Emotional Quality Evaluation for Generated Music Based on Emotion Recognition Model
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859459
Hongfei Wang, Wei Zhong, Lin Ma, Long Ye, Qin Zhang
In the field of musical emotion evaluation, existing methods usually rely on subjective experiments, which are demanding on the experimental environment and lack a unified evaluation standard. This paper proposes an emotional quality evaluation method for generated music from the perspective of music emotion recognition. In the proposed method, we analyze the correlation between audio features and the emotion category of music, and choose MFCC and the Mel spectrum as the most significant audio features. An emotion recognition model is then constructed based on a residual convolutional network to predict the emotion category of generated music. In the experiments, we apply the proposed model to evaluate the emotional quality of generated music. The experimental results show that our model achieves higher recognition accuracy and thus exhibits strong reliability for the objective emotional quality evaluation of generated music.
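For reference, the two audio features named above can be extracted with librosa as sketched below; the sampling rate, number of MFCC coefficients, and number of mel bands are common defaults, not necessarily the authors' settings.

```python
import librosa
import numpy as np

def extract_features(path, sr=22050, n_mfcc=20, n_mels=128):
    """Extract the two features named in the paper: MFCCs and the Mel spectrum."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)           # (n_mfcc, frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)  # (n_mels, frames)
    mel_db = librosa.power_to_db(mel, ref=np.max)                    # log scale for a CNN input
    return mfcc, mel_db
```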
Citations: 1
Conditional Sentence Rephrasing without Parallel Training Corpus
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859385
Yen-Ting Lee, Cheng-te Li, Shou-De Lin
This paper aims to rephrase a sentence under a given condition; the generated sentence should be similar to the original sentence and satisfy the given condition, without a parallel training corpus. We propose a conditional sentence VAE (CS-VAE) model to solve the task. CS-VAE is trained as an autoencoder, with condition control applied to the generated sentence while preserving the same semantics. Supported by experimental demonstration, CS-VAE is shown to effectively solve the task with high-quality sentences.
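The abstract does not detail the architecture; the sketch below is only a minimal conditional sentence VAE in the same spirit, assuming a GRU encoder/decoder and a learned condition embedding concatenated with the latent code, so that decoding the same sentence under a different condition yields a conditioned rephrasing.

```python
import torch
import torch.nn as nn

class CSVAE(nn.Module):
    """Minimal conditional sentence VAE sketch (not the authors' exact model)."""

    def __init__(self, vocab, n_cond, emb=128, hid=256, z_dim=64):
        super().__init__()
        self.tok = nn.Embedding(vocab, emb)
        self.cond = nn.Embedding(n_cond, z_dim)
        self.enc = nn.GRU(emb, hid, batch_first=True)
        self.to_mu = nn.Linear(hid, z_dim)
        self.to_logvar = nn.Linear(hid, z_dim)
        self.dec_init = nn.Linear(2 * z_dim, hid)
        self.dec = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens, condition):
        _, h = self.enc(self.tok(tokens))                          # h: (1, B, hid)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterization trick
        h0 = torch.tanh(self.dec_init(torch.cat([z, self.cond(condition)], dim=-1)))
        dec_out, _ = self.dec(self.tok(tokens), h0.unsqueeze(0))   # teacher forcing
        return self.out(dec_out), mu, logvar   # logits for reconstruction loss + KL terms
```

Training would combine a token-level reconstruction loss with the KL divergence of (mu, logvar) against a standard normal prior, which is what lets the model rephrase without parallel data: only the condition input changes at generation time.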
Citations: 0
Music Question Answering: Cognize and Perceive Music
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859499
Wenhao Gao, Xiaobing Li, Cong Jin, Tie Yun
Music analysis and understanding has always been the work of professionals. To help ordinary people cognize and perceive music, we put forward the Music Question Answering task in this paper. The goal of this task is to provide accurate answers given music and related questions. To this end, we build the MQA dataset based on MagnaTagATune, which contains seven basic categories. According to their main source, all questions are divided into basic questions and depth questions. We tested several models and analyzed the experimental results. The best model, Musicnn-MALiMo (Spectrogram, i=4), obtained 71.13% accuracy.
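Musicnn-MALiMo is not described here, so the sketch below is only a hypothetical late-fusion baseline for the task: encode the music spectrogram and the question separately, concatenate, and classify over a fixed answer set. All layer sizes and the answer-set formulation are assumptions.

```python
import torch
import torch.nn as nn

class MusicQABaseline(nn.Module):
    """Hypothetical late-fusion music question answering baseline (not the
    paper's Musicnn-MALiMo model)."""

    def __init__(self, vocab=10000, n_answers=50, dim=256):
        super().__init__()
        self.audio = nn.Sequential(                        # spectrogram -> fixed-size vector
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.embed = nn.Embedding(vocab, dim)
        self.question = nn.GRU(dim, dim, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_answers)

    def forward(self, spec, question_tokens):
        a = self.audio(spec.unsqueeze(1))                  # spec: (B, n_mels, T)
        _, q = self.question(self.embed(question_tokens))  # q: (1, B, dim)
        return self.classifier(torch.cat([a, q[-1]], dim=-1))   # answer logits
```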
Citations: 2
Augmented-Training-Aware Bisenet for Real-Time Semantic Segmentation
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859497
Chih-Chung Hsu, Cheih Lee, Shen-Chieh Tai, Yun Jiang
Semantic segmentation techniques have become an attractive research field for autonomous driving. However, the computational complexity of conventional semantic segmentation is relatively high compared to other computer vision applications, and fast inference is highly desired for autonomous driving. A lightweight convolutional neural network, the Bilateral Segmentation Network (BiSeNet), is adopted in this paper. However, the performance of the conventional BiSeNet is not reliable enough, and model quantization can lead to even worse results. Therefore, we propose an augmented training strategy to significantly improve performance on the semantic segmentation task. First, heavy data augmentation, including CutOut, deformable distortion, and step-wise hard example mining, is used in the training phase to boost feature representation learning. Second, L1 and L2 norm regularization are used during model training to prevent possible overfitting. Then, post-training quantization is performed on the TensorFlow Lite model to significantly reduce model size and computational complexity. Comprehensive experiments verified that the proposed method is effective and efficient for autonomous driving applications compared with other state-of-the-art methods.
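The post-quantization step maps to the standard TensorFlow Lite post-training quantization workflow; a generic sketch is shown below (the calibration data and quantization mode actually used by the authors are not stated in the abstract).

```python
import tensorflow as tf

def quantize_to_tflite(keras_model, representative_images=None):
    """Post-training quantization with the standard TensorFlow Lite converter."""
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]      # enable weight quantization
    if representative_images is not None:
        # Integer activation quantization needs a representative dataset generator.
        def rep_data():
            for img in representative_images:
                yield [img[None, ...].astype("float32")]
        converter.representative_dataset = rep_data
    return converter.convert()                                # serialized .tflite model

# open("bisenet_quant.tflite", "wb").write(quantize_to_tflite(model, calib_images))
```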
Citations: 0
A Unified Video Summarization for Video Anomalies Through Deep Learning
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859320
K. Muchtar, Muhammad Rizky Munggaran, Adhiguna Mahendra, Khairul Anwar, Chih-Yang Lin
Over the last ten years, integrated video surveillance systems have become increasingly important in protecting public safety. Because a single surveillance camera continuously records events in a specific field of view day and night, a system that can create a summary concisely capturing the key elements of the incoming frames is required. More specifically, due to time constraints, the enormous amount of video footage cannot be properly examined for analysis. As a result, it is vital to compile a summary of what happened on the scene and look for anomalous events in the footage. A unified approach for detecting and summarizing anomalous events is proposed. A 3D deep learning approach is used to detect events and compute anomaly scores. The scores are then used to visualize and localize the anomalous regions. Finally, a blob analysis technique is used to extract the anomalous regions. To verify the results, quantitative and qualitative evaluations are provided. Experiments indicate that the proposed summarization method keeps crucial information while producing competitive results. More qualitative results can be found on our project channel: https://youtu.be/eMPMjiGlCQI
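As an illustration of the final blob-analysis step, an anomaly-score map can be thresholded and its connected components filtered by area to yield bounding boxes of anomalous regions; the threshold and minimum-area values below are assumptions, not the paper's settings.

```python
import cv2
import numpy as np

def anomalous_blobs(anomaly_map, threshold=0.5, min_area=100):
    """Threshold a per-pixel anomaly-score map and keep large connected blobs."""
    mask = (anomaly_map >= threshold).astype(np.uint8)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask, connectivity=8)
    boxes = []
    for i in range(1, n):                      # label 0 is the background component
        x, y, w, h, area = stats[i]
        if area >= min_area:
            boxes.append((x, y, w, h))         # bounding box of one anomalous region
    return boxes
```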
Citations: 0
Quantification of Artist Representativity within an Art Movement
Pub Date: 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859412
Yu-xin Zhang, Fan Tang, Weiming Dong, Changsheng Xu
Knowing the representative artists of an art movement can help the public better understand its characteristics. In this paper, we propose the concept of artist representativity to assess how well an artist represents the characteristics of an art movement. We begin by presenting a novel approach to learning art-movement-related representations of artworks that express their style and content features. We then propose an artwork-based artist representation method that accounts for the importance and quantity imbalance of artworks. Finally, we develop an artist representativity calculation method based on bi-level graph-based learning. Experiments demonstrate the effectiveness of our approach in predicting artist representativity within an art movement.
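The bi-level graph-based method is not spelled out in the abstract; as a much cruder stand-in that only illustrates the two ingredients mentioned (importance-weighted aggregation of artworks into an artist representation, and scoring that representation against a movement), consider:

```python
import numpy as np

def artist_vector(artwork_embs, importance=None):
    """Importance-weighted mean of an artist's artwork embeddings
    (uniform weights if no importance scores are given)."""
    w = np.ones(len(artwork_embs)) if importance is None else np.asarray(importance, float)
    w = w / w.sum()
    return (w[:, None] * np.asarray(artwork_embs)).sum(axis=0)

def representativity(artist_vec, movement_artwork_embs):
    """Naive proxy: cosine similarity to the movement's artwork centroid."""
    centroid = np.mean(movement_artwork_embs, axis=0)
    cos = np.dot(artist_vec, centroid) / (
        np.linalg.norm(artist_vec) * np.linalg.norm(centroid) + 1e-12)
    return float(cos)
```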
Citations: 0