
2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW): Latest Publications

FLNet: Graph Constrained Floor Layout Generation
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859350
Abhinav Upadhyay, Alpana Dubey, Veenu Arora, Mani Suma Kuriakose, Shaurya Agarawal
In this work, we propose FLNet, a generative approach that synthesizes floor layout plans guided by user constraints. The approach takes user inputs in the form of a boundary, room types, and spatial relationships, and generates a layout design that satisfies these requirements. We evaluate it on RPLAN, a dataset of 80,000 vector-graphics floor plans of residential buildings designed by professional architects. We perform both qualitative and quantitative analysis along three metrics (layout generation accuracy, realism, and quality) to evaluate the generated layout designs. Compared with existing baselines, our approach performs better on all three metrics, producing layout designs that are more realistic and of better quality.
Citations: 1
3D-DSPnet: Product Disassembly Sequence Planning
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859434
Abhinav Upadhyay, Bharat Ladrecha, Alpana Dubey, Suma Mani Kuriakose, P. Goenka
Product disassembly has become an active research area because it supports sustainable development by enabling effective end-of-life (EOL) strategies such as reuse, remanufacturing, and recycling. In this work, we propose a new approach, 3D-DSPNet, that uses 3D data from CAD assembly models to generate a feasible disassembly sequence. Our approach uses graph-based learning to process the graph representation of CAD models. Currently available 3D CAD model datasets lack ground-truth disassembly sequences, so we propose and curate a new dataset, the 3D-DSP dataset, which includes ground-truth disassembly sequences for 3D product models. We evaluate and analyze the results to explain the efficacy of the proposed method, which significantly outperforms the existing baseline. We also develop an Autodesk Fusion 360 plug-in that generates disassembly sequence animations, allowing intuitive analysis of the disassembly plan.
Citations: 0
CDTNET: Cross-Domain Transformer Based on Attributes for Person Re-Identification
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859330
Mengyuan Guan, Suncheng Xiang, Ting Liu, Yuzhuo Fu
Unsupervised domain adaptation (UDA) for person re-identification (ReID) aims to fine-tune a model trained on a labelled source-domain dataset to a target-domain dataset, and the field has advanced by leaps and bounds thanks to deep convolutional neural networks (CNNs). However, traditional CNN-based methods mainly focus on learning small discriminative features in local pedestrian regions; they fail to exploit the potential of rich structural patterns and suffer from loss of detail caused by convolution operators. To tackle this challenge, this work exploits valuable fine-grained attributes using Transformers. We propose a Cross-Domain Transformer network, CDTnet, to enhance robust feature learning in connection with pedestrian attributes. As far as we are aware, this is among the first attempts to adopt a pure transformer for cross-domain ReID research. Comprehensive experiments on several ReID benchmarks demonstrate that our method achieves performance comparable to the state of the art.
Citations: 1
CPS: Full-Song and Style-Conditioned Music Generation with Linear Transformer
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859286
Weipeng Wang, Xiaobing Li, Cong Jin, Di Lu, Qingwen Zhou, Tie Yun
Many deep music generation algorithms can now produce good-sounding music, but there have been few studies on controlled generation. In the generation process, the user's sense of participation is usually very weak, and it is difficult to integrate one's own musical motivation into the creation. In this study, we introduce CPS (Compound word with style), a model that can take a target style and generate a complete musical composition from scratch. We first add genre meta-information to the music representation and distinguish it from the other low-level music representations, thus strengthening the influence of the control signal. We model the music with a linear transformer and use an adaptive strategy with different settings for different types of music tokens to reduce the probability of disharmonic music. Experiments show that, compared with the baseline model, our model performs better on basic music metrics as well as on metrics that evaluate controllability.
Citations: 2
Fire and Gun Detection Based on Sematic Embeddings
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859303
Yunbin Deng, Ryan Campbell, Piyush Kumar
It is critical that real-time gun and fire detection from video be accurate in order to protect life, property, and the environment. Recent advances in deep machine learning have greatly improved detection accuracy in this domain. In this paper, a semantic embedding-based method is developed for zero-shot gun and fire detection. Using a pre-trained Contrastive Language-Image Pre-Training (CLIP) model, input images and arbitrary texts can be mapped to semantic vectors and their similarity can be computed. By defining object classes using the semantic vector of each class's description, highly accurate object detection can be achieved without training any new model. Evaluation of this method on the public-domain FireNet and IMFDB datasets demonstrates fire and gun detection accuracy of 99.8% and 97.3%, respectively, which significantly outperforms the state-of-the-art FireNet and You Only Look Once (YOLO) algorithms. Semantic embedding enables open-set semantic search in video and simplifies deploying and maintaining object detection applications.
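The matching step described in this abstract can be illustrated with a minimal sketch. It assumes the image and each class description have already been mapped to vectors by some embedding model; the three-dimensional vectors, the prompt strings, and the function names below are hypothetical stand-ins, not CLIP outputs or the authors' code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_vec, class_vecs):
    """Pick the class whose description vector is most similar to the image vector.

    Returns (best_label, scores), where scores maps each label to its similarity.
    """
    scores = {label: cosine_similarity(image_vec, vec)
              for label, vec in class_vecs.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy stand-ins for text embeddings of class descriptions (assumed values).
class_vecs = {
    "a photo of fire": [1.0, 0.0, 0.0],
    "a photo of a gun": [0.0, 1.0, 0.0],
    "a photo of a street": [0.0, 0.0, 1.0],
}
# Toy stand-in for an image embedding (assumed values).
image_vec = [0.9, 0.1, 0.2]

label, scores = zero_shot_classify(image_vec, class_vecs)
print(label, scores)
```

Because the classes are defined only by their description vectors, adding a class in this sketch means adding one more entry to `class_vecs`; nothing is retrained.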
Citations: 1
Bottleneck Detection in Crowded Video Scenes Utilizing Lagrangian Motion Analysis Via Density and Arc Length Measures
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859348
Maik Simon, Erik Bochinski, Markus Küchhold, T. Sikora
Bottleneck situations can occur in overcrowded areas such as entrances or narrowed passages and pose a great danger to the life and health of the people involved. The automated detection of such bottlenecks is the first crucial step in mitigating these dangers. In this work, we use the Lagrangian approach from the analysis of dynamic systems to analyze the motion profiles of groups of people. The derived features, which reflect long-term motion dynamics, are described by two-dimensional Lagrangian fields. We extend the underlying Lagrangian framework with a novel measure that captures the density of motion, and hence of people, in the context of crowd analysis. Further, we show how this novel density measure can be combined with the established arc length measure to detect bottlenecks in videos. Experimental evaluations show a 5% improvement over the state of the art for spatiotemporal bottleneck detection.
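As a rough illustration of an arc length measure, the sketch below sums the distances travelled along one tracked trajectory. The abstract does not give the paper's exact definition, so the discrete formula, the function name, and the sample trajectories are assumptions made for illustration only.

```python
import math


def arc_length(trajectory):
    """Total path length of a trajectory given as a list of (x, y) positions.

    Computed as the sum of Euclidean distances between consecutive positions.
    """
    return sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))


# Hypothetical trajectories sampled at the same time steps.
fast_track = [(0, 0), (3, 4), (6, 8)]          # two steps of length 5
slow_track = [(0, 0), (0.3, 0.4), (0.6, 0.8)]  # two steps of length 0.5

print(arc_length(fast_track))
print(arc_length(slow_track))
```

A particle that barely moves over the observation window accumulates a short arc length, which is why a measure of this kind can separate slow regions from freely flowing ones.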
Citations: 1
Decentralized Federated Learning with Enhanced Privacy Preservation
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859507
Sheng-Po Tseng, Jan-Yue Lin, Wei-Chien Cheng, L. Yeh, Chih-Ya Shen
We present a decentralized federated learning (FL) framework based on blockchain. Traditional federated learning requires a third-party centralized server to aggregate all the gradients uploaded by the participants, but such a trusted third party may not always exist. We address this issue with a decentralized blockchain and encrypt the neural network model parameters and gradients.
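The aggregation step that the central server performs in traditional federated learning can be sketched as an element-wise average of the uploaded gradients. This is a generic illustration only: the function name and sample values are hypothetical, and the sketch deliberately leaves out the blockchain and encryption parts that the abstract describes.

```python
def average_gradients(gradients):
    """Element-wise mean of equally sized gradient vectors, one per participant."""
    if not gradients:
        raise ValueError("need at least one gradient vector")
    n = len(gradients)
    # zip(*gradients) groups the i-th component of every participant's vector.
    return [sum(values) / n for values in zip(*gradients)]


# Hypothetical gradient uploads from three participants.
uploads = [
    [0.2, -0.4],
    [0.4, 0.0],
    [0.6, 0.4],
]

aggregated = average_gradients(uploads)
print(aggregated)
```

In a decentralized setting, any participant could run the same function over the shared uploads, so no single party has to be trusted to compute the average.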
Citations: 1
Surveillance Video Anomaly Detection with Feature Enhancement and Consistency Frame Prediction
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859414
Beiji Zou, Min Wang, Lingzi Jiang, Yue Zhang, Shu Liu
Surveillance video anomaly detection is a challenging problem because of the diversity of abnormal events. Current prediction-based methods outperform reconstruction-based methods, but they have the following issues: 1) using optical flow to represent motion hinders real-time detection; 2) distinguishing abnormal events only by local relationships leads to ambiguity; 3) semantic information and spatiotemporal constraints are not fully utilized. To address these problems, we propose FECP-Net, a network with feature enhancement and consistency frame prediction for surveillance video anomaly detection. We use the RGB difference between consecutive frames rather than optical flow to achieve real-time detection. Meanwhile, we design a feature enhancement module to enrich the semantic and global context information in features. In addition, we add a spatiotemporal consistency constraint and a consistency loss to strengthen consistency predictions. Extensive experiments on standard benchmarks demonstrate the effectiveness of our method.
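The RGB difference between consecutive frames mentioned in this abstract is cheap to compute, as the following minimal sketch suggests. The frame layout (nested lists of `(r, g, b)` tuples), the function names, and the summary statistic are assumptions for illustration; the abstract does not say how FECP-Net consumes the difference.

```python
def rgb_difference(prev_frame, next_frame):
    """Signed per-pixel, per-channel difference between two frames.

    Each frame is a nested list: frame[row][col] = (r, g, b).
    """
    return [
        [tuple(n - p for p, n in zip(prev_px, next_px))
         for prev_px, next_px in zip(prev_row, next_row)]
        for prev_row, next_row in zip(prev_frame, next_frame)
    ]


def mean_abs_difference(diff):
    """Average absolute channel difference, a crude summary of motion."""
    values = [abs(c) for row in diff for px in row for c in px]
    return sum(values) / len(values)


# Two hypothetical 1x2 frames: only the second pixel changes.
frame_a = [[(10, 10, 10), (0, 0, 0)]]
frame_b = [[(10, 10, 10), (30, 0, 0)]]

diff = rgb_difference(frame_a, frame_b)
print(diff, mean_abs_difference(diff))
```

Unlike optical flow, this needs only one subtraction per channel per pixel, which is consistent with the abstract's motivation of keeping detection real-time.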
Citations: 0
Multi-Augmentation for Efficient Self-Supervised Visual Representation Learning
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859465
Van-Nhiem Tran, Chi-En Huang, Shenyao Liu, Kai-Lin Yang, Timothy Ko, Yung-Hui Li
In recent years, self-supervised learning has been studied as a way to cope with the limited availability of labeled datasets. Among the major components of self-supervised learning, the data augmentation pipeline is one key factor in the resulting performance. However, most researchers design the augmentation pipeline manually, and a limited collection of transformations may reduce the robustness of the learned feature representation. In this work, we propose Multi-Augmentations for Self-Supervised Representation Learning (MA-SSRL), which searches over various augmentation policies to build the entire pipeline and improve the robustness of the learned feature representation. MA-SSRL successfully learns an invariant feature representation and provides an efficient, effective, and adaptable data augmentation pipeline for self-supervised pre-training on datasets from different distributions and domains. MA-SSRL outperforms previous state-of-the-art methods on transfer and semi-supervised benchmarks while requiring fewer training epochs. Code is available on GitHub.
Citations: 0
GolfPose: Golf Swing Analyses with a Monocular Camera Based Human Pose Estimation
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859415
Zhongyu Jiang, Haorui Ji, Samuel Menaker, Jenq-Neng Hwang
With the rapid development of computer vision and deep learning technologies, artificial intelligence plays an increasingly important role in sports analyses. In this paper, to automate golf swing analyses, we propose a lightweight temporal-based 2D human pose estimation (HPE) method, called GolfPose, which achieves better performance than state-of-the-art image-based HPE methods. Unlike traditional image-based methods, our temporal-based method, designed for efficient and effective golf swing analyses, takes advantage of temporal information to improve the estimation accuracy of fast-moving and partially self-occluded keypoints. Furthermore, to ensure that the golf swing analyses can run on mobile devices, we optimize the model architecture to achieve real-time inference. With around 10% of the parameters and half of the GFLOPs of the state-of-the-art HRNet, our proposed GolfPose model achieves a mean pixel error (MPE) of 9.16 on our golf swing dataset, compared with 9.20 MPE for HRNet. In addition, the proposed temporal-based method, aided by golf club detection (GCD), reduces the error of keypoints on the golf club from 13.98 to 9.21 MPE.
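For readers unfamiliar with the metric, a mean pixel error can be computed as in the sketch below. The abstract does not define MPE precisely, so this assumes it is the average Euclidean distance, in pixels, between predicted and ground-truth keypoints; the function name and sample coordinates are hypothetical.

```python
import math


def mean_pixel_error(predicted, ground_truth):
    """Average Euclidean distance in pixels between matching keypoints.

    Both arguments are equally long lists of (x, y) pixel coordinates.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("keypoint lists must be non-empty and equally long")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


# Hypothetical keypoints: the first is off by 5 pixels, the second is exact.
predicted = [(3, 4), (10, 10)]
ground_truth = [(0, 0), (10, 10)]

print(mean_pixel_error(predicted, ground_truth))
```

Under this definition a lower value is better, which matches how the abstract compares 9.16 against 9.20 and 13.98 against 9.21.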
Citations: 4