
Latest Publications: 2022 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

Fabric Defect Detection VIA Unsupervised Neural Networks
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859266
Kuan-Hsien Liu, Song-Jie Chen, Ching-Hsiang Chiu, Tsung-Jung Liu
Surface defect detection is a necessary process for quality control in industry. Currently, popular neural-network-based defect detection systems usually need a large number of defect samples for training, and annotating and cleaning the data takes a lot of manual effort. This is a time-consuming process, and it makes the whole system less effective. In this paper, a deep neural network based model for fabric surface defect detection is proposed; it uses only positive (clean) samples for training. Since the proposed model does not collect negative (defective) samples for learning, the deployment time of the whole system is greatly reduced. In the experiments, the TensorRT model runs at 250 FPS on an RTX 3080 with a detection accuracy of 99%, which is suitable for production lines with real-time requirements.
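The abstract does not specify the network architecture, but a common way to realize defect detection trained only on defect-free samples is a convolutional autoencoder whose reconstruction error flags anomalies. The sketch below illustrates that idea; the layer sizes, the `defect_score` helper, and the thresholding strategy are illustrative assumptions, not the authors' model.

```python
# Hedged sketch: positive-only (clean-sample) training via a convolutional
# autoencoder; a patch with large reconstruction error is flagged as defective.
import torch
import torch.nn as nn

class FabricAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def defect_score(model, patch):
    """Per-patch anomaly score: mean squared reconstruction error."""
    with torch.no_grad():
        recon = model(patch)
    return torch.mean((patch - recon) ** 2).item()

# Training uses only clean samples; at inference, a patch whose score exceeds a
# threshold calibrated on held-out clean data would be reported as defective.
model = FabricAutoencoder()
clean_patch = torch.rand(1, 3, 128, 128)   # stand-in for a defect-free sample
print(defect_score(model, clean_patch))
```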
Citations: 2
Non-Local Spatiotemporal Correlation Attention for Action Recognition
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859314
Manh-Hung Ha, O. Chen
To perceive human actions well, it may be favorable to consider only useful clues from the human and scene context during recognition. Deep Neural Networks (DNNs) are usually built from blocks that compute local-neighborhood correlations in the spatial and temporal domains separately. In this work, we develop a DNN which consists of a 3D convolutional neural network, a Non-Local SpatioTemporal Correlation Attention (NSTCA) module, and a classifier to retrieve meaningful semantic context for effective action identification. In particular, the proposed NSTCA module extracts advantageous visual clues from both spatial and temporal features via transposed feature-correlation computations rather than separate spatial and temporal attention computations. In the experiments, a traffic-police dataset was used for analysis and comparison. The experimental outcome shows that the proposed DNN obtains an average accuracy of 98.2%, which is superior to that of conventional DNNs. Therefore, the DNN proposed herein can be widely applied to discern various actions of subjects in video scenes.
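The exact NSTCA layout is not given in the abstract, but the "transposed feature correlation" it describes is reminiscent of a non-local attention block computed over the joint space-time feature map. The following sketch shows one such block; the channel sizes, 1x1x1 projections, and residual connection are assumptions for illustration.

```python
# Hedged sketch of non-local attention over a joint space-time feature map:
# every space-time position attends to every other position via a transposed
# feature-correlation matrix, instead of separate spatial/temporal attention.
import torch
import torch.nn as nn

class NonLocalSpaceTime(nn.Module):
    def __init__(self, channels, reduced):
        super().__init__()
        self.query = nn.Conv3d(channels, reduced, kernel_size=1)
        self.key = nn.Conv3d(channels, reduced, kernel_size=1)
        self.value = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x):                       # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        n = t * h * w
        q = self.query(x).reshape(b, -1, n)     # (B, C_r, THW)
        k = self.key(x).reshape(b, -1, n)
        v = self.value(x).reshape(b, c, n)
        # Transposed correlation: (THW x THW) attention over all positions.
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)
        out = (v @ attn.transpose(1, 2)).reshape(b, c, t, h, w)
        return x + out                          # residual connection

block = NonLocalSpaceTime(channels=64, reduced=16)
clip_features = torch.rand(2, 64, 4, 14, 14)    # (batch, C, T, H, W)
print(block(clip_features).shape)               # torch.Size([2, 64, 4, 14, 14])
```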
Citations: 3
Intelligent Warning System Monitoring Vehicle Surrounding and Driver’s Behavior
Pub Date : 2022-07-18 DOI: 10.1109/ICMEW56448.2022.9859326
Tomoya Sawada, Mitsuki Nakamura
A driving assistance system can warn drivers of danger and plays an important role in avoiding serious accidents. However, few works consider bidirectional interaction between the system and users. In this paper, we propose a novel system named Intelligent Warning System (IWS) that can warn drivers with appropriate timing and warning level according to the surrounding environment and the driver’s behavior. The contribution of IWS includes the following two factors: 1) a lightweight object detection method that sets an appropriate warning level depending on the potential risks of surrounding objects; 2) a time-series learning method on the driver’s facial orientation that sets an appropriate warning timing depending on the driver’s behavior, with user-friendly interaction. Experimental results suggest that subjects want to use IWS in their daily driving and notice how its warning style adapts to their behavior, especially for safety confirmation of approaching objects.
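As a rough illustration of how the two described factors might interact, the sketch below combines a per-object risk estimate with the driver's gaze state to pick a warning level. The risk weights, time-to-collision heuristic, object categories, and thresholds are invented for the example and are not taken from the paper.

```python
# Hedged sketch: warning level from object risk, softened when the driver is
# already attending to the risk. All constants here are illustrative.
from dataclasses import dataclass

RISK_WEIGHT = {"pedestrian": 1.0, "cyclist": 0.9, "car": 0.6, "static": 0.2}

@dataclass
class DetectedObject:
    category: str
    distance_m: float         # distance from the ego vehicle
    closing_speed_mps: float  # positive when approaching

def object_risk(obj: DetectedObject) -> float:
    ttc = obj.distance_m / max(obj.closing_speed_mps, 0.1)  # crude time-to-collision
    return RISK_WEIGHT.get(obj.category, 0.5) / ttc

def warning_level(objects, driver_looking_at_risk: bool) -> str:
    peak = max((object_risk(o) for o in objects), default=0.0)
    if driver_looking_at_risk:
        peak *= 0.5           # driver already attending: defer or soften the warning
    if peak > 0.3:
        return "urgent"
    if peak > 0.1:
        return "caution"
    return "none"

scene = [DetectedObject("pedestrian", distance_m=5.0, closing_speed_mps=2.0)]
print(warning_level(scene, driver_looking_at_risk=False))   # urgent
```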
Citations: 0
FAIVconf: Face Enhancement for AI-Based Video Conference with Low Bit-Rate
Pub Date : 2022-07-08 DOI: 10.1109/ICMEW56448.2022.9859370
Z. Li, Sheng-fu Lin, Shan Liu, Songnan Li, Xue Lin, Wei Wang, Wei Jiang
Recently, high-quality video conferencing with fewer transmission bits has become a very active and challenging problem. We propose FAIVConf, a video compression framework specially designed for video conferencing, based on effective neural human face generation techniques. FAIVConf brings together several designs to improve system robustness in real video conference scenarios: face swapping to avoid artifacts in background animation; facial blurring to decrease the transmission bit-rate while maintaining the quality of extracted facial landmarks; and dynamic source update for face view interpolation to accommodate a large range of head poses. Our method achieves a significant bit-rate reduction in video conferencing and gives much better visual quality at the same bit-rate compared with the H.264 and H.265 coding schemes.
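A back-of-envelope sketch of why landmark-driven face generation can cut conference bit-rate: per frame, a few dozen quantized 2D landmarks cost far fewer bits than a compressed video frame. The landmark count, bit depth, and reference video bit-rate below are illustrative figures, not numbers from FAIVConf.

```python
# Hedged bit-rate comparison: transmitting facial landmarks per frame vs. a
# typical compressed video stream. All constants are illustrative assumptions.
LANDMARKS = 68            # typical 2D facial landmark count
BITS_PER_COORD = 16       # quantized x or y coordinate
FPS = 30

landmark_kbps = LANDMARKS * 2 * BITS_PER_COORD * FPS / 1000
typical_h264_kbps = 500   # rough figure for 720p conferencing video

print(f"landmark stream ~{landmark_kbps:.0f} kbps vs ~{typical_h264_kbps} kbps video")
```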
Citations: 1
Package Theft Detection from Smart Home Security Cameras
Pub Date : 2022-05-24 DOI: 10.1109/ICMEW56448.2022.9859522
Hung-Min Hsu, Xinyu Yuan, Baohua Zhu, Zhongwei Cheng, Lin Chen
Package theft detection has been a challenging task, mainly due to the lack of training data and the wide variety of real-world package theft cases. In this paper, we propose a new Global and Local Fusion Package Theft Detection Embedding (GLF-PTDE) framework to generate a package theft score for each segment within a video, fulfilling real-world requirements for package theft detection. Moreover, we construct a novel Package Theft Detection dataset to facilitate research on this task. Our method achieves 80% AUC on the newly proposed dataset, showing the effectiveness of the proposed GLF-PTDE framework and its robustness across different real scenes for package theft detection.
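The abstract does not detail how the global and local branches are fused, so the sketch below shows one plausible reading: concatenating a whole-frame (global) embedding with a region-level (local) embedding and mapping the pair to a per-segment theft score. The dimensions and layers are illustrative assumptions.

```python
# Hedged sketch of global/local feature fusion producing one theft score per
# video segment. The embedding size and head layers are illustrative.
import torch
import torch.nn as nn

class GlobalLocalFusionScorer(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(dim * 2, dim), nn.ReLU(),
            nn.Linear(dim, 1), nn.Sigmoid(),
        )

    def forward(self, global_emb, local_emb):
        # One embedding pair per video segment; output score lies in [0, 1].
        return self.head(torch.cat([global_emb, local_emb], dim=-1)).squeeze(-1)

scorer = GlobalLocalFusionScorer()
g = torch.rand(8, 256)   # 8 segments, global (whole-frame) context features
l = torch.rand(8, 256)   # matching local (person/package region) features
print(scorer(g, l))      # per-segment theft probabilities
```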
Citations: 0
Perceptual Evaluation on Audio-Visual Dataset of 360 Content
Pub Date : 2022-05-16 DOI: 10.1109/ICMEW56448.2022.9859426
R. F. Fela, Andréas Pastor, P. Callet, N. Zacharov, Toinon Vigier, Søren Forchhammer
To open up new possibilities for assessing the multimodal perceptual quality of omnidirectional media formats, we propose a novel open-source 360 audiovisual (AV) quality dataset. The dataset consists of high-quality 360 video clips in equirectangular (ERP) format and higher-order (4th-order) ambisonics, along with subjective scores. Three subjective quality experiments were conducted for audio, video, and AV, with the procedures detailed in this paper. Using the data from the subjective tests, we demonstrate that this dataset can be used to quantify perceived audio, video, and audiovisual quality. The diversity and discriminability of the subjective scores are also analyzed. Finally, we investigate how our dataset correlates with various objective quality metrics for audio and video. The results of this study imply that the proposed dataset can benefit future studies on multimodal quality evaluation of 360 content.
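For the correlation analysis the abstract mentions, a typical procedure is to compare subjective mean opinion scores (MOS) against an objective metric using Pearson (PLCC) and Spearman (SROCC) correlation. The snippet below shows that computation on toy numbers; the actual metric values and scores come from the dataset itself.

```python
# Hedged sketch of subjective-vs-objective correlation analysis on toy data.
from scipy.stats import pearsonr, spearmanr

mos = [1.8, 2.5, 3.1, 3.9, 4.6]            # subjective audiovisual quality scores
metric = [30.2, 33.5, 35.1, 38.0, 41.2]    # e.g. a PSNR-like objective score

plcc, _ = pearsonr(metric, mos)            # linear correlation
srocc, _ = spearmanr(metric, mos)          # rank-order correlation
print(f"PLCC={plcc:.3f}  SROCC={srocc:.3f}")
```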
Citations: 7
PAMI-AD: An Activity Detector Exploiting Part-Attention and Motion Information in Surveillance Videos
Pub Date : 2022-03-08 DOI: 10.1109/ICMEW56448.2022.9859481
Yunhao Du, Zhihang Tong, Jun-Jun Wan, Binyu Zhang, Yanyun Zhao
Activity detection in surveillance videos is a challenging task owing to small objects, complex activity categories, the untrimmed nature of the videos, etc. Existing methods are generally limited in performance due to inaccurate proposals, poor classifiers, or inadequate post-processing methods. In this work, we propose a comprehensive and effective activity detection system for person-centered and vehicle-centered activities in untrimmed surveillance videos. It consists of four modules: an object localizer, a proposal filter, an activity classifier, and an activity refiner. For person-centered activities, a novel part-attention mechanism is proposed to explore detailed features in different body parts. For vehicle-centered activities, we propose a localization masking method to jointly encode motion and foreground attention features. We conduct experiments on the large-scale activity detection dataset VIRAT and achieve the best results for both groups of activities. Furthermore, our team won 1st place in the TRECVID 2021 ActEV challenge.
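A skeleton of the four-module pipeline named in the abstract (object localizer, proposal filter, activity classifier, activity refiner) is sketched below. The function bodies are placeholders standing in for the learned components; the fields, label, and thresholds are invented for illustration only.

```python
# Hedged skeleton of the four-stage pipeline; each stage stands in for a model.
def object_localizer(frames):
    """Detect and track persons/vehicles; return spatio-temporal track proposals."""
    return [{"frames": list(range(30)), "kind": "person"},
            {"frames": list(range(5)), "kind": "vehicle"}]

def proposal_filter(proposals, min_len=16):
    """Drop proposals too short to contain a meaningful activity."""
    return [p for p in proposals if len(p["frames"]) >= min_len]

def activity_classifier(proposals):
    """Score activities (part-attention for persons, motion masking for vehicles)."""
    return [dict(p, label="person_enters_vehicle", score=0.72) for p in proposals]

def activity_refiner(detections, threshold=0.5):
    """Post-process: keep confident detections and smooth temporal boundaries."""
    return [d for d in detections if d["score"] >= threshold]

results = activity_refiner(activity_classifier(proposal_filter(object_localizer(frames=[]))))
print(results)
```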
Citations: 2
Unsupervised Severely Deformed Mesh Reconstruction (DMR) From A Single-View Image for Longline Fishing
Pub Date : 2022-01-23 DOI: 10.1109/ICMEW56448.2022.9859312
J. Mei, Jingxiang Yu, S. Romain, Craig S. Rose, Kelsey Magrane, Graeme LeeSon, Jenq-Neng Hwang
Much progress has been made in supervised learning of 3D reconstruction of rigid objects from multi-view images or video. However, it is more challenging to reconstruct severely deformed objects from a single-view RGB image in an unsupervised manner. Training-based methods, such as category-specific training, have been shown to successfully reconstruct rigid objects and slightly deformed objects, such as birds, from a single-view image. However, they cannot effectively handle severely deformed objects, nor can they be applied to some real-world downstream tasks, because of the inconsistent semantic meaning of vertices, which is crucial in defining the adopted 3D templates of the objects to be reconstructed. In this work, we introduce a template-based method to infer 3D shapes from a single-view image and apply the reconstructed mesh to a downstream task, i.e., absolute length measurement. Without using 3D ground truth, our method faithfully reconstructs 3D meshes and achieves state-of-the-art accuracy on a length measurement task with a severely deformed fish dataset.
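As a sketch of the length-measurement downstream task: once a template mesh with semantically consistent vertices has been fitted, body length can be read off as the distance along a chain of known landmark vertices (e.g., snout to tail). The vertex coordinates and landmark indices below are made up; the result is in mesh units unless a metric scale has been recovered.

```python
# Hedged sketch: measuring length along semantically consistent template
# vertices of a reconstructed mesh. Landmark indices are illustrative.
import numpy as np

def length_along_landmarks(vertices: np.ndarray, landmark_ids) -> float:
    """Sum of segment lengths through ordered landmark vertices (snout -> tail)."""
    pts = vertices[list(landmark_ids)]
    return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())

verts = np.array([[0.0, 0.0, 0.0],
                  [0.3, 0.05, 0.0],
                  [0.6, 0.0, 0.0],
                  [0.9, -0.02, 0.0]])
print(length_along_landmarks(verts, landmark_ids=(0, 1, 2, 3)))  # length in mesh units
```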
Citations: 2
Dual-Neighborhood Deep Fusion Network for Point Cloud Analysis
Pub Date : 2021-08-20 DOI: 10.1109/ICMEW56448.2022.9859382
Guoquan Xu, Hezhi Cao, Yifan Zhang, Jianwei Wan, Ke Xu, Yanxin Ma
Recently, deep neural networks have made remarkable achievements in 3D point cloud analysis. However, current shape descriptors are inadequate for capturing the information thoroughly. To handle this problem, a feature representation learning method named Dual-Neighborhood Deep Fusion Network (DNDFN) is proposed to serve as an improved point cloud encoder for point cloud analysis. Specifically, the traditional local neighborhood ignores long-distance dependencies, and DNDFN utilizes an adaptive key-neighborhood replenishment mechanism to overcome this limitation. Furthermore, the transmission of information between points depends on the unique potential relationship between them, so a convolution for capturing this relationship is proposed. Extensive experiments on existing benchmarks, especially non-idealized datasets, verify the effectiveness of DNDFN, which achieves state-of-the-art performance.
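The sketch below builds the two neighborhoods the name implies: a short-range geometric neighborhood from k-NN in coordinate space and a long-range "key" neighborhood from k-NN in feature space, which can reach distant but semantically related points. How DNDFN fuses and adaptively replenishes these neighborhoods is learned and not reproduced here; the point count and k are arbitrary.

```python
# Hedged sketch: dual neighborhoods for each point, one geometric (xyz k-NN)
# and one in feature space ("key" neighborhood that can span long distances).
import numpy as np

def knn_indices(query: np.ndarray, k: int) -> np.ndarray:
    """k nearest neighbours (excluding self) for each row of `query`."""
    d = np.linalg.norm(query[:, None, :] - query[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]

points = np.random.rand(512, 3)      # xyz coordinates
features = np.random.rand(512, 32)   # per-point features from a previous layer

local_nbrs = knn_indices(points, k=16)    # short-range geometric neighbourhood
key_nbrs = knn_indices(features, k=16)    # long-range, feature-similar neighbourhood
print(local_nbrs.shape, key_nbrs.shape)   # (512, 16) (512, 16)
```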
Citations: 1
A Deep Drift-Diffusion Model for Image Aesthetic Score Distribution Prediction
Pub Date : 2020-08-30 DOI: 10.1109/ICMEW56448.2022.9859450
Xin Jin, Xiqiao Li, Heng Huang, Xiaodong Li, Xinghui Zhou
The task of aesthetic quality assessment is complicated by its subjectivity. In recent years, the target representation of image aesthetic quality has changed from a one-dimensional binary classification label or numerical score to a multi-dimensional score distribution. Current methods regress the ground-truth score distributions directly. However, the subjectivity of aesthetics is not taken into account; that is, the psychological processes of human beings are not considered, which limits performance on the task. In this paper, we propose a Deep Drift-Diffusion (DDD) model, inspired by psychologists, to predict the aesthetic score distribution of an image. The DDD model describes the psychological process of aesthetic perception rather than merely modeling the assessment results, as traditional approaches do. We use deep convolutional neural networks to regress the parameters of the drift-diffusion model. Experimental results on large-scale aesthetic image datasets reveal that our DDD model is simple yet effective, outperforming state-of-the-art methods in aesthetic score distribution prediction. In addition, different psychological processes can also be predicted by our model. Our work applies the drift-diffusion psychological model to score distribution prediction for visual aesthetics and has the potential to inspire more attention to modeling the psychological process of aesthetic perception.
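To make the drift-diffusion intuition concrete, the toy simulation below lets many simulated raters accumulate noisy evidence with a shared drift and bins their end states into a 1-10 score histogram, yielding a distribution rather than a single score. The parameterization and the squashing into score bins are illustrative assumptions, not the DDD model's actual formulation (whose parameters are regressed from the image by a CNN).

```python
# Hedged toy simulation: a drift-diffusion process naturally produces a
# distribution of scores across simulated raters.
import numpy as np

def simulate_score_distribution(drift, diffusion, n_raters=10000, steps=100, bins=10, seed=0):
    rng = np.random.default_rng(seed)
    dt = 1.0 / steps
    x = np.zeros(n_raters)
    for _ in range(steps):
        # Evidence accumulation: x += drift*dt + diffusion*sqrt(dt)*noise
        x += drift * dt + diffusion * np.sqrt(dt) * rng.standard_normal(n_raters)
    # Squash accumulated evidence into the [1, 10] score range and histogram it.
    scores = 1 + 9 / (1 + np.exp(-x))
    hist, _ = np.histogram(scores, bins=bins, range=(1, 10))
    return hist / hist.sum()

print(simulate_score_distribution(drift=1.2, diffusion=1.0).round(3))
```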
Citations: 2