
Latest publications from the 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)

Multi-Style Transfer Generative Adversarial Network for Text Images
Honghui Yuan, Keiji Yanai
In recent years, neural style transfer has shown impressive results in deep learning. In particular, for text style transfer, recent research has successfully completed the transition from the text font domain to the text style domain. However, transferring multiple styles typically requires training many models, and generating images of text in multiple styles within a single model remains an unsolved problem. In this paper, we propose a multiple-style transformation network for text style transfer, which can generate text images in multiple styles with a single model and control the style of the text in a simple way. The main idea is to add conditions to the transfer network so that all styles can be trained effectively in one network, and to control the generation of each text style through those conditions. We also optimize the network so that the conditional information is transmitted effectively through it. The advantages of the proposed network are that multiple styles of text can be generated with only one model and that the generation of text styles can be controlled. We have tested the proposed network on a large number of texts and demonstrated that it works well when generating multiple styles of text at the same time.
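The core mechanism the abstract describes, conditioning a single transfer network on a style code, can be sketched in a few lines. This is a toy numpy illustration under assumed dimensions, not the paper's actual generator/discriminator: a one-hot style condition is concatenated to the content features, so one shared set of weights produces a different stylization per condition.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STYLES = 4  # number of target text styles (illustrative)
FEAT = 8      # content feature size (illustrative)

# One shared weight matrix serves all styles; the condition selects behavior.
W = rng.normal(size=(FEAT + N_STYLES, FEAT))

def stylize(content_feat, style_id):
    cond = np.zeros(N_STYLES)
    cond[style_id] = 1.0                  # one-hot style condition
    x = np.concatenate([content_feat, cond])
    return np.tanh(x @ W)                 # single model, many styles

content = rng.normal(size=FEAT)
out_a = stylize(content, 0)
out_b = stylize(content, 1)               # same content, different condition
```

Changing only the condition changes the output, which is the sense in which one model covers all styles.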
Citations: 2
Transformer based Neural Network for Fine-Grained Classification of Vehicle Color
Yingjin Wang, Chuanming Wang, Yuchao Zheng, Huiyuan Fu, Huadong Ma
The development of vehicle color recognition technology is of great significance for vehicle identification and for intelligent transportation systems. However, the small variety of colors and the influence of illumination in the environment make fine-grained vehicle color recognition a challenging task. Insufficient training data and coarse color categories in previous datasets cause low recognition accuracy and limit flexibility in practical use. Meanwhile, inefficient feature learning also leads to the poor recognition performance of previous methods. Therefore, we collect a rear-view dataset from vehicle checkpoint (bayonet) monitoring cameras for fine-grained vehicle color recognition. Its images can be divided into 11 main categories and 75 color subcategories according to the proposed labeling algorithm, which eliminates the influence of illumination and assigns a color annotation to each image. We propose a novel recognition model that can effectively identify vehicle colors. We integrate the Transformer into the recognition model to enhance the feature learning capacity of conventional neural networks, and design a hierarchical loss function based on an in-depth analysis of the proposed dataset. We evaluate the designed recognition model on the dataset; it achieves an accuracy of 97.77%, which is superior to traditional approaches.
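A hierarchical loss over 11 main categories and 75 subcategories could take the following shape. This is a sketch in the spirit of the abstract, not the paper's actual formulation: the subcategory-to-main mapping and the 0.5 main-level weight below are invented placeholders.

```python
import numpy as np

N_MAIN, N_SUB = 11, 75
# Placeholder mapping: every subcategory belongs to exactly one main category.
SUB_TO_MAIN = np.arange(N_SUB) % N_MAIN

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def hierarchical_loss(sub_logits, sub_label, main_weight=0.5):
    p_sub = softmax(sub_logits)
    # Main-category probability = sum of its subcategories' probabilities.
    p_main = np.zeros(N_MAIN)
    np.add.at(p_main, SUB_TO_MAIN, p_sub)
    main_label = SUB_TO_MAIN[sub_label]
    # Cross-entropy at both levels of the hierarchy.
    return -np.log(p_sub[sub_label]) - main_weight * np.log(p_main[main_label])

rng = np.random.default_rng(1)
logits = rng.normal(size=N_SUB)
loss = hierarchical_loss(logits, sub_label=3)
```

The main-level term rewards the model even when it confuses subcategories within the correct main color, which is the usual motivation for such losses.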
Citations: 0
Integrated Cloud-based System for Endangered Language Documentation and Application
Min Chen, Jignasha Borad, Mizuki Miyashita, James Randall
Nearly half of the world's languages are considered endangered and need to be documented, analyzed, and revitalized. However, existing linguistic tools cannot effectively analyze languages such as Blackfoot, in which relative pitch movement is significant: words with the same sound sequence can convey different meanings when the pitch changes. To address this issue, we present a novel form of audio analysis based on a perceptual scale, and develop a consolidated, interactive toolset called MeTILDA (Melodic Transcription in Language Documentation and Analysis) to effectively capture perceived changes in pitch movement and to host other existing desktop-based linguistic tools on the cloud, enabling collaboration, data sharing, and data reuse across multiple linguistic tools.
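A common way to put pitch on a perceptual scale, assumed here for illustration (the paper's exact scale is not given in the abstract), is semitones relative to a speaker baseline: equal semitone steps sound like equal pitch steps, which makes relative pitch movement comparable across speakers.

```python
import math

def hz_to_semitones(f_hz, ref_hz):
    """Convert a frequency to semitones above a reference frequency."""
    return 12.0 * math.log2(f_hz / ref_hz)

baseline = 200.0  # hypothetical speaker baseline in Hz
contour_hz = [200.0, 224.5, 252.0, 224.5]
contour_st = [round(hz_to_semitones(f, baseline), 2) for f in contour_hz]
# The contour rises by about 2 semitones per step, then falls back.
```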
Citations: 0
Predicting Human Behavior with Transformer Considering the Mutual Relationship between Categories and Regions
Ryoichi Osawa, Keiichi Suekane, Ryoko Nakamura, Aozora Inagaki, T. Takagi, Isshu Munemasa
Recently, studies on human behavior have been conducted frequently, and predicting human mobility is one area of interest. It is difficult, however, because human activities result from various factors such as periodicity, changes of preference, and geographical effects. When predicting human mobility, it is essential to capture these factors. Humans may go to particular areas to visit a store of a desired category. Also, since stores of a particular category tend to open in specific areas, trajectories over visited geographical regions are helpful in understanding the purpose of visits. Therefore, the purpose of visiting stores of a desired category and the purpose of visiting a region affect each other. Capturing this mutual dependency enables prediction with higher accuracy than modeling only the superficial trajectory sequence. Capturing it requires a mechanism that can dynamically adjust the important categories depending on the region, but conventional methods, which can only perform static operations, have structural limitations. In the proposed model, we use the Transformer to address this problem. However, since a default Transformer can only capture unidirectional relationships, the proposed model uses mutually connected Transformers to capture the mutual relationships between categories and regions. Furthermore, most human activities have a weekly periodicity, and it is likely that only part of a trajectory is important for predicting human mobility. Therefore, we propose an encoder that captures the periodicity of human mobility and an attention mechanism that extracts the important part of the trajectory. In our experiments, we predict whether a user will visit stores in specific categories and regions, taking the trajectory sequence as input. By comparing our model with existing models, we show that it outperforms state-of-the-art (SOTA) models on similar tasks in this experimental setup.
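The "mutually connected" idea can be sketched as two cross-attention streams, each attending over the other, so the dependency runs in both directions rather than one way as in a plain encoder-decoder. This is a minimal numpy sketch with toy dimensions and no learned weights, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 4  # toy embedding dimension

def attend(queries, keys_values):
    """Scaled dot-product attention of one stream over another."""
    scores = queries @ keys_values.T / np.sqrt(D)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)          # row-wise softmax
    return w @ keys_values

cats = rng.normal(size=(5, D))   # embedded category trajectory
regs = rng.normal(size=(5, D))   # embedded region trajectory

# Each stream is updated with information attended from the other stream.
cats_out = cats + attend(cats, regs)   # categories conditioned on regions
regs_out = regs + attend(regs, cats)   # regions conditioned on categories
```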
Citations: 0
Kyoto Sightseeing Map 2.0 for User-Experience Oriented Tourism
Jing Xu, Junjie Sun, Taishan Li, Qiang Ma
We present Kyoto Sightseeing Map 2.0, a web-based application for user-experience-oriented tourism that discovers and explores sightseeing resources from User Generated Content (UGC). It applies large-scale content analysis to UGC in order to give travelers an additional, experience-based source of information during their search. It thereby narrows the information gap on sightseeing resources, especially Points of Interest (POIs), left by the maps that governments and tourism firms provide from a publicity and marketing perspective. On the one hand, Kyoto Sightseeing Map 2.0 shows tourists the aesthetic quality of photos taken at tourist spots in Kyoto over time, computed from UGC by aesthetics quality assessment (AQA) with Multi-level Spatially-Pooled (MLSP) features. On the other hand, the user can consult two sets of POI photos generated from the user data displayed on the map. Our application helps travelers make well-informed decisions about their trips based on UGC.
Citations: 1
Socially Aware Multimodal Deep Neural Networks for Fake News Classification
Saed Rezayi, Saber Soleymani, H. Arabnia, Sheng Li
The importance of fake news detection and classification on Online Social Networks (OSNs) has recently increased and drawn attention. Training machine learning models for this task requires different types of attributes, or modalities, from the target OSN. Existing methods mainly rely on social media text, which carries rich semantic information and can roughly explain the discrepancy between normal news and multiple types of fake news. However, the structural characteristics of OSNs are overlooked. This paper aims to exploit these structural characteristics to further boost fake news classification performance on OSNs. Using deep neural networks, we build a novel multimodal classifier that concatenates relaying features, textual features, and network features in a late-fusion manner. Experimental results on benchmark datasets demonstrate that our socially aware architecture outperforms existing models on fake news classification.
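Late fusion, as described in the abstract, means each modality is encoded separately and the encodings are concatenated just before the final classifier head. The sketch below is a toy numpy illustration; the feature sizes and the logistic head are assumptions, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-ins for three separately computed modality encodings.
relay_feat = rng.normal(size=16)   # e.g. propagation/relaying statistics
text_feat = rng.normal(size=32)    # e.g. a text embedding
graph_feat = rng.normal(size=8)    # e.g. a node embedding of the OSN graph

# Late fusion: concatenate only at the end, then classify.
fused = np.concatenate([relay_feat, text_feat, graph_feat])
w, b = rng.normal(size=fused.size), 0.0
p_fake = 1.0 / (1.0 + np.exp(-(fused @ w + b)))  # toy binary head
```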
Citations: 3
Dynamic Local Geometry Capture in 3D Point Cloud Classification
Shivanand Venkanna Sheshappanavar, C. Kambhamettu
With the advent of PointNet, deep neural networks have become popular in point cloud analysis. PointNet's successor, PointNet++, partitions the input point cloud and recursively applies PointNet to capture local geometry, using ball querying in its set abstraction layers. Several models based on the single-scale grouping of PointNet++ continue to use ball querying with a fixed-radius ball. Because its scale is uniform in all directions, a ball lacks orientation and is ineffective at capturing complex local neighborhoods. A few recent models replace the fixed-sized ball with a fixed-sized ellipsoid or cuboid, but these methods are still not fully effective at capturing the varying geometry proportions of different local neighborhoods on the object surface. We propose a novel technique that dynamically orients and scales an ellipsoid based on unique local information to capture local geometry better. We also propose ReducedPointNet++, a single-scale grouping model based on a single set abstraction. Our model, together with dynamically oriented and scaled ellipsoid querying, achieves 92.1% classification accuracy on the ModelNet40 dataset. We achieve state-of-the-art 3D classification results on all six variants of the real-world ScanObjectNN dataset, with an accuracy of 82.0% on the most challenging variant.
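For context, the fixed-radius ball query that PointNet++-style grouping uses, and that this paper replaces with a dynamically oriented ellipsoid, looks like the following. This is the standard baseline operation, sketched in numpy.

```python
import numpy as np

def ball_query(points, centroid, radius, k):
    """Return indices of up to k points within `radius` of `centroid`."""
    d = np.linalg.norm(points - centroid, axis=1)
    return np.nonzero(d <= radius)[0][:k]

rng = np.random.default_rng(4)
cloud = rng.uniform(-1, 1, size=(1024, 3))   # toy point cloud
neighbors = ball_query(cloud, cloud[0], radius=0.2, k=32)
```

The ellipsoid variant would replace the isotropic distance `d` with a Mahalanobis-style distance whose axes and scales come from the local neighborhood, giving the query an orientation the ball lacks.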
Citations: 10
The Brain-Machine-Ratio Model for Designer and AI Collaboration
Ling Fan, Yifang Bao, Shuyu Gong, Sida Yan, Harry J. Wang
Artificial intelligence is profoundly changing design practice, and the relationship between designers and applied artificial intelligence urgently needs a framework and theory to describe and measure it. This article therefore establishes the Brain-Machine-Ratio (BMR) model, which examines the collaborative relationship between designers and artificial intelligence through the ratio of human to machine labor in the design process. The core approach is to model the proportions of human and AI work across seven design tasks along the time dimension. Based on both qualitative and quantitative evaluation, we propose the concept and statistics of the Brain-Machine-Ratio model and derive the further collaborative relationship between designers and artificial intelligence.
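As described, the ratio is computed over time spent by human versus machine across the design tasks. The bookkeeping is simple; the task names and minute values below are invented placeholders (the paper uses seven tasks, only three are shown).

```python
# Minutes spent per design task, split by who did the work (placeholder data).
human_min = {"research": 120, "ideation": 90, "drafting": 30}
machine_min = {"research": 20, "ideation": 30, "drafting": 90}

def bmr(human, machine):
    """Brain-Machine-Ratio: total human time over total machine time."""
    return sum(human.values()) / sum(machine.values())

ratio = bmr(human_min, machine_min)  # 240 human minutes / 140 machine minutes
```

A ratio above 1 indicates the human still carries most of the labor; tracking it per task over time is what lets the model describe how the collaboration shifts.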
Citations: 0
An Introduction to the JPEG Fake Media Initiative
F. Temmermans, Deepayan Bhowmik, Fernando Pereira, T. Ebrahimi
Recent advances in media creation and modification make it possible to produce near-realistic media assets that are almost indistinguishable from originals to the human eye. These developments open opportunities for the creative production of new media in the entertainment and art industries. However, the intentional or unintentional spread of manipulated media, i.e., media modified with the intent to induce misinterpretation, also poses risks such as social unrest, the spread of rumours for political gain, or the encouragement of hate crimes. Clear and transparent annotation of media modifications is considered a crucial element in many usage scenarios, bringing trust to users. This has already triggered various organizations to develop mechanisms that can detect and annotate modified media assets when they are shared. However, these annotations should be attached to the media in a secure way to prevent them from being compromised. In addition, wide adoption of such an annotation ecosystem requires interoperability, which clearly calls for a standard. This paper presents an initiative by the JPEG Committee called JPEG Fake Media, whose scope is the creation of a standard that facilitates the secure and reliable annotation of media asset creation and modification. The standard shall support usage scenarios in good faith as well as those with malicious intent. This paper gives an overview of the current state of this initiative and introduces already-identified use cases and requirements.
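One minimal way to get the security property the abstract asks for, binding an annotation to the media bytes so that tampering with either is detectable, is to hash the asset and authenticate hash plus annotation with an HMAC. This is an illustrative stdlib sketch only; key management and the actual container format are exactly what the standard would have to define.

```python
import hashlib
import hmac
import json

def annotate(media_bytes, annotation, key):
    """Bind an annotation to a media asset and authenticate both."""
    record = {"media_sha256": hashlib.sha256(media_bytes).hexdigest(),
              "annotation": annotation}
    payload = json.dumps(record, sort_keys=True).encode()
    record["mac"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return record

def verify(media_bytes, record, key):
    """Check that neither the media nor the annotation was altered."""
    body = {k: v for k, v in record.items() if k != "mac"}
    if body["media_sha256"] != hashlib.sha256(media_bytes).hexdigest():
        return False
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record["mac"], expected)

key = b"shared-secret"  # placeholder; real deployments need key management
rec = annotate(b"jpeg-bytes", {"modified": True, "tool": "retouch"}, key)
ok = verify(b"jpeg-bytes", rec, key)       # untouched asset passes
bad = verify(b"tampered-bytes", rec, key)  # altered asset fails
```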
{"title":"An Introduction to the JPEG Fake Media Initiative","authors":"F. Temmermans, Deepayan Bhowmik, Fernando Pereira, T. Ebrahimi","doi":"10.1109/MIPR51284.2021.00075","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00075","url":null,"abstract":"Recent advances in media creation and modification allow to produce near realistic media assets that are almost indistinguishable from original assets to the human eye. These developments open opportunities for creative production of new media in the entertainment and art industry. However, the intentional or unintentional spread of manipulated media, i.e., modified media with the intention to induce misinterpretation, also imposes risks such as social unrest, spread of rumours for political gain or encouraging hate crimes. The clear and transparent annotation of media modifications is considered to be a crucial element in many usage scenarios bringing trust to the users. This has already triggered various organizations to develop mechanisms that can detect and annotate modified media assets when they are shared. However, these annotations should be attached to the media in a secure way to prevent them of being compromised. In addition, to achieve a wide adoption of such an annotation ecosystem, interoperability is essential and this clearly calls for a standard. This paper presents an initiative by the JPEG Committee called JPEG Fake Media. The scope of JPEG Fake Media is the creation of a standard that can facilitate the secure and reliable annotation of media asset creation and modifications. The standard shall support usage scenarios that are in good faith as well as those with malicious intent. 
This paper gives an overview of the current state of this initiative and introduces already identified use cases and requirements.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114900662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
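To make the abstract's core requirement concrete — an annotation attached to media "in a secure way to prevent them from being compromised" — here is a minimal sketch using a keyed MAC over both the asset bytes and the modification record, so tampering with either is detectable. This is an illustrative stand-in built on Python's standard library, not the actual mechanism of the JPEG Fake Media standard, and the pre-shared key is an assumption of the sketch.

```python
import hashlib
import hmac
import json

SECRET_KEY = b"demo-signing-key"  # assumed pre-shared key, for illustration only

def annotate(media_bytes, modification_note):
    """Build an annotation whose authentication tag covers media and note."""
    record = {
        "sha256": hashlib.sha256(media_bytes).hexdigest(),
        "modification": modification_note,
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["tag"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify(media_bytes, record):
    """Return True only if neither the media nor the annotation was altered."""
    claimed = dict(record)
    tag = claimed.pop("tag")
    # The media hash must still match the asset as received.
    if claimed["sha256"] != hashlib.sha256(media_bytes).hexdigest():
        return False
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag, expected)

asset = b"\xff\xd8\xff fake jpeg bytes for the demo"
note = annotate(asset, "face region retouched with generative inpainting")
```

Binding the tag to both the hash and the free-text note is what lets the annotation describe good-faith edits while still being tamper-evident; a real standard would additionally need public-key signatures and an interoperable container format.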
Dynamic Topic-Enhanced Memory Networks: Time-series Behavior Prediction based on Changing Intrinsic Consciousnesses
Ryoko Nakamura, Hirofumi Sano, Aozora Inagaki, Ryoichi Osawa, T. Takagi, Isshu Munemasa
In the field of behavior prediction, methods have been developed to predict a user's state from the previous state or from a time series of recorded behavior histories. So far, however, there has been no effort to capture time series that reflect users' intrinsic consciousnesses and how those change. Here, we propose a model that captures changes in the user's intrinsic consciousnesses, called Dynamic Topic-Enhanced Memory Networks (DTEMN), for location-based advertising. In comparative experiments, we used DTEMN to predict places users will visit in the future. The results show that capturing changes in intrinsic consciousnesses with DTEMN is effective in improving prediction performance. In addition, we show an improvement in interpretability when simultaneously learning topics expressed as multiple intrinsic consciousnesses.
{"title":"Dynamic Topic-Enhanced Memory Networks: Time-series Behavior Prediction based on Changing Intrinsic Consciousnesses","authors":"Ryoko Nakamura, Hirofumi Sano, Aozora Inagaki, Ryoichi Osawa, T. Takagi, Isshu Munemasa","doi":"10.1109/MIPR51284.2021.00035","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00035","url":null,"abstract":"In the field of behavior prediction, methods have been developed to predict the state of the user by using the previous state or time-series of recorded behavior histories. However, so far, there has been no effort to capture time series reflecting the intrinsic consciousnesses and changes thereof of users. Here, we propose a model that captures changes in intrinsic consciousnesses of the user, called Dynamic Topic-Enhanced Memory Networks (DTEMN), for location-based advertising. In comparative experiments, we used DTEMN to predict places where users will visit in the future. The results show capturing changes in intrinsic consciousnesses using DTEMN is effective in improving prediction performance. In addition, we show an improvement in interpretability when simultaneously learning topics expressed as multiple intrinsic consciousnesses.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"90 30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129849069","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
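A heavily simplified toy of the intuition behind DTEMN: the predicted next place depends on a topic distribution standing in for the user's "intrinsic consciousness", and that distribution drifts over time. Topics, places, affinity values, and the drift rule are all invented here; the actual model learns these with memory networks rather than the fixed dot-product scoring below.

```python
# Assumed place-by-topic affinities over three toy topics: (food, sport, work).
PLACE_TOPICS = {
    "restaurant": (0.8, 0.1, 0.1),
    "gym":        (0.1, 0.8, 0.1),
    "office":     (0.1, 0.1, 0.8),
}

def predict_place(topic_distribution):
    """Pick the place whose topic affinity best matches the user's topics."""
    def score(place):
        return sum(t * a for t, a in zip(topic_distribution, PLACE_TOPICS[place]))
    return max(PLACE_TOPICS, key=score)

def drift(topics, new_evidence, rate=0.5):
    """Move the topic distribution toward newly observed interests."""
    mixed = [(1 - rate) * t + rate * e for t, e in zip(topics, new_evidence)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize to a distribution

user = [0.7, 0.2, 0.1]                # initially food-oriented
later = drift(user, [0.0, 1.0, 0.0])  # behavior starts signalling sport
```

The point of the toy is only that a static profile would keep predicting the same place, whereas letting the topic distribution change over time shifts the prediction — the property DTEMN's dynamic topics are designed to capture.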