
Latest publications — 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)

Multi-Style Transfer Generative Adversarial Network for Text Images
Honghui Yuan, Keiji Yanai
In recent years, neural style transfer has shown impressive results in deep learning. For text style transfer in particular, recent research has successfully completed the transition from the text font domain to the text style domain. However, transferring multiple styles often requires training many models, and generating text images in multiple styles with a single model remains an unsolved problem. In this paper, we propose a multi-style transformation network for text style transfer that can generate text images in multiple styles with a single model and control the style of the text in a simple way. The main idea is to add conditions to the transfer network so that all styles can be trained effectively in one network, and to control the generation of each text style through these conditions. We also optimize the network so that the conditional information is transmitted effectively through it. The advantages of the proposed network are that multiple styles of text can be generated with only one model and that the generation of text styles can be controlled. We have tested the proposed network on a large number of texts and demonstrated that it works well when generating multiple styles of text at the same time.
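A minimal sketch (PyTorch) of the conditional idea the abstract describes: a single transfer network receives the text image together with a style condition, so one model can produce several styles selected purely by that condition. The class name, layer sizes, and the choice to broadcast the condition as extra input channels are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ConditionalStyleGenerator(nn.Module):
    def __init__(self, num_styles: int, base_channels: int = 32):
        super().__init__()
        self.num_styles = num_styles
        # Input: 3 image channels + num_styles condition channels.
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_styles, base_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_channels, base_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(base_channels, 3, 3, padding=1),
            nn.Tanh(),  # output image in [-1, 1]
        )

    def forward(self, text_image: torch.Tensor, style_id: torch.Tensor) -> torch.Tensor:
        b, _, h, w = text_image.shape
        # One-hot style condition, broadcast to a per-pixel condition map.
        cond = torch.zeros(b, self.num_styles, h, w, device=text_image.device)
        cond[torch.arange(b), style_id] = 1.0
        return self.net(torch.cat([text_image, cond], dim=1))

# Usage: one model, different styles selected only by the condition.
g = ConditionalStyleGenerator(num_styles=4)
x = torch.randn(2, 3, 64, 64)            # a batch of text images
styled = g(x, torch.tensor([0, 3]))      # style 0 for the first image, style 3 for the second
print(styled.shape)                      # torch.Size([2, 3, 64, 64])
```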
Citations: 2
Transformer based Neural Network for Fine-Grained Classification of Vehicle Color
Yingjin Wang, Chuanming Wang, Yuchao Zheng, Huiyuan Fu, Huadong Ma
Vehicle color recognition technology is of great significance for vehicle identification and the development of intelligent transportation systems. However, the small variety of colors and the influence of environmental illumination make fine-grained vehicle color recognition a challenging task. Insufficient training data and the small number of color categories in previous datasets cause low recognition accuracy and inflexibility in practical use. Meanwhile, inefficient feature learning also leads to the poor recognition performance of previous methods. Therefore, we collect a rear-view dataset from vehicle checkpoint (bayonet) monitoring for fine-grained vehicle color recognition. Its images are divided into 11 main categories and 75 color subcategories according to the proposed labeling algorithm, which eliminates the influence of illumination and assigns a color annotation to each image. We propose a novel recognition model that effectively identifies vehicle colors: we integrate a Transformer into the recognition model to enhance the feature learning capacity of conventional neural networks, and design a hierarchical loss function based on an in-depth analysis of the proposed dataset. We evaluate the model on the dataset and it achieves an accuracy of 97.77%, which is superior to traditional approaches.
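A minimal sketch (PyTorch) of a hierarchical loss in the spirit described above: the classifier predicts one of 75 color subcategories, and a main-category term is added by aggregating subcategory probabilities into their 11 parent classes. The parent mapping, the weighting factor, and aggregation by summing probabilities are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def hierarchical_loss(sub_logits, sub_labels, sub_to_main, num_main=11, alpha=0.5):
    """sub_logits: (B, 75); sub_labels: (B,); sub_to_main: (75,) parent index per subcategory."""
    # Fine-grained term: standard cross-entropy on the 75 subcategories.
    loss_sub = F.cross_entropy(sub_logits, sub_labels)
    # Coarse term: sum subcategory probabilities into main-category probabilities.
    sub_probs = F.softmax(sub_logits, dim=1)
    main_probs = torch.zeros(sub_logits.size(0), num_main, device=sub_logits.device)
    main_probs.index_add_(1, sub_to_main, sub_probs)
    main_labels = sub_to_main[sub_labels]
    loss_main = F.nll_loss(torch.log(main_probs + 1e-8), main_labels)
    return loss_sub + alpha * loss_main

# Usage with random stand-in data.
sub_to_main = torch.randint(0, 11, (75,))   # hypothetical subcategory -> main-category mapping
logits = torch.randn(8, 75)
labels = torch.randint(0, 75, (8,))
print(hierarchical_loss(logits, labels, sub_to_main))
```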
Citations: 0
Kyoto Sightseeing Map 2.0 for User-Experience Oriented Tourism
Jing Xu, Junjie Sun, Taishan Li, Qiang Ma
We present Kyoto Sightseeing Map 2.0, a web-based application for user-experience-oriented tourism that discovers and explores sightseeing resources from User Generated Content (UGC). It applies large-scale content analysis to UGC in order to give travelers an additional, experience-based source of information during their search process. It narrows the information gap on sightseeing resources, especially Points of Interest (POIs), left by maps that governments or tourism firms provide for publicity and marketing purposes. On the one hand, Kyoto Sightseeing Map 2.0 presents tourists with aesthetics quality scores of photos taken in Kyoto tourist spots over time, computed from UGC by aesthetics quality assessment (AQA) with Multi-level Spatially-Pooled (MLSP) features. On the other hand, users can also consult two sets of POI photos generated from user data and displayed on the map as a reference. Our application helps travelers make well-informed trip decisions based on UGC.
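A generic sketch (PyTorch) of the kind of MLSP-style aesthetics scorer the abstract refers to: activations from several depths of a frozen CNN backbone are globally pooled, concatenated, and fed to a small regression head that outputs an aesthetics score. The backbone (ResNet-18), the chosen layers, and the head are illustrative assumptions, not the system's actual configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None).eval()   # frozen feature extractor (untrained here, for illustration)
layers = [backbone.layer2, backbone.layer3, backbone.layer4]
features = []
for layer in layers:
    layer.register_forward_hook(lambda m, i, o: features.append(o))

head = nn.Sequential(nn.Linear(128 + 256 + 512, 256), nn.ReLU(), nn.Linear(256, 1))

def aesthetics_score(image: torch.Tensor) -> torch.Tensor:
    """image: (B, 3, H, W) -> (B, 1) predicted aesthetics score."""
    features.clear()
    with torch.no_grad():
        backbone(image)                                   # hooks collect multi-level activations
    pooled = [f.mean(dim=(2, 3)) for f in features]       # global average pooling per level
    return head(torch.cat(pooled, dim=1))

print(aesthetics_score(torch.randn(2, 3, 224, 224)).shape)   # torch.Size([2, 1])
```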
Citations: 1
Integrated Cloud-based System for Endangered Language Documentation and Application
Min Chen, Jignasha Borad, Mizuki Miyashita, James Randall
Nearly half of the world's languages are considered endangered and need to be documented, analyzed, and revitalized. However, existing linguistics tools lack the accessibility to effectively analyze languages such as Blackfoot, in which relative pitch movement is significant, e.g., words with the same sound sequence convey different meanings when the pitch changes. To address this issue, we present a novel form of audio analysis on a perceptual scale, and develop a consolidated, interactive toolset called MeTILDA (Melodic Transcription in Language Documentation and Analysis) that effectively captures perceived changes in pitch movement and hosts other existing desktop-based linguistic tools on the cloud to enable collaboration, data sharing, and data reuse among multiple linguistic tools.
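A minimal sketch of the kind of perceptual pitch scaling the abstract alludes to: raw F0 values in Hz are converted to semitones relative to a speaker baseline, so equal steps correspond roughly to equal perceived pitch movement. The baseline choice (the minimum F0 in the sample) and the toy contours are illustrative assumptions; MeTILDA's actual perceptual scale may be defined differently.

```python
import math

def hz_to_semitones(f0_hz, baseline_hz):
    """Convert an F0 contour in Hz to semitones above a baseline frequency."""
    return [12.0 * math.log2(f / baseline_hz) for f in f0_hz]

# Two words with the same segments but different pitch contours (toy values in Hz).
contour_a = [180.0, 200.0, 220.0]
contour_b = [220.0, 200.0, 180.0]
baseline = min(contour_a + contour_b)
print(hz_to_semitones(contour_a, baseline))   # rising contour
print(hz_to_semitones(contour_b, baseline))   # falling contour
```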
Citations: 0
Predicting Human Behavior with Transformer Considering the Mutual Relationship between Categories and Regions
Ryoichi Osawa, Keiichi Suekane, Ryoko Nakamura, Aozora Inagaki, T. Takagi, Isshu Munemasa
Recently, studies on human behavior have been conducted frequently, and predicting human mobility is one area of interest. This is difficult, however, since human activities result from various factors such as periodicity, changes of preferences, and geographical effects, and it is essential to capture these factors when predicting human mobility. Humans may go to particular areas to visit a store of a desired category. Also, since stores of a particular category tend to open in specific areas, trajectories of visited geographical regions are helpful in understanding the purpose of visits. Therefore, the purpose of visiting stores of a desired category and the purpose of visiting a region affect each other. Capturing this mutual dependency enables prediction with higher accuracy than modeling only the superficial trajectory sequence. Capturing it requires a mechanism that can dynamically adjust the important categories depending on the region, but conventional methods, which can only perform static operations, have structural limitations. In the proposed model, we use the Transformer to address this problem. However, since a default Transformer can only capture unidirectional relationships, the proposed model uses mutually connected Transformers to capture the mutual relationships between categories and regions. Furthermore, most human activities have a weekly periodicity, and it is highly possible that only a part of a trajectory is important for predicting human mobility. Therefore, we propose an encoder that captures the periodicity of human mobility and an attention mechanism that extracts the important part of the trajectory. In our experiments, we predict whether a user will visit stores in specific categories and regions, taking the trajectory sequence as input. By comparing our model with existing models, we show that it outperforms state-of-the-art (SOTA) models on similar tasks in this experimental setup.
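A minimal sketch (PyTorch) of the "mutually connected Transformers" idea: one encoder stream for the category trajectory, one for the region trajectory, and a pair of cross-attention blocks so each stream can attend to the other. The dimensions, vocabulary sizes, and single round of cross-attention are illustrative assumptions, not the authors' full model, which also includes a periodicity encoder and a trajectory attention mechanism.

```python
import torch
import torch.nn as nn

class MutualTrajectoryEncoder(nn.Module):
    def __init__(self, num_categories=50, num_regions=100, d_model=64):
        super().__init__()
        self.cat_emb = nn.Embedding(num_categories, d_model)
        self.reg_emb = nn.Embedding(num_regions, d_model)
        enc_layer = lambda: nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.cat_encoder = nn.TransformerEncoder(enc_layer(), num_layers=2)
        self.reg_encoder = nn.TransformerEncoder(enc_layer(), num_layers=2)
        self.cat_from_reg = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.reg_from_cat = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, cat_seq, reg_seq):
        # Encode each trajectory stream independently.
        c = self.cat_encoder(self.cat_emb(cat_seq))
        r = self.reg_encoder(self.reg_emb(reg_seq))
        # Cross-attention in both directions captures the mutual relationship.
        c2, _ = self.cat_from_reg(query=c, key=r, value=r)
        r2, _ = self.reg_from_cat(query=r, key=c, value=c)
        # Pool over time to obtain a joint representation for downstream prediction heads.
        return torch.cat([(c + c2).mean(dim=1), (r + r2).mean(dim=1)], dim=-1)

model = MutualTrajectoryEncoder()
cats = torch.randint(0, 50, (2, 12))    # 12 past category visits per user
regs = torch.randint(0, 100, (2, 12))   # 12 past region visits per user
print(model(cats, regs).shape)          # torch.Size([2, 128])
```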
Citations: 0
The Brain-Machine-Ratio Model for Designer and AI Collaboration
Ling Fan, Yifang Bao, Shuyu Gong, Sida Yan, Harry J. Wang
Artificial intelligence has recently been changing design practice profoundly. The relationship between designers and applied artificial intelligence urgently needs a framework and theory to describe and measure it. Thus, this article establishes the Brain-Machine-Ratio (BMR) model, which examines the collaborative relationship between designers and artificial intelligence through the ratio of human to machine labor in the design process. The core approach is to model the proportion of human and AI effort in seven design tasks along the time dimension. Based on both qualitative and quantitative evaluation, we propose the concept and statistics of the Brain-Machine-Ratio model and deduce the further collaborative relationship between designers and artificial intelligence.
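The core quantity is a ratio of human to machine labor on the time dimension. A tiny worked example follows; the task names and minutes are made-up values for illustration, not data from the paper.

```python
# Each design task records how many minutes the designer and the AI tool contributed.
tasks = {
    # task: (designer_minutes, ai_minutes)  -- hypothetical example values
    "ideation": (50, 10),
    "layout": (20, 40),
    "color exploration": (10, 30),
}

for task, (human, machine) in tasks.items():
    ratio = human / machine                 # brain-machine ratio for this task
    share = human / (human + machine)       # human share of total time
    print(f"{task}: brain-machine ratio = {ratio:.2f}, human time share = {share:.0%}")
```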
Citations: 0
Dynamic Local Geometry Capture in 3D Point Cloud Classification
Shivanand Venkanna Sheshappanavar, C. Kambhamettu
With the advent of PointNet, the popularity of deep neural networks in point cloud analysis has increased. PointNet's successor, PointNet++, partitions the input point cloud and recursively applies PointNet to capture local geometry. The PointNet++ model uses ball querying for local geometry capture in its set abstraction layers. Several models based on the single-scale grouping of PointNet++ continue to use ball querying with a fixed-radius ball. Due to its uniform scale in all directions, a ball lacks orientation and is ineffective in capturing complex local neighborhoods. A few recent models replace the fixed-sized ball with a fixed-sized ellipsoid or a fixed-sized cuboid to capture local neighborhoods. However, these methods are still not fully effective in capturing the varying geometry proportions of different local neighborhoods on the object surface. We propose a novel technique of dynamically oriented and scaled ellipsoid querying based on unique local information to better capture the local geometry. We also propose ReducedPointNet++, a single-set-abstraction, single-scale grouping model. Our model, together with dynamically oriented and scaled ellipsoid querying, achieves 92.1% classification accuracy on the ModelNet40 dataset. We achieve state-of-the-art 3D classification results on all six variants of the real-world ScanObjectNN dataset, with an accuracy of 82.0% on the most challenging variant.
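A minimal sketch (NumPy) of one plausible form of dynamically oriented and scaled ellipsoid querying: the covariance of an initial ball neighborhood supplies the orientation (eigenvectors) and per-axis scale (eigenvalues) of an ellipsoid, and the neighborhood is then re-selected by a Mahalanobis-style test. This illustrates the general idea only; the authors' exact query rule may differ.

```python
import numpy as np

def ellipsoid_query(points, center, ball_radius=0.2, scale=2.0):
    """points: (N, 3); center: (3,). Returns indices inside the local ellipsoid."""
    # 1) Initial isotropic ball query around the query point.
    d = np.linalg.norm(points - center, axis=1)
    ball_idx = np.where(d < ball_radius)[0]
    if len(ball_idx) < 4:
        return ball_idx                          # too few neighbors to estimate a shape
    # 2) Local covariance gives the orientation and anisotropic scale.
    local = points[ball_idx] - center
    cov = local.T @ local / len(ball_idx)
    eigvals, eigvecs = np.linalg.eigh(cov)       # ascending eigenvalues, orthonormal axes
    axes = scale * np.sqrt(np.maximum(eigvals, 1e-12))
    # 3) Keep points whose normalized distance along the ellipsoid axes is <= 1.
    proj = (points - center) @ eigvecs / axes
    return np.where((proj ** 2).sum(axis=1) <= 1.0)[0]

pts = np.random.rand(2048, 3)
print(len(ellipsoid_query(pts, pts[0])))
```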
Citations: 10
Socially Aware Multimodal Deep Neural Networks for Fake News Classification
Saed Rezayi, Saber Soleymani, H. Arabnia, Sheng Li
The importance of fake news detection and classification on Online Social Networks (OSNs) has recently increased and drawn attention. Training machine learning models for this task requires different types of attributes or modalities for the target OSN. Existing methods mainly rely on social media text, which carries rich semantic information and can roughly explain the discrepancy between normal news and multiple fake news types. However, the structural characteristics of OSNs are overlooked. This paper aims to exploit such structural characteristics and further boost fake news classification performance on OSNs. Using deep neural networks, we build a novel multimodal classifier that incorporates relaying features, textual features, and network features, concatenated with each other in a late fusion manner. Experimental results on benchmark datasets demonstrate that our socially aware architecture outperforms existing models on fake news classification.
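A minimal sketch (PyTorch) of the late-fusion design described above: separate branches embed the relaying (propagation), textual, and network features, and their outputs are concatenated only at the end, before the classification layer. The feature dimensions and branch sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LateFusionFakeNewsClassifier(nn.Module):
    def __init__(self, relay_dim=32, text_dim=768, graph_dim=64, num_classes=2):
        super().__init__()
        branch = lambda d: nn.Sequential(nn.Linear(d, 64), nn.ReLU())
        self.relay_branch = branch(relay_dim)   # how the post propagates / is relayed
        self.text_branch = branch(text_dim)     # e.g. a sentence embedding of the post
        self.graph_branch = branch(graph_dim)   # structural features of the OSN
        self.classifier = nn.Linear(64 * 3, num_classes)

    def forward(self, relay_x, text_x, graph_x):
        # Late fusion: modalities are combined only after per-modality encoding.
        fused = torch.cat([self.relay_branch(relay_x),
                           self.text_branch(text_x),
                           self.graph_branch(graph_x)], dim=-1)
        return self.classifier(fused)

model = LateFusionFakeNewsClassifier()
logits = model(torch.randn(4, 32), torch.randn(4, 768), torch.randn(4, 64))
print(logits.shape)   # torch.Size([4, 2])
```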
Citations: 3
Learning-based Tensor Decomposition with Adaptive Rank Penalty for CNNs Compression
Deli Yu, Peipei Yang, Cheng-Lin Liu
Low-rank tensor decomposition is a widely used strategy to compress convolutional neural networks (CNNs). Existing learning-based decomposition methods encourage low-rank filter weights during training via a regularizer on filters' pair-wise forces or the nuclear norm. However, these methods cannot obtain a satisfactory low-rank structure. We propose a new method with an adaptive rank penalty to learn more compact CNNs. Specifically, we transform the rank constraint into a differentiable one and impose its adaptive, violation-aware penalty on the filters. Moreover, this paper is the first work to integrate learning-based decomposition and group decomposition to achieve a better trade-off, especially for the tough task of compressing 1×1 convolutions. The obtained low-rank model can be easily decomposed while nearly preserving full accuracy, without an additional fine-tuning process. The effectiveness is verified by compression experiments with VGG and ResNet on CIFAR-10 and ILSVRC-2012. Our method can reduce about 65% of the parameters of ResNet-110 with a 0.04% Top-1 accuracy drop on CIFAR-10, and about 60% of the parameters of ResNet-50 with a 0.57% Top-1 accuracy drop on ILSVRC-2012.
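A minimal sketch (PyTorch) of a differentiable low-rank penalty in the spirit of an "adaptive violation-aware" rank constraint: the convolution weight is matricized, and the singular values beyond a target rank are penalized, with a weight that grows with the size of the violation. This is one plausible surrogate for illustration only; the paper's actual penalty may differ.

```python
import torch

def adaptive_rank_penalty(weight: torch.Tensor, target_rank: int) -> torch.Tensor:
    """weight: (out_c, in_c, k, k) conv filter; returns a scalar penalty."""
    mat = weight.reshape(weight.shape[0], -1)        # matricize the filter
    s = torch.linalg.svdvals(mat)                    # singular values, descending
    tail = s[target_rank:]                           # spectrum beyond the target rank
    violation = tail.sum() / (s.sum() + 1e-8)        # relative rank violation in [0, 1]
    return violation.detach() * tail.sum()           # adaptive weight times tail energy

w = torch.randn(64, 32, 3, 3, requires_grad=True)
penalty = adaptive_rank_penalty(w, target_rank=16)
penalty.backward()                                   # gradients push tail singular values down
print(float(penalty))
```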
Citations: 1
A Fact-checking Assistant System for Textual Documents*
Tomoya Furuta, Yumiko Suzuki
This paper proposes a system for identifying which parts of textual documents editors should fact-check. Using our system, editors' time and effort can be reduced by highlighting the descriptions that need fact-checking. To accomplish this, we construct a machine-learning-based sentence classifier that assigns parts of documents to four classes according to the necessity of fact-checking. We assume that there are typical descriptions that contain misinformation. Therefore, if we collect documents and their revised versions, and label whether the revisions are corrections or not, we can construct the classifier by learning from this dataset. To build the classifier, we construct a dataset from the Wikipedia edit history that includes sentences revised more than once, with labels indicating the degree of correction applied by editors. We develop a Web-based system to demonstrate the proposed approach: when text is entered, the system predicts which parts of it editors should re-confirm against the facts.
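A minimal sketch (scikit-learn) of the kind of sentence classifier described above: sentences are mapped to one of four fact-checking-necessity classes. The toy sentences, labels, and the TF-IDF plus logistic regression pipeline are illustrative assumptions, not the paper's features or training data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Four illustrative classes by necessity of fact-checking (0 = none ... 3 = high).
sentences = [
    "The event is scheduled for next spring.",
    "The company was founded in 1998 by two engineers.",
    "The bridge is 12 kilometers long and opened in 2009.",
    "Over 90 percent of users reported the issue within a day.",
]
labels = [0, 1, 2, 3]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(sentences, labels)

# Predict which parts of a new text an editor should re-confirm.
print(clf.predict(["The tower is 300 meters tall.", "We hope you enjoy the visit."]))
```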
Citations: 1