
2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Publications

Are Theme Songs Usable for Anime Retrieval?
Naoto Homma, Aiko Uemura, Tetsuro Kitahara
Japanese anime is a well-known multimedia-related popular culture, but its retrieval techniques have not been fully developed. In this paper, we attempted similarity-based retrieval of anime works using their theme songs. We hypothesized that similar anime works have similar theme songs, because the atmosphere of an anime work may be reflected in its theme song. Under this hypothesis, we measured the audio-based and lyrics-based similarity among theme songs to search for anime works similar to a given query work. Experimental results show that anime retrieval with audio-based and lyrics-based theme-song similarity succeeded with average accuracies of 63% and 66%, respectively. This accuracy is higher than random selection, even though it is lower than an upper-bound accuracy based on manually prepared summary texts.
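The paper reports only the retrieval results; as a rough illustration of the kind of audio-based and lyrics-based similarity it describes, the sketch below summarizes each theme song with mean MFCC features and a TF-IDF lyric vector and ranks candidate works by cosine similarity. The feature choices, library calls (librosa, scikit-learn), and the equal weighting are assumptions, not the authors' implementation.

```python
import librosa
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def audio_feature(path):
    # Mean MFCC vector as a crude summary of a theme song's sound.
    y, sr = librosa.load(path, sr=22050)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    return mfcc.mean(axis=1)

def rank_by_theme_song(query_idx, audio_paths, lyrics):
    # Audio similarity: cosine similarity between mean-MFCC vectors.
    audio = np.stack([audio_feature(p) for p in audio_paths])
    audio_sim = cosine_similarity(audio)[query_idx]
    # Lyrics similarity: cosine similarity between TF-IDF vectors.
    tfidf = TfidfVectorizer().fit_transform(lyrics)
    lyric_sim = cosine_similarity(tfidf)[query_idx]
    # Rank candidate works by combined similarity, excluding the query itself.
    combined = 0.5 * audio_sim + 0.5 * lyric_sim
    order = np.argsort(-combined)
    return [i for i in order if i != query_idx]
```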
{"title":"Are Theme Songs Usable for Anime Retrieval?","authors":"Naoto Homma, Aiko Uemura, Tetsuro Kitahara","doi":"10.1109/MIPR51284.2021.00042","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00042","url":null,"abstract":"Japanese anime is well known as a multimedia-related popular culture but its retrieval techniques have not been fully developed. In this paper, we made an attempt of similarity-based retrieval of anime works using their theme songs. We hypothesized that similar anime works have similar theme songs because the atmosphere of anime works may be reflected in their theme songs. Under this hypothesis, we measured the audio-based and lyrics-based similarity among theme songs to search for anime works similar to a given query work. Experimental results show that the anime retrieval with audio-based and lyrics-based theme song similarity succeeded with average accuracy of 63% and 66%, respectively. This accuracy is higher than random selection, even though it is lower than an upper-bound accuracy based on manually prepared summary texts.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122102556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FPX-G: First Person Exploration for Graph
Takahiro Komamizu, Shoi Ito, Yasuhiro Ogawa, K. Toyama
Data exploration is a fundamental user task in the information seeking process. In data exploration, users have ambiguous information needs and traverse the data to gather information. In this paper, a novel data exploration system, called FPX-G, is proposed that uses virtual reality (VR) technology. VR-based data exploration (or immersive analytics) is a recent trend in data analytics, and existing approaches present aggregated information in an interactive, 3D manner. However, exploration of individual pieces of data has scarcely been addressed. Traditional data exploration is done on 2D displays, so space is limited and there is no depth. FPX-G fully utilizes 3D space to make individual pieces of data visible in the user's line of sight. In this paper, the data structure in FPX-G is designed as a graph, and the data exploration process is modeled as graph traversal. To exploit the capabilities of VR, FPX-G provides a first-person-view interface from which users can look at individual pieces of data and can walk through the data (like walking in a library). In addition to the walking mechanism, to deal with the limited physical space in a room, FPX-G introduces eye-tracking technology for traversing the graph. A simulation-based evaluation reveals that FPX-G provides a significantly more efficient interface for exploring data than the traditional 2D interface.
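As a toy illustration of modeling exploration as graph traversal, the sketch below keeps the user's current node and exposes its neighbors as the candidates that would be rendered around them in the first-person view; the class and method names are hypothetical and not part of FPX-G.

```python
from collections import defaultdict

class ExplorationGraph:
    """Toy graph traversal: the user stands at one node and steps to a neighbor."""

    def __init__(self, edges):
        self.adj = defaultdict(set)
        for a, b in edges:
            self.adj[a].add(b)
            self.adj[b].add(a)

    def visible_neighbors(self, current):
        # Nodes that would be rendered around the user in the 3D scene.
        return sorted(self.adj[current])

    def step(self, current, target):
        # Move only along an existing edge, mirroring walking/eye-gaze traversal.
        if target not in self.adj[current]:
            raise ValueError(f"{target} is not adjacent to {current}")
        return target

g = ExplorationGraph([("paper A", "author X"), ("author X", "paper B")])
print(g.visible_neighbors("author X"))  # ['paper A', 'paper B']
```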
{"title":"FPX-G: First Person Exploration for Graph","authors":"Takahiro Komamizu, Shoi Ito, Yasuhiro Ogawa, K. Toyama","doi":"10.1109/MIPR51284.2021.00018","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00018","url":null,"abstract":"Data exploration is a fundamental user task in the information seeking process. In data exploration, users have ambiguous information needs, and they traverse across the data for gathering information. In this paper, a novel data exploration system, called FPX-G, is proposed that uses virtual reality (VR) technology. VR-based data exploration (or immersive analytics) is a recent trend in data analytics, and the existing work approaches involve aggregated information in an interactive and 3D manner. However, exploration for individual pieces of data scarcely has been approached. Traditional data exploration is done on 2D displays, therefore space is limited, and there is no depth. FPX-G fully utilizes 3D space to make individual piece of data visible in the user’s line of sight. In this paper, the data structure in FPX-G is designed as a graph, and the data exploration process is modeled as graph traversal. To utilize the capability of VR, FPX-G provides a first person view-based interface from which users can look at individual pieces of data and can walk through the data (like walking in a library). In addition to the walking mechanism, to deal with limited physical space in a room, FPX-G introduces eye-tracking technology for traversing data through a graph. A simulation-based evaluation reveals that FPX-G provides a significantly efficient interface for exploring data compared with the traditional 2D interface.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130351851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Topic Detection for Video Stream based on Geographical Relationships and its Interactive Viewing System
Itsuki Hashimoto, Yuanyuan Wang, Yukiko Kawai, K. Sumiya
As Internet TV has spread in recent years, research on recommending relevant information for TV programs has been actively conducted. NHK's Hybridcast provides a service that recommends relevant information on the same screen during the broadcast of a TV program. However, there is currently no service that recommends supplementary information based on users' viewing behavior. Against this background, we first extract geographic words (location names) and topics for each scene using the closed captions of TV programs. Next, we analyze the user's viewing behavior to extract the scenes selected by the user in sequence, which allows us to detect the topics of the user's selected scenes. Supplementary information is then recommended by generating queries based on geographical relationships using the geographic words and topics. In this paper, we discuss our proposed system for supporting interactive viewing of TV programs, which is based on users' viewing behavior and geographic relationships.
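A minimal sketch of one way to extract geographic words (location names) from closed-caption text, assuming English captions and spaCy's named-entity recognizer; the extraction pipeline in the paper may well differ.

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def geographic_words(caption_text):
    # Collect place-name entities (countries, cities, and other locations).
    doc = nlp(caption_text)
    return [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]

print(geographic_words("The show visits Kyoto before heading to Mount Fuji."))
# e.g. ['Kyoto', 'Mount Fuji']
```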
{"title":"Topic Detection for Video Stream based on Geographical Relationships and its Interactive Viewing System","authors":"Itsuki Hashimoto, Yuanyuan Wang, Yukiko Kawai, K. Sumiya","doi":"10.1109/MIPR51284.2021.00012","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00012","url":null,"abstract":"During the recent years of Internet TV spread, researches on recommending relevant information for TV programs have been actively conducted. The NHK’s Hybridcast provides a service that recommends relevant information on the same screen during the broadcast of a TV program. However, there is currently no service for recommending supplementary information based on the users’ viewing behavior. Based on this research background, we first extract geographic words (location names) and topics of each scene using closed captions of TV programs. Next, we analyze the user’s viewing behavior to extract the scenes selected by the user in the sequence. After that, we can detect the topics of the user’s selected scenes. Therefore, the supplementary information is recommended by generating queries based on geographical relationships using geographical words and topics. In this paper, we discuss our proposed system for supporting interactive viewing of TV programs, which is based on the viewing behavior of users and geographic relationships.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128080820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Exploring the Spatial-Visual Locality of Geo-tagged Urban Street Images
Abdullah Alfarrarjeh, Xiao Yang, A. A. Jabal, S. H. Kim, C. Shahabi
Urban street images have a unique property in that they capture visual scenes distinctive to their geographical regions. Such images are similar to their neighboring ones while dissimilar to faraway images. We refer to this characteristic of images as spatial-visual locality, or the spatial locality of similar visual features. This study focuses on geo-tagged urban street images, hypothesizes that those images show local similarity within a given region but dissimilarity across different regions, and provides different analysis methods to validate the hypothesis. The paper also evaluates the correctness of the hypothesis using three real geo-tagged street images collected from Google Street View. Our experimental results demonstrate a high locality of similar visual features among urban street images.
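One simple way to probe the hypothesis is to compare the mean visual similarity of geographically near image pairs against far pairs. The sketch below assumes precomputed image feature vectors and a fixed radius; both are illustrative choices, not the paper's analysis methods.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two geo-tagged images, in kilometres.
    r = 6371.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dp, dl = np.radians(lat2 - lat1), np.radians(lon2 - lon1)
    a = np.sin(dp / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dl / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def locality_gap(features, coords, radius_km=1.0):
    # Mean cosine similarity of near pairs minus that of far pairs;
    # a positive gap supports the spatial-visual locality hypothesis.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T
    near, far = [], []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = haversine_km(*coords[i], *coords[j])
            (near if d <= radius_km else far).append(sim[i, j])
    return np.mean(near) - np.mean(far)
```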
{"title":"Exploring the Spatial-Visual Locality of Geo-tagged Urban Street Images","authors":"Abdullah Alfarrarjeh, Xiao Yang, A. A. Jabal, S. H. Kim, C. Shahabi","doi":"10.1109/MIPR51284.2021.00023","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00023","url":null,"abstract":"Urban street images have a unique property as they capture visual scenes that are distinctive to their geo-graphical regions. Such images are similar to their neighboring ones while dissimilar to faraway images. We refer to this characteristic of images as the spatial visual locality or the spatial locality of similar visual features. This study focuses on geo-tagged urban street images and hypothesizes that those images demonstrate a local similarity in a certain region but a dissimilarity across different regions, and provides different analysis methods to validate the hypothesis. The paper also evaluates the correctness of the hypothesis using three real geo-tagged street images collected from the Google Street View. Our experimental results demonstrate a high locality of similar visual features among urban street images.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115635499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dance to Music: Generative Choreography with Music using Mixture Density Networks
Rongfeng Li, Meng Zhao, Xianlin Zhang, Xueming Li
Choreography is usually done by professional choreographers, but the development of motion capture technology and artificial intelligence has made it possible for computers to choreograph to music. There are two main challenges in choreography: 1) how to obtain real and novel dance moves without relying on motion capture and manual production, and 2) how to use appropriate music and motion features and matching algorithms to enhance the synchronization of music and dance. Focusing on these two targets, we propose a framework based on a Mixture Density Network (MDN) to synthesize dances that match the target music. The framework includes three steps: motion generation, motion screening, and feature matching. To make the dance movements generated by the model applicable to choreography with music, we propose a parameter control algorithm and a coherence-based motion screening algorithm to improve the consistency of dance movements. Moreover, to achieve better unity of music and motion, we propose a multi-level music and motion feature matching algorithm, which combines global feature matching with local feature matching. Finally, our framework proved able to synthesize more coherent and creative choreography with music.
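A minimal sketch of a Mixture Density Network head in PyTorch, predicting a Gaussian mixture over the next pose vector together with the standard mixture negative log-likelihood; dimensions, the mixture count, and the names are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDNHead(nn.Module):
    """Mixture Density Network head: predicts a Gaussian mixture over the next pose."""

    def __init__(self, in_dim, pose_dim, n_mixtures=10):
        super().__init__()
        self.n, self.d = n_mixtures, pose_dim
        self.pi = nn.Linear(in_dim, n_mixtures)                     # mixture weights
        self.mu = nn.Linear(in_dim, n_mixtures * pose_dim)          # component means
        self.log_sigma = nn.Linear(in_dim, n_mixtures * pose_dim)   # component log std devs

    def forward(self, h):
        pi_log = F.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(-1, self.n, self.d)
        sigma = self.log_sigma(h).view(-1, self.n, self.d).exp()
        return pi_log, mu, sigma

def mdn_nll(pi_log, mu, sigma, target):
    # Negative log-likelihood of the target pose under the predicted mixture.
    dist = torch.distributions.Normal(mu, sigma)
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(dim=-1)  # (batch, n_mixtures)
    return -torch.logsumexp(pi_log + log_prob, dim=-1).mean()
```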
{"title":"Dance to Music: Generative Choreography with Music using Mixture Density Networks","authors":"Rongfeng Li, Meng Zhao, Xianlin Zhang, Xueming Li","doi":"10.1109/MIPR51284.2021.00065","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00065","url":null,"abstract":"Choreography is usually done by professional choreographers, while the development of motion capture technology and artificial intelligence has made it possible for computers to choreograph with music. There are two main challenges in choreography: 1) how to get real and novel dance moves without relying on motion capture and manual production, and 2) how to use the appropriate music and motion features and matching algorithms to enhance the synchronization of music and dance. Focusing on these two targets above, we propose a framework based on Mixture Density Network (MDN) to synthesis dances that match the target music. The framework includes three steps: motion generation, motion screening and feature matching. In order to make the dance movements generated by the model applicable for choreography with music, we propose a parameter control algorithm and a coherence-based motion screening algorithm to improve the consistency of dance movements. Moreover, to achieve better unity of music and motions, we propose a multi-level music and motion feature matching algorithm, which combines global feature matching with local feature matching. Finally, our framework proved to be able to synthesis more coherent and creative choreography with music.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115766441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Augmented Tai-Chi Chuan Practice Tool with Pose Evaluation
Y. Jan, Kuan-Wei Tseng, Peng-Yuan Kao, Y. Hung
Tai Chi Chuan (TCC) is a well-known Chinese martial art that promotes health. In addition to learning TCC from a coach in a classroom setting, learners usually use books or videos to practice on their own. However, since turning is a frequent movement in TCC, learners cannot watch a tutorial and practice TCC at the same time. Furthermore, it is difficult for users to determine whether their postures are correct. We propose an augmented reality TCC practice tool with pose evaluation to help people practice TCC on their own. The tool consists of an optical see-through head-mounted display, external cameras, digital compasses, and a server. Users learn TCC movements from surrounding virtual coaches in augmented reality and determine whether their postures are correct via an evaluation module. Study results show that the proposed tool provides a helpful learning environment for TCC and that the pose estimation and evaluation are robust and reliable.
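A much-simplified sketch of pose evaluation: compare the learner's joint angles with a reference coach pose, joint by joint, from 3D keypoints. The joint-triplet definitions and any pass/fail threshold are illustrative assumptions, not the paper's evaluation module.

```python
import numpy as np

def joint_angle(a, b, c):
    # Angle at joint b (in degrees) formed by 3D points a-b-c.
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def pose_errors(user_joints, coach_joints, triplets):
    # Per-joint angle differences between the learner and the reference pose.
    # user_joints, coach_joints: (num_joints, 3) arrays of 3D keypoints.
    errors = {}
    for name, (i, j, k) in triplets.items():
        u = joint_angle(user_joints[i], user_joints[j], user_joints[k])
        c = joint_angle(coach_joints[i], coach_joints[j], coach_joints[k])
        errors[name] = abs(u - c)
    return errors

# Hypothetical usage: triplets = {"right_elbow": (2, 3, 4), "left_knee": (9, 10, 11)}
```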
{"title":"Augmented Tai-Chi Chuan Practice Tool with Pose Evaluation","authors":"Y. Jan, Kuan-Wei Tseng, Peng-Yuan Kao, Y. Hung","doi":"10.1109/MIPR51284.2021.00013","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00013","url":null,"abstract":"Tai Chi Chuan (TCC) is a well-known Chinese martial art that promotes health. In addition to learning TCC from a coach in a classroom setting, learners usually use books or videos to practice on their own. However, since turning is a frequent movement in TCC, learners cannot watch a tutorial and practice TCC at the same time. Furthermore, it is difficult for users to determine whether their postures are correct. We propose an augmented reality TCC practice tool with pose evaluation to help people practice TCC on their own. The tool consists of an optical see-through head-mounted display, external cameras, digital compasses, and a server. Users learn TCC movements from surrounding virtual coaches in augmented reality and determine whether their postures are correct via an evaluation module. Study results show that the proposed tool provides a helpful learning environment for TCC and that the pose estimation and evaluation are robust and reliable.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115003978","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multimodal Machine Translation Enhancement by Fusing Multimodal-attention and Fine-grained Image Features
Lin Li, Turghun Tayir
With the recent development of multimodal machine translation (MMT) network architectures, recurrent models have effectively been replaced by attention mechanisms, and translation results have been enhanced with the assistance of fine-grained image information. Although attention is a powerful and ubiquitous mechanism, the number of attention heads and the granularity of the image features aligned by attention both affect the quality of multimodal machine translation. To address these problems, this paper proposes a multimodal machine translation enhancement method that fuses multimodal attention and fine-grained image features, building sub-models by introducing image features of different granularity into multimodal-attention mechanisms with different numbers of heads. These sub-models are then randomly fused to obtain fusion models. Experimental results on the Multi30k dataset show that pruning attention heads improves translation results. Finally, our fusion model obtained the best results according to the automatic evaluation metric BLEU, compared with the sub-models and several baselines.
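A minimal sketch of the kind of multimodal cross-attention whose head count and image-feature granularity the paper varies: source-text encoder states attend over image-region features via PyTorch's nn.MultiheadAttention. The dimensions and the residual-plus-norm wiring are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MultimodalAttentionFusion(nn.Module):
    """Text tokens attend over image-region features; the head count is configurable."""

    def __init__(self, d_model=512, n_heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_states, image_regions):
        # text_states:   (batch, src_len, d_model)   encoder states of the source sentence
        # image_regions: (batch, n_regions, d_model) fine- or coarse-grained visual features
        attended, _ = self.cross_attn(text_states, image_regions, image_regions)
        return self.norm(text_states + attended)

fusion = MultimodalAttentionFusion(d_model=512, n_heads=4)
out = fusion(torch.randn(2, 20, 512), torch.randn(2, 36, 512))
print(out.shape)  # torch.Size([2, 20, 512])
```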
{"title":"Multimodal Machine Translation Enhancement by Fusing Multimodal-attention and Fine-grained Image Features","authors":"Lin Li, Turghun Tayir","doi":"10.1109/MIPR51284.2021.00050","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00050","url":null,"abstract":"With recent development of the multimodal machine translation (MMT) network architectures, recurrent models have effectively been replaced by attention mechanism and the translation results have been enhanced with the assistance of fine-grained image information. Although attention is a powerful and ubiquitous mechanism, different number of attention heads and granularity image features aligned by attention have an impact on the quality of multimodal machine translation. In order to address above problems, this paper proposes a multimodal machine translation enhancement by fusing multimodal-attention and fine-grained image features method which builds some submodels by introducing different granularity of image features to the multimodal-attention mechanism with different number of heads. Moreover, these sub-models are randomly fused and fusion models are obtained. The experimental results on the Multi30k dataset that the pruned attention heads lead to the improvement of translation results. Finally, our fusion model obtained the best results according to the automatic evaluation metrics BLEU compared with sub-models and some baselines.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126693184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Layout Structure Assisted Indoor Image Generation
Zhijie Qin, Wei Zhong, Fei Hu, Xinyan Yang, Long Ye, Qin Zhang
Existing methods can generate images that accord with a scene graph, but the obtained images may show blurred edges and disordered structure due to the lack of structure information. In this paper, considering that indoor images contain more layout structure than outdoor ones, we focus on indoor image generation assisted by layout structures. In the proposed method, the scene graph features are fused with the layout structure, and a graph convolutional network is employed to convert the fused semantic information into a feature representation of the scene. A refined encoder-decoder network is then used to generate the final images. In the experiments, we compare the proposed method with existing work on an indoor image dataset in terms of subjective and objective evaluations. The experimental results show that our method achieves a better IoU metric, and the visualized results also illustrate that the proposed approach generates clearer indoor images with better layout structures.
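A minimal sketch of a graph convolution over scene-graph node features concatenated with per-node layout features (e.g., normalized bounding boxes). The symmetric normalization and the fusion-by-concatenation are common choices assumed here, not taken from the paper.

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One GCN layer: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, node_feats, adj):
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)    # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm_adj = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        return torch.relu(norm_adj @ self.linear(node_feats))

# Hypothetical fusion of scene-graph node features with per-node layout features
# (e.g. normalized bounding boxes) before the graph convolution.
nodes, layout = torch.randn(8, 128), torch.randn(8, 4)
adj = torch.randint(0, 2, (8, 8)).float()
gcn = GraphConvLayer(128 + 4, 256)
scene_repr = gcn(torch.cat([nodes, layout], dim=-1), adj)
```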
{"title":"Layout Structure Assisted Indoor Image Generation","authors":"Zhijie Qin, Wei Zhong, Fei Hu, Xinyan Yang, Long Ye, Qin Zhang","doi":"10.1109/MIPR51284.2021.00061","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00061","url":null,"abstract":"The existing methods can generate images in accord with scene graph, but the obtained images may appear blurs at the edges and disorders in the structure, due to the lacks of the structure information. In this paper, by considering the indoor images contain more layout structures than outdoor ones, we focus on the indoor image generation assisted with the layout structures. In the proposed method, through fusing the scene graph features together with the layout structure, the graph convolutional network is employed to convert the fused semantic information into the feature representation of scenes. Subsequently, a refined encoder-decoder network is also used for generating the final images. In the experiments, we compare the proposed method with the existing works on the indoor image dataset in terms of subjective and objective evaluations. The experimental results show that our method can achieve better IoU metric, and the visualized results also illustrate that the proposed approach can generate more clear indoor images with better layout structures.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125900744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Text Style Transfer With Decorative Elements
Yuting Ma, Fan Tang, Weiming Dong, Changsheng Xu
Text rendered with special effects can give a rich visual experience. Text stylization can help users migrate their favorite styles to specified texts, improving production efficiency and saving design cost. This paper proposes a novel text stylization framework that can transfer mixed text styles, including font glyphs and fine decorations, to user-specified texts. Transferring decorative elements is difficult because the text is obscured by the decorative elements to a certain extent. Our method is divided into three stages: first, the positions of decorative elements in the image are extracted and retained; second, the effects of the font glyph and of textures other than decorative elements are transferred; finally, a structure-aware strategy is used to reorganize the decorative elements and complete the stylization process. Experiments on open-source text datasets demonstrate the advantages of our approach over other state-of-the-art style transfer methods.
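The third stage reorganizes the extracted decorative elements with a structure-aware strategy; as a deliberately naive stand-in for that step, the sketch below simply composites the preserved decoration pixels back over the stylized glyph using a binary mask. Purely illustrative, not the paper's method.

```python
import numpy as np

def paste_back_decorations(stylized, original, decoration_mask):
    """Naive final step: keep stylized glyph pixels, restore decoration pixels.

    stylized, original: (H, W, 3) uint8 images; decoration_mask: (H, W) bool array
    marking the decorative elements extracted in stage 1."""
    mask = decoration_mask[..., None].astype(np.float32)
    blended = mask * original.astype(np.float32) + (1.0 - mask) * stylized.astype(np.float32)
    return blended.astype(np.uint8)
```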
{"title":"Text Style Transfer With Decorative Elements","authors":"Yuting Ma, Fan Tang, Weiming Dong, Changsheng Xu","doi":"10.1109/MIPR51284.2021.00062","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00062","url":null,"abstract":"The text rendered by special effects can give a rich visual experience. Text stylization can help users migrate their favorite styles to specified texts, improving production efficiency and saving design cost. This paper proposes a novel text stylization framework, which can transfer mixed text styles, including font glyph and fine decorations, to user-specified texts. The transfer of decorative elements is difficult due to the text is obscured by decorative elements to a certain extent. Our method is divided into three stages: firstly, the position of decorative elements in the image is extracted and retained; secondly, the effects of font glyph and textures other than decorative elements are migrated; finally, a structure-aware strategy is used to reorganize the decorative elements to complete the entire stylization process. Experiments on open source text data sets demonstrated the advantages of our approach over other state- of-the-art style migration methods.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126036887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Clustering Trajectories via Sparse Auto-encoders
Xiaofeng Wu, Rui Zhang, Lin Li
With the development of satellite navigation, communication, and positioning technology, more and more trajectory data are collected and stored. Exploring such trajectory data can help us understand human mobility. A typical task in group-level mobility modeling is trajectory clustering. However, trajectories usually vary in length and shape and also contain noise, which degrades trajectory representations and thus hinders trajectory clustering. Therefore, this paper proposes a U-type robust sparse autoencoder model (uRSAA), which is robust to noise and variation in form. Specifically, a sparsity penalty is applied to constrain the output and reduce the effect of noise. By introducing skip connections, our model strengthens data exchange and preserves information. Experiments are conducted on both synthetic and real datasets, and the results show that our model outperforms existing models.
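A minimal sketch of a sparse autoencoder over fixed-length trajectory vectors with an L1 sparsity penalty and a U-Net-style skip connection. The layer sizes, where the penalty is applied (here the code layer), and the loss weighting are assumptions, not the uRSAA architecture.

```python
import torch
import torch.nn as nn

class SparseSkipAutoencoder(nn.Module):
    """Encoder-decoder with one skip connection; the code layer is the clustering feature."""

    def __init__(self, in_dim, hidden_dim=128, code_dim=32):
        super().__init__()
        self.enc1 = nn.Linear(in_dim, hidden_dim)
        self.enc2 = nn.Linear(hidden_dim, code_dim)
        self.dec1 = nn.Linear(code_dim, hidden_dim)
        self.dec2 = nn.Linear(hidden_dim * 2, in_dim)  # takes the skip concatenation

    def forward(self, x):
        h = torch.relu(self.enc1(x))
        code = torch.relu(self.enc2(h))
        d = torch.relu(self.dec1(code))
        recon = self.dec2(torch.cat([d, h], dim=-1))   # skip connection from the encoder
        return recon, code

def loss_fn(recon, x, code, sparsity_weight=1e-3):
    # Reconstruction error plus an L1 sparsity penalty on the code activations.
    return nn.functional.mse_loss(recon, x) + sparsity_weight * code.abs().mean()
```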
{"title":"Clustering Trajectories via Sparse Auto-encoders","authors":"Xiaofeng Wu, Rui Zhang, Lin Li","doi":"10.1109/MIPR51284.2021.00049","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00049","url":null,"abstract":"With the development of satellite navigation, communication and positioning technology, more and more trajectory data are collected and stored. Exploring such trajectory data can help us understand human mobility. A typical task of group-level mobility modeling is trajectory clustering. However, trajectories usually vary in length and shape, also contain noises. These exert a negative influence on trajectory representation and thus hinder trajectory clustering. Therefore, this paper proposes a U-type robust sparse autoencoder model(uRSAA), which is robust against noise and form variety. Specifically, a sparsity penalty is applied to constrain the output to decrease the effect of noise. By introducing skip connections, our model can strengthen the data exchange and preserve the information. Experiments are conducted on both synthetic datasets and real datasets, and the results show that our model outperforms the existing models.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128426284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0