
2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Publications

Exploring the Application of AI-generated Artworks for the Study of Aesthetic Processing
Vanessa Utz, S. DiPaola
In this paper we outline the need for increased control over the stimuli used within the field of empirical aesthetics. Since artworks are highly complex stimuli and traditional man-made artworks vary across many different dimensions (such as color palette, subject matter, and style), it is difficult to isolate the effect a single variable has on the aesthetic processing that occurs in a viewer. We therefore propose to explore the use of computer-generated artworks as stimuli instead, due to the high degree of control that experimenters have over the generated output. We describe how computational creativity systems work by outlining our own cognitive-based multi-module AI system, and then discuss the benefits of these systems as well as some preliminary work in this space. We conclude the paper by addressing the limitation of reduced ecological validity.
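As an illustration of the kind of stimulus control the abstract argues for, the hedged sketch below generates abstract compositions in which every compositional parameter is held fixed and only a single hue value varies, so differences in viewer ratings could be attributed to that one variable. This is not the authors' cognitive multi-module system; `generate_stimulus` and all parameter choices are hypothetical stand-ins.

```python
import colorsys
import random
from PIL import Image, ImageDraw

def generate_stimulus(hue: float, seed: int = 0, size: int = 256) -> Image.Image:
    """Render an abstract composition whose palette is controlled by a single hue value."""
    rng = random.Random(seed)          # fixed seed: identical layout across all stimuli
    img = Image.new("RGB", (size, size), "white")
    draw = ImageDraw.Draw(img)
    for _ in range(40):                # same shapes, positions, and sizes for every hue
        x, y = rng.randint(0, size), rng.randint(0, size)
        r = rng.randint(10, 60)
        sat, val = rng.uniform(0.4, 0.9), rng.uniform(0.5, 1.0)
        rgb = tuple(int(c * 255) for c in colorsys.hsv_to_rgb(hue, sat, val))
        draw.ellipse([x - r, y - r, x + r, y + r], fill=rgb)
    return img

# A stimulus set that varies only along the hue dimension (red, green, blue regions of hue).
stimuli = {h: generate_stimulus(hue=h) for h in (0.0, 0.33, 0.66)}
```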
{"title":"Exploring the Application of AI-generated Artworks for the Study of Aesthetic Processing","authors":"Vanessa Utz, S. DiPaola","doi":"10.1109/MIPR51284.2021.00073","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00073","url":null,"abstract":"In this paper we outline the need for increased control over the stimuli that are used within the field of empirical aesthetics. Since artworks are highly complex stimuli and traditional man-made artworks vary across many different dimensions (such as color palette, subject matter, style) it is difficult to isolate the effect a single variable has on the aesthetic processing that occurs in a viewer. We therefore propose to explore the use of computer-generated artworks as stimuli instead due to the high degree of control that experimenters have over the generated output. We describe how computational creativity systems work by outlining our own cognitive based multi-module AI system, and then discuss the benefits of these systems as well as some preliminary work in this space. We conclude the paper by addressing the limitation of reduced ecological validity.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"360 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131785210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
A Manifold Semantic Canonical Correlation Framework for Effective Feature Fusion
Zheng Guo, Lei Gao, L. Guan
In this paper, we present a manifold semantic canonical correlation (MSCC) framework with application to feature fusion. In the proposed framework, a manifold method is first employed to preserve the local structural information of multi-view feature spaces. Afterwards, a semantic canonical correlation algorithm is integrated with the manifold method to accomplish the task of feature fusion. Since the semantic canonical correlation algorithm is capable of measuring the global correlation across multiple variables, both the local structural information and the global correlation are incorporated into the proposed framework, resulting in a new feature representation of high quality. To demonstrate the effectiveness and the generality of the proposed solution, we conduct experiments on audio emotion recognition and object recognition by utilizing classic and deep neural network (DNN) based features, respectively. Experimental results show the superiority of the proposed solution on feature fusion.
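The two-stage pipeline described above can be sketched roughly as follows, assuming scikit-learn: a manifold embedding preserves the local structure of each view, a canonical-correlation step captures the global correlation between the views, and the projected views are concatenated as the fused feature. The semantic (label-aware) correlation of MSCC is simplified here to plain CCA, so this illustrates the structure only, not the authors' algorithm.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.cross_decomposition import CCA

def manifold_embed(X, n_components=10, n_neighbors=15):
    # Stage 1: preserve the local structure of one view with a manifold embedding.
    return LocallyLinearEmbedding(n_components=n_components,
                                  n_neighbors=n_neighbors).fit_transform(X)

def fuse_two_views(X_a, X_b, n_cca=5):
    Z_a, Z_b = manifold_embed(X_a), manifold_embed(X_b)
    # Stage 2: capture the global correlation across the two embedded views.
    U_a, U_b = CCA(n_components=n_cca).fit_transform(Z_a, Z_b)
    # Fused representation: concatenation of the correlated projections.
    return np.hstack([U_a, U_b])

# Toy usage with random stand-ins for an "audio" view and a "visual" view of 200 samples.
rng = np.random.default_rng(0)
fused = fuse_two_views(rng.normal(size=(200, 64)), rng.normal(size=(200, 128)))
print(fused.shape)  # (200, 10)
```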
{"title":"A Manifold Semantic Canonical Correlation Framework for Effective Feature Fusion","authors":"Zheng Guo, Lei Gao, L. Guan","doi":"10.1109/MIPR51284.2021.00010","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00010","url":null,"abstract":"In this paper, we present a manifold semantic canonical correlation (MSCC) framework with application to feature fusion. In the proposed framework, a manifold method is first employed to preserve the local structural information of multi-view feature spaces. Afterwards, a semantic canonical correlation algorithm is integrated with the manifold method to accomplish the task of feature fusion. Since the semantic canonical correlation algorithm is capable of measuring the global correlation across multiple variables, both the local structural information and the global correlation are incorporated into the proposed framework, resulting in a new feature representation of high quality. To demonstrate the effectiveness and the generality of the proposed solution, we conduct experiments on audio emotion recognition and object recognition by utilizing classic and deep neural network (DNN) based features, respectively. Experimental results show the superiority of the proposed solution on feature fusion.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133411768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distinguishing the "strong/weak" in the 60 Jingfang tones and their optimal distribution
Gen-Fang Chen
This paper first discusses the representation of infinite decimals, and then compares and analyzes the two different "strong/weak" distributions of the 60 Jingfang tones, from the Book of the Later Han and from the work of Chen Yingshi, Interpreting of 60 Jingfang Tones, respectively. The paper asserts that the "strong/weak" distribution is based on the tuning system of the "Three-scale Rise/Fall Tuning" and constructs the optimal "strong/weak" distribution of infinite decimals according to the "Three-scale Rise/Fall Tuning" by using the least-squares method. Finally, it obtains the optimal "strong/weak" distribution of the 60 Jingfang tones by using a dynamic programming algorithm from artificial intelligence.
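For readers unfamiliar with the tuning chain behind the 60 Jingfang tones, the sketch below illustrates one common reading of the "Three-scale Rise/Fall Tuning": starting from a fundamental, each new pitch is obtained by alternately removing a third (x 2/3) and adding a third (x 4/3), with ratios folded back into one octave. The paper's strong/weak analysis, least-squares construction, and dynamic-programming step are not reproduced here; the alternation scheme and octave folding are simplifying assumptions for illustration only.

```python
from fractions import Fraction

def jingfang_ratios(n_tones=60):
    """Generate n_tones pitch ratios by alternately multiplying by 2/3 and 4/3."""
    ratios = [Fraction(1, 1)]
    for step in range(1, n_tones):
        factor = Fraction(2, 3) if step % 2 == 1 else Fraction(4, 3)
        r = ratios[-1] * factor
        while r < 1:            # fold the ratio back into the octave [1, 2)
            r *= 2
        while r >= 2:
            r /= 2
        ratios.append(r)
    return ratios

tones = jingfang_ratios()
# 60 exact ratios; their denominators are powers of 3, so the decimal expansions
# never terminate, which is why the representation of infinite decimals matters.
print(len(tones), tones[1], float(tones[-1]))
```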
{"title":"Distinguishing the \"strong/weak\" in the 60 Jingfang tones and their optimal distribution","authors":"Gen-Fang Chen","doi":"10.1109/MIPR51284.2021.00063","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00063","url":null,"abstract":"This paper first discusses the representation of infinite decimals, and then compares and analyzes the two different distributions of \"strong/weak\" of the 60 Jingfang tones, respectively, from Book of the Later Han and the work of Chen Yingshi, Interpreting of 60 Jingfang Tones. The paper asserts that the \"strong/weak\" distribution is based on the tuning system of the \"Three-scale Rise/Fall Tuning\" and constructs the optimal \"strong/weak\" distribution of infinite decimals according to the \"Three-scale Rise/Fall Tuning\" by using the least square method. Finally, it obtains the optimal distribution of \"strong/weak\" of the 60 Jingfang tones by using a dynamic planning algorithm of artificial intelligence.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133780347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
XM2A: Multi-Scale Multi-Head Attention with Cross-Talk for Multi-Variate Time Series Analysis
Yash Garg, K. Candan
Advances in sensory technologies are enabling the capture of a diverse spectrum of real-world data streams. The increasing availability of such data, especially in the form of multi-variate time series, creates new opportunities for applications that rely on identifying and leveraging complex temporal patterns. A particular challenge such algorithms face is that complex patterns consist of multiple simpler patterns of varying scales (temporal lengths). While several recent works (such as multi-head attention networks) recognized the fact that complex patterns need to be understood in the form of multiple simpler patterns, we note that existing works lack the ability to represent the interactions across these constituent patterns. To tackle this limitation, in this paper, we propose a novel Multi-scale Multi-head Attention with Cross-Talk (XM2A) framework designed to represent the multi-scale patterns that make up a complex pattern by configuring each attention head to learn a pattern at a particular scale and accounting for the co-existence of patterns at multiple scales through a cross-talking mechanism among the heads. Experiments show that XM2A outperforms state-of-the-art attention mechanisms, such as Transformer and MSMSA, on benchmark datasets, such as SADD, AUSLAN, and MOCAP.
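A loose PyTorch sketch of the idea follows: each attention head views the sequence at a different temporal scale (here via a per-head average-pooling window on keys and values), and a small "cross-talk" linear layer mixes information across the heads before their outputs are combined. The head count, the pooling-based scaling, and the mixing layer are illustrative assumptions, not the authors' XM2A architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleCrossTalkAttention(nn.Module):
    def __init__(self, d_model, scales=(1, 2, 4, 8)):
        super().__init__()
        assert d_model % len(scales) == 0
        self.scales = scales
        self.d_head = d_model // len(scales)
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.cross_talk = nn.Linear(len(scales), len(scales))  # mixes information across heads
        self.out = nn.Linear(d_model, d_model)

    @staticmethod
    def _pool(x, scale, t):
        # Average-pool along time so this head sees the series at a coarser temporal scale.
        if scale == 1:
            return x
        pooled = F.avg_pool1d(x.transpose(1, 2), scale, stride=1,
                              padding=scale // 2, count_include_pad=False)
        return pooled.transpose(1, 2)[:, :t]

    def forward(self, x):                      # x: (batch, time, d_model)
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        head_outputs = []
        for h, scale in enumerate(self.scales):
            start, end = h * self.d_head, (h + 1) * self.d_head
            qh = q[:, :, start:end]
            kh = self._pool(k[:, :, start:end], scale, t)
            vh = self._pool(v[:, :, start:end], scale, t)
            attn = torch.softmax(qh @ kh.transpose(1, 2) / self.d_head ** 0.5, dim=-1)
            head_outputs.append(attn @ vh)                  # (batch, time, d_head), one per scale
        heads = torch.stack(head_outputs, dim=-1)           # (batch, time, d_head, n_heads)
        heads = self.cross_talk(heads)                      # cross-talk along the head axis
        return self.out(heads.permute(0, 1, 3, 2).reshape(b, t, -1))

# Toy usage: a batch of 16 multi-variate series of length 128, embedded into 64 dimensions.
y = MultiScaleCrossTalkAttention(d_model=64)(torch.randn(16, 128, 64))
print(y.shape)  # torch.Size([16, 128, 64])
```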
{"title":"XM2A: Multi-Scale Multi-Head Attention with Cross-Talk for Multi-Variate Time Series Analysis","authors":"Yash Garg, K. Candan","doi":"10.1109/MIPR51284.2021.00030","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00030","url":null,"abstract":"Advances in sensory technologies are enabling the capture of a diverse spectrum of real-world data streams. In-creasing availability of such data, especially in the form of multi-variate time series, allows for new opportunities for applications that rely on identifying and leveraging complex temporal patterns A particular challenge such algorithms face is that complex patterns consist of multiple simpler patterns of varying scales (temporal length). While several recent works (such as multi-head attention networks) recognized the fact complex patterns need to be understood in the form of multiple simpler patterns, we note that existing works lack the ability of represent the interactions across these constituting patterns. To tackle this limitation, in this paper, we propose a novel Multi-scale Multi-head Attention with Cross-Talk (XM2A) framework designed to represent multi-scale patterns that make up a complex pattern by configuring each attention head to learn a pattern at a particular scale and accounting for the co-existence of patterns at multiple scales through a cross-talking mechanism among the heads. Experiments show that XM2A outperforms state-of-the-art attention mechanisms, such as Transformer and MSMSA, on benchmark datasets, such as SADD, AUSLAN, and MOCAP.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114759827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Stochastic Observation Prediction for Efficient Reinforcement Learning in Robotics
Shisheng Wang, Hideki Nakayama
Although the recent progress of deep learning has enabled reinforcement learning (RL) algorithms to achieve human-level performance in retro video games within a short training time, their application to real-world robotics remains limited. The conventional RL procedure requires agents to interact with the environment. Meanwhile, interactions with the physical world cannot be easily parallelized or accelerated as in other tasks. Moreover, the gap between the real world and simulation makes it harder to transfer a policy trained in simulators to physical robots. Thus, we propose a model-based method to mitigate the interaction overheads for real-world robotic tasks. In particular, our model incorporates an autoencoder, a recurrent network, and a generative network to make stochastic predictions of observations. We conduct experiments on a collision avoidance task for disc-like robots and show that the generative model can serve as a virtual RL environment. Our method has the benefit of lower interaction overheads, as inference of deep neural networks on GPUs is faster than observing transitions in the real environment, and it can replace the real RL environment for rollouts of limited length.
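A compact PyTorch sketch of the kind of world model the abstract describes: an autoencoder compresses observations to a latent code, a recurrent network carries the dynamics, and a generative head outputs a distribution over the next latent so that rollouts are stochastic. The layer sizes, image resolution, and Gaussian head are arbitrary assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class StochasticWorldModel(nn.Module):
    def __init__(self, latent_dim=32, action_dim=2, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU(),
                                     nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 64 * 64), nn.Unflatten(1, (1, 64, 64)))
        self.rnn = nn.GRUCell(latent_dim + action_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 2 * latent_dim)   # mean and log-variance of next latent

    def step(self, z, action, h):
        """One stochastic transition in latent space, usable as a virtual RL environment step."""
        h = self.rnn(torch.cat([z, action], dim=-1), h)
        mean, log_var = self.head(h).chunk(2, dim=-1)
        z_next = mean + torch.randn_like(mean) * (0.5 * log_var).exp()   # reparameterised sample
        return z_next, h

model = StochasticWorldModel()
obs = torch.rand(8, 1, 64, 64)                        # batch of 8 grayscale observations
z = model.encoder(obs)
h = torch.zeros(8, 128)
z_next, h = model.step(z, torch.zeros(8, 2), h)       # predict the next latent without touching the robot
recon_next = model.decoder(z_next)                    # decode a predicted observation if needed
print(recon_next.shape)  # torch.Size([8, 1, 64, 64])
```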
{"title":"Stochastic Observation Prediction for Efficient Reinforcement Learning in Robotics","authors":"Shisheng Wang, Hideki Nakayama","doi":"10.1109/MIPR51284.2021.00027","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00027","url":null,"abstract":"Although the recent progress of deep learning has enabled reinforcement learning (RL) algorithms to achieve human-level performance in retro video games within a short training time, the application of real-world robotics remains limited. The conventional RL procedure requires agents to interact with the environment. Meanwhile, the interactions with the physical world can not be easily parallelized or accelerated as in other tasks. Moreover, the gap between the real world and simulation makes it harder to transfer the policy trained in simulators to physical robots. Thus, we propose a model-based method to mitigate the interaction overheads for real-world robotic tasks. In particular, our model incorporates an autoencoder, a recurrent network, and a generative network to make stochastic predictions of observations. We conduct the experiments on a collision avoidance task for disc-like robots and show that the generative model can serve as a virtual RL environment. Our method has the benefit of lower interaction overheads as inference of deep neural networks on GPUs is faster than observing the transitions in the real environment, and it can replace the real RL environment with limited rollout length.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116441713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Interactive Cooking Support System for Short Recipe Videos based on User Browsing Behavior
Takuya Yonezawa, Yuanyuan Wang, Yukiko Kawai, K. Sumiya
Recently, short recipe videos such as Kurashiru and DELISH KITCHEN have become popular. These short recipe videos can help people learn many cooking skills in a brief time. However, it is difficult for users to understand all cooking operations by viewing these videos only once. In addition, these short recipe videos do not consider users' cooking skills (cooking levels), since anyone may view the same video. Therefore, in this work, we propose an interactive cooking support system for short recipe videos that extracts and weights cooking operations for each cooking genre based on user browsing behavior. The system then recommends various supplementary recipe videos based on the weights of the cooking operations and the user's browsing behavior. The system also provides a user interface, called Dynamic Video Tag Cloud, for visualizing the supplementary recipe videos, which can be changed dynamically based on the user's browsing behavior. As a result, users can intuitively and easily understand cooking operations suited to their cooking preferences. Finally, we verified the effectiveness of the weighting of cooking operations and discussed the usefulness of our proposed user interface using the SUS score.
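A simplified, hypothetical sketch of the weighting-and-recommendation step described above: cooking operations a user re-watches or pauses on receive higher weights, and supplementary videos are ranked by how well their operations match those weights. The event names, weight values, and scoring rule are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter

# Hypothetical browsing log: (operation shown at that moment, browsing event type).
browsing_log = [("julienne", "replay"), ("julienne", "pause"),
                ("stir_fry", "play"), ("deglaze", "replay")]
EVENT_WEIGHT = {"replay": 2.0, "pause": 1.5, "play": 0.2}   # replays and pauses suggest difficulty

def operation_weights(log):
    """Aggregate per-operation weights from the user's browsing behavior."""
    weights = Counter()
    for operation, event in log:
        weights[operation] += EVENT_WEIGHT.get(event, 0.0)
    return weights

def rank_videos(candidate_videos, weights):
    """Score each supplementary video by the total weight of the operations it explains."""
    scored = [(sum(weights[op] for op in ops), title) for title, ops in candidate_videos]
    return [title for score, title in sorted(scored, reverse=True)]

weights = operation_weights(browsing_log)
videos = [("Knife skills: julienne basics", ["julienne"]),
          ("Pan sauces 101", ["deglaze", "stir_fry"])]
print(rank_videos(videos, weights))   # the julienne tutorial ranks first for this user
```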
{"title":"An Interactive Cooking Support System for Short Recipe Videos based on User Browsing Behavior","authors":"Takuya Yonezawa, Yuanyuan Wang, Yukiko Kawai, K. Sumiya","doi":"10.1109/MIPR51284.2021.00011","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00011","url":null,"abstract":"Recently, short recipe videos such as Kurashiru and DELISH KITCHEN have become popular. These short recipe videos can help people learn many cooking skills in a brief time. However, it is difficult for users to understand all cooking operations by viewing these videos only once. These short recipe videos do not consider users’ cooking skills (cooking levels) since anyone may view the same video. Therefore, in this work, we propose an interactive cooking support system for short recipe videos by extracting and weighting cooking operations for each cooking genre based on user browsing behavior. The system then recommends various supplementary recipe videos based on the weights of cooking operations and user browsing behavior. Also, the system provides a user interface, called Dynamic Video Tag Cloud for visualizing the supplementary recipe videos, and the supplementary recipe videos can be dynamically changed based on the user browsing behavior. As a result, users can intuitively and easily understand cooking operations suited to their cooking favorites. Finally, we verified the effectiveness of the weighting of cooking operations and discussed the usefulness of our proposed user interface using the SUS score.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123182768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Rethinking of Intangible Cultural Heritage Teaching with Creative Programming in China
Peng Tan, Yi Ji, Yuqing Xu
Creative programming has become a mature mode of innovation abroad, and it provides a new approach to teaching intangible cultural heritage. This paper explores the application of a visual programming tool (Scratch) to the teaching of an intangible cultural heritage (Cantonese Porcelain) through three teaching modules. The research shows that this integrated teaching method can effectively stimulate participants' interest and creative thinking in learning intangible cultural heritage, and provides a new way of thinking and a practical reference for current innovative teaching of intangible cultural heritage.
{"title":"Rethinking of Intangible Cultural Heritage Teaching with Creative Programming in China","authors":"Peng Tan, Yi Ji, Yuqing Xu","doi":"10.1109/MIPR51284.2021.00056","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00056","url":null,"abstract":"Creative programming has become a mature mode of innovation in foreign countries, It provides a new way of intangible cultural heritage teaching. This paper will explore the application of visual programming tool (Scratch) in the teaching of intangible cultural heritage (Cantonese Porcelain) from three teaching modules. The research shows that this integrated teaching method can effectively stimulate participant’s interest and creative thinking in learning of intangible cultural heritage, and provide a new way of thinking and practical reference for the current innovative teaching of intangible cultural heritage.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"231 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132343668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Hybrid Image Segmentation Approach for Thermal Barrier Coating Quality Assessments
Zanyah Ailsworth, Wei-bang Chen, Yongjin Lu, Xiaoliang Wang, Melissa Tsui, H. Al-Ghaib, Ben Zimmerman
Thermal barrier coating, a widely used advanced manufacturing technique in various industries, provides thermal insulation and surface protection to a substrate by spraying melted coating materials onto the surface of the substrate. As the melted coating materials solidify, they create microstructures that affect the coating quality. An important coating quality assessment metric that determines its effectiveness is porosity, the quantity of microstructures within the coating. In this article, we aim to build a novel algorithm to determine the microstructures in a thermal barrier coating, which is used to calculate porosity. The hybrid approach combines the efficiency of thresholding-based techniques and the accuracy of convolutional neural network (CNN) based techniques to perform binary semantic segmentation. We evaluate the performance of the proposed hybrid approach on coating images generated from two different types of coating powders. These images exhibit various texture features. The experimental results show that the proposed hybrid approach outperforms the thresholding-based approach and the CNN-based approach in terms of accuracy on both types of images. In addition, the running time of the hybrid approach is greatly reduced compared to the CNN-based approach.
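A hedged sketch of how a thresholding pass and a CNN pass could be combined for binary pore segmentation and a porosity estimate: this version uses the CNN decision where the network is confident and falls back to a fast Otsu threshold on ambiguous pixels. `cnn_pore_probability` is a stand-in for a trained network, and the fusion rule itself is an assumption, not the paper's method.

```python
import numpy as np
from skimage.filters import threshold_otsu

def cnn_pore_probability(image: np.ndarray) -> np.ndarray:
    """Stand-in for a trained CNN returning per-pixel pore probabilities in [0, 1]."""
    return np.clip(1.0 - image / image.max(), 0.0, 1.0)     # darker pixels ~ more likely pores

def segment_pores(image: np.ndarray, confidence: float = 0.2) -> np.ndarray:
    threshold_mask = image < threshold_otsu(image)           # fast global threshold: dark = pore
    prob = cnn_pore_probability(image)
    ambiguous = np.abs(prob - 0.5) < confidence              # pixels the CNN is unsure about
    return np.where(ambiguous, threshold_mask, prob > 0.5)   # CNN decides where it is confident

def porosity(mask: np.ndarray) -> float:
    """Porosity = fraction of pixels classified as pore microstructure."""
    return float(mask.mean())

# Toy usage on a synthetic coating cross-section image with one dark "pore" region.
rng = np.random.default_rng(0)
img = rng.integers(100, 255, size=(128, 128)).astype(float)
img[40:60, 40:90] = 20.0
print(f"porosity = {porosity(segment_pores(img)):.3f}")
```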
{"title":"A Hybrid Image Segmentation Approach for Thermal Barrier Coating Quality Assessments","authors":"Zanyah Ailsworth, Wei-bang Chen, Yongjin Lu, Xiaoliang Wang, Melissa Tsui, H. Al-Ghaib, Ben Zimmerman","doi":"10.1109/MIPR51284.2021.00033","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00033","url":null,"abstract":"Thermal barrier coating, a widely used advanced manufacturing technique in various industries, provides thermal insulation and surface protection to a substrate by spraying melted coating materials on to the surface of the substrate. As the melted coating materials solidify, it creates microstructures that affect the coating quality. An important coating quality assessment metric that determines its effectiveness is porosity, the quantity of microstructures within the coating. In this article, we aim to build a novel algorithm to determine the microstructures in a thermal barrier coating, which is used to calculate porosity. The hybrid approach combines the efficiency of thresholding-based techniques and the accuracy of convolutional neural network (CNN) based techniques to perform a binary semantic segmentation. We evaluate the performance of the proposed hybrid approach on coating images generated from two different types of coating powders. These images exhibit various texture features. The experimental results show that the proposed hybrid approach outperforms the thresholding-based approach and the CNN-based approach in terms of accuracy on both types of images. In addition, the time complexity of the hybrid approach is also greatly optimized compared to the CNN-based approach.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127425225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
A multi-modal dataset for analyzing the imageability of concepts across modalities
Marc A. Kastner, Chihaya Matsuhira, I. Ide, S. Satoh
Recently, multi-modal applications have brought a need for a human-like understanding of the perception differences across modalities. For example, while something might evoke a clear image in a visual context, it might be perceived as too technical in a textual context. Such differences, related to a semantic gap, make transfers between modalities or combinations of modalities in multi-modal processing a difficult task. Imageability, a concept from Psycholinguistics, gives promising insight into the human perception of vision and language. In order to understand cross-modal differences in semantics, we create and analyze a cross-modal dataset for imageability. We estimate three imageability values grounded in 1) a visual space from a large set of images, 2) a textual space from Web-trained word embeddings, and 3) a phonetic space based on word pronunciations. A subset of the corpus is evaluated against an existing imageability dictionary to ensure basic generalization, while the remaining analysis targets finding cross-modal differences and outliers. We visualize the dataset and analyze the outliers and differences for each modality. As additional sources of knowledge, the part-of-speech and etymological origin of all words are estimated and analyzed in the context of the modalities. The dataset of multi-modal imageability values and a link to an interactive browser with visualizations are made available on the Web.
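A small sketch of the cross-modal comparison such a dataset enables, under the assumption that a visual, a textual, and a phonetic imageability score have already been estimated per word (the scores below are made up). Each modality is z-normalised and words whose scores disagree strongly across modalities are flagged as cross-modal outliers; the threshold and the example words are illustrative choices.

```python
import numpy as np

words = ["apple", "algorithm", "whisper", "banana"]
scores = {                      # hypothetical per-word imageability scores in three modalities
    "visual":   np.array([0.95, 0.20, 0.40, 0.93]),
    "textual":  np.array([0.90, 0.55, 0.40, 0.88]),
    "phonetic": np.array([0.60, 0.30, 0.80, 0.58]),
}

def zscore(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()

normalised = np.stack([zscore(v) for v in scores.values()])    # (3 modalities, n words)
disagreement = normalised.max(axis=0) - normalised.min(axis=0)

for word, d in zip(words, disagreement):
    flag = "  <- cross-modal outlier" if d > 1.5 else ""
    print(f"{word:10s} disagreement={d:.2f}{flag}")   # "whisper" scores high phonetically but low visually
```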
{"title":"A multi-modal dataset for analyzing the imageability of concepts across modalities","authors":"Marc A. Kastner, Chihaya Matsuhira, I. Ide, S. Satoh","doi":"10.1109/MIPR51284.2021.00039","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00039","url":null,"abstract":"Recently, multi-modal applications bring a need for a human-like understanding of the perception differences across modalities. For example, while something might have a clear image in a visual context, it might be perceived as too technical in a textual context. Such differences related to a semantic gap make a transfer between modalities or a combination of modalities in multi-modal processing a difficult task. Imageability as a concept from Psycholinguistics gives promising insight to the human perception of vision and language. In order to understand cross-modal differences of semantics, we create and analyze a cross-modal dataset for imageability. We estimate three imageability values grounded in 1) a visual space from a large set of images, 2) a textual space from Web-trained word embeddings, and 3) a phonetic space based on word pronunciations. A subset of the corpus is evaluated with an existing imageability dictionary to ensure a basic generalization, but otherwise targets finding cross-modal differences and outliers. We visualize the dataset and analyze it regarding outliers and differences for each modality. As additional sources of knowledge, part-of-speech and etymological origin of all words are estimated and analyzed in context of the modalities. The dataset of multi-modal imageability values and a link to an interactive browser with visualizations are made available on the Web.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132985388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Do as we do: Multiple Person Video-To-Video Transfer
Mickael Cormier, Houraalsadat Mortazavi Moshkenan, Franz Lörch, J. Metzler, J. Beyerer
Our goal is to transfer the motion of real people from a source video to a target video with realistic results. While recent advances have significantly improved image-to-image translation, only a few works account for body motions and temporal consistency, and those focus only on video retargeting for a single actor. In this work, we propose a marker-less approach for multiple-person video-to-video transfer using pose as an intermediate representation. Given a source video with multiple persons dancing or working out, our method transfers the body motion of all actors to a new set of actors in a different video. Differently from recent "do as I do" methods, we focus specifically on transferring multiple persons at the same time and tackle the related identity-switch problem. Our method is able to convincingly transfer body motion to the target video, while preserving specific features of the target video, such as feet touching the floor and the relative positions of the actors. The evaluation is performed with visual quality and appearance metrics using publicly available videos with the permission of their owners.
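A minimal sketch of the pose-as-intermediate-representation idea for one actor pair: 2D keypoints extracted from the source actor are rescaled to the target actor's proportions and re-anchored so the feet stay on the floor, before being fed to an image-generation network. Pose estimation and the generator itself are out of scope here, and the COCO-style keypoint indexing is an assumption for illustration.

```python
import numpy as np

ANKLE_IDS = (15, 16)   # assumed indices of the left/right ankle keypoints (COCO-style)

def retarget_pose(source_kpts: np.ndarray, src_height: float, tgt_height: float,
                  tgt_floor_y: float) -> np.ndarray:
    """Rescale source keypoints (N, 2) to the target body height and re-anchor the feet."""
    scale = tgt_height / src_height
    retargeted = source_kpts * scale
    # Translate so the lowest ankle touches the target's floor line (image y grows downwards).
    floor_offset = tgt_floor_y - retargeted[list(ANKLE_IDS), 1].max()
    retargeted[:, 1] += floor_offset
    return retargeted

# Toy usage: 17 keypoints of one source frame mapped onto a taller target actor.
src = np.random.rand(17, 2) * np.array([200.0, 400.0])
tgt_pose = retarget_pose(src, src_height=400.0, tgt_height=520.0, tgt_floor_y=700.0)
print(tgt_pose.shape)  # (17, 2)
```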
{"title":"Do as we do: Multiple Person Video-To-Video Transfer","authors":"Mickael Cormier, Houraalsadat Mortazavi Moshkenan, Franz Lörch, J. Metzler, J. Beyerer","doi":"10.1109/MIPR51284.2021.00020","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00020","url":null,"abstract":"Our goal is to transfer the motion of real people from a source video to a target video with realistic results. While recent advances significantly improved image-to-image translations, only few works account for body motions and temporal consistency. However, those focus only on video retargeting for a single actor/ for single actors. In this work, we propose a marker-less approach for multiple-person video-to-video transfer using pose as an intermediate representation. Given a source video with multiple persons dancing or working out, our method transfers the body motion of all actors to a new set of actors in a different video. Differently from recent \"do as I do\" methods, we focus specifically on transferring multiple person at the same time and tackle the related identity switch problem. Our method is able to convincingly transfer body motion to the target video, while preserving specific features of the target video, such as feet touching the floor and relative position of the actors. The evaluation is performed with visual quality and appearance metrics using publicly available videos with the permission of their owners.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124731061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2