
Latest Publications from the 2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)

Supplementing Omitted Named Entities in Cooking Procedural Text with Attached Images
Yixin Zhang, Yoko Yamakata, Keishi Tajima
In this research, we aim to supplement named entities, such as foods, that are omitted in the procedural text of recipe data. Doing so helps users understand the recipe and is also necessary for machines to understand recipe data automatically. The contributions of this research are as follows. (1) We construct a dataset of 12,548 Chinese recipes. To detect sentences in which food entities are omitted, we label named entities such as foods, tools, and cooking actions in the procedural text using an automatic recipe named-entity recognition method. (2) We propose a method of recognizing food from the attached images. A procedural text in recipe data is often accompanied by an image, and the attached image often contains the food even when it is omitted from the procedural text. Tool entities in recipe images can be identified with high accuracy by conventional general object recognition techniques. In contrast, general object recognition methods in the literature, which assume that the properties of an object are constant, do not perform well on food in recipe images because food states change during cooking. To solve this problem, we propose a method of obtaining food entity candidates from other steps that are similar to the target step in both sentence similarity and image-feature similarity. Among all 246,195 procedural steps in our dataset, there are 16,593 steps in which the food entity is omitted from the procedural text. Applying our method to supplement the food entities in these steps achieves an accuracy of 67.55%.
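As an illustration of the retrieval idea described above, here is a minimal Python sketch that ranks other steps by a weighted combination of sentence and image-feature cosine similarity and collects their labelled food entities as candidates; the weighting, feature extractors, and function names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: rank other steps by a weighted mix of sentence and
# image-feature cosine similarity, then collect their labelled food entities
# as candidates for a step whose text omits the food. Weight alpha and the
# top-k cutoff are illustrative assumptions, not the paper's settings.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def candidate_foods(target_idx, sent_vecs, img_feats, food_labels, alpha=0.5, top_k=3):
    """sent_vecs/img_feats: per-step vectors; food_labels: per-step lists of food entities."""
    scores = []
    for i in range(len(sent_vecs)):
        if i == target_idx or not food_labels[i]:
            continue  # skip the target step and steps with no labelled food
        s = alpha * cosine(sent_vecs[target_idx], sent_vecs[i]) \
            + (1 - alpha) * cosine(img_feats[target_idx], img_feats[i])
        scores.append((s, i))
    scores.sort(reverse=True)
    candidates = []
    for _, i in scores[:top_k]:
        candidates.extend(food_labels[i])
    return candidates
```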
{"title":"Supplementing Omitted Named Entities in Cooking Procedural Text with Attached Images","authors":"Yixin Zhang, Yoko Yamakata, Keishi Tajima","doi":"10.1109/MIPR51284.2021.00037","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00037","url":null,"abstract":"In this research, we aim at supplementing named entities, such as food, omitted in the procedural text of recipe data. It helps users understand the recipe and is also necessary for the machine to understand the recipe data automatically. The contribution of this research is as follows. (1) We construct a dataset of Chinese recipes consisting of 12,548 recipes. To detect sentences in which food entities are omitted, we label named entities such as food, tool, and cooking actions in the procedural text by using the automatic recipe named entity recognition method. (2) We propose a method of recognizing food from the attached images. A procedural text of recipe data is often associated with an image, and the attached image often contains the food even when it is omitted in the procedural text. Tool entities in images in recipe data can be identified with high accuracy by conventional general object recognition techniques. On the other hand, the general object recognition methods in the literature, which assume that the properties of an object are constant, perform not well for food in recipe image data because food states change during cooking procedures. To solve this problem, we propose a method of obtaining food entity candidates from other steps that are similar to the target step, both in sentence similarity and image feature similarity. Among all the 246,195 procedural steps in our dataset, there are 16,593 steps in which the food entity is omitted in the procedural text. Our method is applied to supplement the food entities in these steps and achieves the accuracy of 67.55%.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130162051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Multi-Scale Context Interaction Learning network for Medical Image Segmentation
Wenhao Fang, X. Han, Xu Qiao, Huiyan Jiang, Yenwei Chen
Semantic segmentation methods based on deep learning have provided state-of-the-art performance in recent years, and many Convolutional Neural Network (CNN) models have been proposed. Among them, U-Net, with its simple encoder-decoder structure, can learn multi-scale features carrying various context information and has become one of the most popular neural network architectures for medical image segmentation. To reuse features that preserve detailed image structure in the encoder path, U-Net uses skip connections that simply copy low-level encoder features to the decoder, so it cannot explore the correlations between the two paths or across different scales. This study proposes a multi-scale context interaction learning network (MCIU-net) for medical image segmentation. First, to effectively fuse the detail-rich features in the encoder path with the more semantic information in the decoder path, we conduct interaction learning at the corresponding scale via a bi-directional ConvLSTM (BConvLSTM) unit. Second, interaction learning among all blocks of the decoder path is also employed to dynamically merge multi-scale contexts. We validate the proposed interaction learning network on three medical image datasets (retinal blood vessel segmentation, skin lesion segmentation, and lung segmentation) and demonstrate promising results compared with state-of-the-art methods.
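The BConvLSTM interaction described above can be pictured with a minimal PyTorch sketch that treats the same-scale encoder and decoder feature maps as a two-step sequence read in both directions; the channel sizes, gate convolution, and final 1x1 fusion are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch of fusing a same-scale encoder feature map and decoder
# feature map with a bi-directional ConvLSTM, in the spirit of the BConvLSTM
# interaction described in the abstract. Not the authors' exact design.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.gates = nn.Conv2d(2 * ch, 4 * ch, kernel_size=3, padding=1)

    def forward(self, x, h, c):
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class BConvLSTMFusion(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.fwd, self.bwd = ConvLSTMCell(ch), ConvLSTMCell(ch)
        self.out = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, enc_feat, dec_feat):
        # treat (encoder, decoder) features as a 2-step sequence read both ways
        b, ch, hgt, wid = enc_feat.shape
        h = c = torch.zeros(b, ch, hgt, wid, device=enc_feat.device)
        for x in (enc_feat, dec_feat):
            h, c = self.fwd(x, h, c)
        h_forward = h
        h = c = torch.zeros_like(h_forward)
        for x in (dec_feat, enc_feat):
            h, c = self.bwd(x, h, c)
        return self.out(torch.cat([h_forward, h], dim=1))
```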
{"title":"Multi-Scale Context Interaction Learning network for Medical Image Segmentation","authors":"Wenhao Fang, X. Han, Xu Qiao, Huiyan Jiang, Yenwei Chen","doi":"10.1109/MIPR51284.2021.00036","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00036","url":null,"abstract":"Semantic segmentation methods based on deep learning have provided the state-of-the-art performance in recent years. Based on deep learning, many Convolutional Neural Network (CNN) models have been proposed. Among them, U-Net with the simple encoder and decoder structure, can learn multi-scale features with various context information and has become one of the most popular neural network architectures for medical image segmentation. To reuse the features with the detail image structure in the encoder path, U-Net utilizes a skip-connection structure to simply copy the low-level features in the encoder to the decoder, and cannot explore the correlations between two paths and different scales. This study proposes a multi-scale context interaction learning network (MCIU-net) for medical image segmentation. First, to effectively fuse the features with detail structure in the encoder path and more semantic information in the decoder path, we conduct interaction learning on the corresponding scale via the bi-directional ConvLSTM (BConvLSTM) unit. Second, the interaction learning among all blocks of the decoder path is also employed for dynamically merging multi-scale contexts. We validate our proposed interaction learning network on three medical image datasets: retinal blood vessel segmentation, skin lesion segmentation, and lung segmentation, and demonstrate promising results compared with the state-of-the-art methods.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126720384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Practice-Oriented Real-time Person Occurrence Search System
S. Yamazaki, Hui Lam Ong, Jianquan Liu, Wei Jian Peh, Hong Yen Ong, Qinyu Huang, Xinlai Jiang
Face recognition is a promising technology for realizing Person Occurrence Search (POS) applications, which retrieve all occurrences of a target person across multiple cameras. From an industry perspective, such a POS application requires a practice-oriented system that can respond to search requests in seconds, return search results almost without false positives, and handle variations in face angle and illumination across camera views. In this paper, we demonstrate a real-time person occurrence search system that adopts person re-identification for occurrence tracking to achieve extremely low false positives. The proposed system performs face detection and face clustering online to drastically reduce the response time to users' search requests. To retrieve a person's occurrence count and duration quickly, we design a process called Logical Occurrences that uses the maximum interval between detected face timestamps to efficiently compute the occurrence count. This process reduces the online computational complexity from O(M²) to O(M) by pre-computing elapsed times during online face clustering. The proposed system is evaluated on a real data set containing about 1 million detected faces. In the experiments, our system responds to search requests within 2 seconds on average and achieves 99.9% precision over more than 200 actual search requests.
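The Logical Occurrences idea (splitting a face cluster's detection timeline wherever the gap exceeds a maximum interval, so occurrence count and duration fall out of one linear pass) can be sketched as follows; the gap threshold and data layout are assumptions, not the system's actual parameters.

```python
# Illustrative sketch: given the sorted detection times of one face cluster
# on one camera, split into occurrences wherever the gap exceeds a maximum
# interval, yielding count and duration in a single O(M) pass.
def logical_occurrences(times, max_gap=30.0):
    """times: non-empty, sorted detection timestamps (seconds) of one person."""
    occurrences = []
    start = prev = times[0]
    for t in times[1:]:
        if t - prev > max_gap:           # gap too large: previous occurrence ends
            occurrences.append((start, prev))
            start = t
        prev = t
    occurrences.append((start, prev))
    count = len(occurrences)
    total_duration = sum(end - begin for begin, end in occurrences)
    return count, total_duration, occurrences
```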
{"title":"Practice-Oriented Real-time Person Occurrence Search System","authors":"S. Yamazaki, Hui Lam Ong, Jianquan Liu, Wei Jian Peh, Hong Yen Ong, Qinyu Huang, Xinlai Jiang","doi":"10.1109/MIPR51284.2021.00040","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00040","url":null,"abstract":"Face recognition is a potential technology to realize Person Occurrence Search (POS) application which retrieves all occurrences of a target person over multiple cameras. From the industry perspective, such a POS application requires a practice-oriented system that can respond to search requests in seconds, return search results nearly without false positives, and handle the variations of face angles and illumination in camera views. In this paper, we demonstrate a real-time person occurrence search system that adopts person re-identification for person occurrence tracking to achieve extremely low false positives. Our proposed system performs face detection and face clustering in an online manner to drastically reduce the response time of search requests from users. To retrieve person occurrence count and duration quickly, we design a process so-called Logical Occurrences that utilizes the maximum interval of detected time of faces to efficiently compute the occurrence count. Such a process can reduce the online computational complexity from O(M2) to O(M) by pre-computing elapsed time during the online face clustering. The proposed system is evaluated on a real data set which contains about 1 million of detected faces for search. In the experiments, our system responds to search requests within 2 seconds on average, and achieves 99.9% precision of search results over more than 200 actual search requests.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116847436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Violent Scene Detection of Film Videos Based on Multi-Task Learning of Temporal-Spatial Features
Z. Zheng, Wei Zhong, Long Ye, Li Fang, Qin Zhang
In this paper, we propose a new framework for violent scene detection in film videos based on multi-task learning of temporal-spatial features. In the proposed framework, to represent violent behavior in film clips, we employ a temporal excitation and aggregation network to extract temporal-spatial deep features in the visual modality. On the other hand, a recurrent neural network with local attention is used to extract utterance-level representations for affective analysis in the audio modality. In the feature-mapping process, we consider the task of violent scene detection together with that of affective analysis and propose a multi-task learning strategy to effectively predict violent scenes in film clips. To evaluate the effectiveness of the proposed method, experiments are conducted on the Violent Scenes Detection 2015 task. The experimental results show that our model outperforms most state-of-the-art methods, validating the innovation of considering violent scene detection jointly with violence emotion analysis.
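A hedged sketch of the multi-task objective implied above: a shared fused representation feeds a violent-scene head and an affect head, and their losses are combined with a weighting factor. The weight, head sizes, and label format are illustrative assumptions, not the paper's configuration.

```python
# Sketch of a multi-task head combining a violent-scene loss and an
# affective-analysis loss over one fused feature vector per clip.
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    def __init__(self, feat_dim=512, num_affect=3, lam=0.5):
        super().__init__()
        self.violence = nn.Linear(feat_dim, 2)      # violent / non-violent
        self.affect = nn.Linear(feat_dim, num_affect)
        self.lam = lam                               # assumed task weighting
        self.ce = nn.CrossEntropyLoss()

    def forward(self, fused_feat, violence_label, affect_label):
        loss_violence = self.ce(self.violence(fused_feat), violence_label)
        loss_affect = self.ce(self.affect(fused_feat), affect_label)
        return loss_violence + self.lam * loss_affect
```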
{"title":"Violent Scene Detection of Film Videos Based on Multi-Task Learning of Temporal-Spatial Features","authors":"Z. Zheng, Wei Zhong, Long Ye, Li Fang, Qin Zhang","doi":"10.1109/MIPR51284.2021.00067","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00067","url":null,"abstract":"In this paper, we propose a new framework for the violent scene detection of film videos based on multi-task learning of temporal-spatial features. In the proposed framework, for the violent behavior representation of film clips, we employ a temporal excitation and aggregation network to extract the temporal-spatial deep features in the visual modality. And on the other hand, a recurrent neural network with local attention is utilized to extract the utterance-level representation of affective analysis in the audio modality. In the process of feature mapping, we concern the task of violent scene detection together with that of affective analysis and then propose a multi-task learning strategy to effectively predict the violent scene of film clips. To evaluate the effectiveness of the proposed method, the experiments are done on the task of violent scenes detection 2015. The experimental results show that our model outperforms most of the state of the art methods, validating the innovation of considering the task of violent scene detection jointly with the violence emotion analysis.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116775365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Hardness Prediction for More Reliable Attribute-based Person Re-identification
Lucas Florin, Andreas Specker, Arne Schumann, J. Beyerer
Recognition of person attributes in surveillance camera imagery is often used as an auxiliary cue in person re-identification approaches. Additionally, increasing attention is being paid to the cross-modal task of person re-identification based purely on attribute queries. In both of these settings, the reliability of attribute predictions is crucial for success. However, the attribute recognition task is affected by several non-trivial challenges. These include common aspects such as image quality degraded by low resolution, motion blur, lighting conditions, and similar factors. Another important factor in the context of attribute recognition, however, is the lack of visibility due to occlusion by scene objects, other persons, or self-occlusion, or simply due to mis-cropped person detections. All these factors make attribute prediction challenging and the resulting detections anything but reliable. In order to improve their applicability to person re-identification, we propose to apply hardness prediction models and provide an additional hardness score with each attribute that measures how likely the actual prediction is to be reliable. We investigate several key aspects of hardness prediction in the context of attribute recognition and compare our resulting hardness predictor to several alternatives. Finally, we include the hardness prediction in an attribute-based re-identification task and show improvements in the resulting accuracy. Our code is available at https://github.com/Lucas-Florin/hardness-predictor-for-par.
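One way to realize such a hardness score, shown as a hedged sketch below, is a second head trained to predict whether each attribute prediction will be wrong, so a high score flags an unreliable attribute; the layer sizes and training target are illustrative assumptions, not the released implementation linked above.

```python
# Sketch: an attribute classifier with a parallel hardness head that outputs
# a per-attribute reliability score; the hardness target marks attributes
# the classifier currently gets wrong.
import torch
import torch.nn as nn

class AttributesWithHardness(nn.Module):
    def __init__(self, feat_dim=2048, num_attrs=35):
        super().__init__()
        self.attr_head = nn.Linear(feat_dim, num_attrs)
        self.hard_head = nn.Linear(feat_dim, num_attrs)

    def forward(self, feats):
        attr_logits = self.attr_head(feats)               # attribute predictions
        hardness = torch.sigmoid(self.hard_head(feats))   # scores in [0, 1]
        return attr_logits, hardness

def hardness_targets(attr_logits, labels):
    # 1 where the classifier is wrong ("hard"); used as a BCE target for hard_head
    preds = (torch.sigmoid(attr_logits) > 0.5).float()
    return (preds != labels).float()
```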
{"title":"Hardness Prediction for More Reliable Attribute-based Person Re-identification","authors":"Lucas Florin, Andreas Specker, Arne Schumann, J. Beyerer","doi":"10.1109/MIPR51284.2021.00077","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00077","url":null,"abstract":"Recognition of person attributes in surveillance camera imagery is often used as an auxiliary cue in person re-identification approaches. Additionally, increasingly more attention is being payed to the cross modal task of person re-identification based purely on attribute queries. In both of these settings, the reliability of attribute predictions is crucial for success. However, the task attribute recognition is affected by several non-trivial challenges. These include common aspects, such as degraded image quality through low resolution, motion blur, lighting conditions and similar factors. Another important factor in the context of attribute recognition is, however, the lack of visibility due to occlusion through scene objects, other persons or self-occlusion or simply due to mis-cropped person detections. All these factors make attribute prediction challenging and the resulting detections everything but reliable. In order to improve their applicability to person re-identification, we propose to apply hardness prediction models and provide an additional hardness score with each attribute that measures the likelihood of the actual prediction to be reliable. We investigate several key aspects of hardness prediction in the context of attribute recognition and compare our resulting hardness predictor to several alternatives. Finally, we include the hardness prediction into an attribute-based re-identification task and show improvements in the resulting accuracy. Our code is available at https://github.com/Lucas-Florin/hardness-predictor-for-par.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125256411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Cross-domain Person Re-Identification with Identity-preserving Style Transfer
Shixing Chen, Caojin Zhang, Mingtao Dong, Chengcui Zhang
Although great successes have been achieved recently in person re-identification (re-ID), two major obstacles still restrict its real-world performance: the large variety of camera styles and the limited number of samples for each identity. In this paper, we propose an efficient and scalable framework for cross-domain re-ID tasks. Single-model style transfer and pairwise comparison are seamlessly integrated in our framework through adversarial training. Moreover, we propose a novel identity-preserving loss to replace the content loss in style transfer and mathematically show that its minimization guarantees that the generated images have the same conditional distributions (conditioned on identity) as the real ones, which is critical for cross-domain person re-ID. Our model achieves state-of-the-art results on challenging cross-domain re-ID tasks.
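A minimal sketch of an identity-preserving term of this kind: a frozen identity classifier scores the style-transferred image against the source identity label, and this term is added to the adversarial loss in place of a content loss. The loss weights, classifier, and function names are assumptions, not the paper's exact formulation.

```python
# Sketch of a generator training loss combining an adversarial term with an
# identity-preserving term computed by a frozen identity classifier.
import torch
import torch.nn as nn

def transfer_loss(generator, discriminator, id_classifier, src_img, id_label, lam_id=1.0):
    fake = generator(src_img)                              # source image in target camera style
    adv = -torch.log(torch.sigmoid(discriminator(fake)) + 1e-8).mean()
    id_loss = nn.functional.cross_entropy(id_classifier(fake), id_label)
    return adv + lam_id * id_loss                          # lam_id is an assumed weight
```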
{"title":"Cross-domain Person Re-Identification with Identity-preserving Style Transfer","authors":"Shixing Chen, Caojin Zhang, Mingtao Dong, Chengcui Zhang","doi":"10.1109/MIPR51284.2021.00008","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00008","url":null,"abstract":"Although great successes have been achieved recently in person re-identification (re-ID), there are still two major obstacles restricting its real-world performance: large variety of camera styles and a limited number of samples for each identity. In this paper, we propose an efficient and scalable framework for cross-domain re-ID tasks. Single-model style transfer and pairwise comparison are seamlessly integrated in our framework through adversarial training. Moreover, we propose a novel identity-preserving loss to replace the content loss in style transfer and mathematically show that its minimization guarantees that the generated images have identical conditional distributions (conditioned on identity) as the real ones, which is critical for cross-domain person re-ID. Our model achieved state-of-the-art results in challenging cross-domain re-ID tasks.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126068778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Design and Development of an Intelligent Pet-Type Quadruped Robot
Feng Gao, Chengjia Lei, Xingguo Long, Jin Wang, Peiheng Song
Inspired by the assistance that artificial intelligence offers to artistic creation, we apply AI technology to create the Open Monster C class number 01 (OM-C01), a quadruped robot dog as lifelike as an artwork. OM-C01 adopts a 2-DoF five-bar parallel mechanism to realize a bionic thigh-and-shank structure. We combine a visual learning system based on few-shot and incremental learning with the GPT-2 pre-trained language model to endow OM-C01 with the learning ability of a pet. OM-C01 can make decisions based on facial expressions as well as its own emotional state, and shapes a unique personality by updating its Q-table. Meanwhile, we implement a digital twin simulation environment for OM-C01 based on .NET WPF, which makes it convenient to design various actions.
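The Q-table update behind the personality-shaping idea can be sketched as a standard Q-learning step in which the recognized facial expression is mapped to a scalar reward; the reward mapping, state and action encodings, and hyper-parameters below are assumptions, not the robot's actual configuration.

```python
# Sketch: tabular Q-learning where the perceived facial expression acts as
# the reward signal shaping the robot's behavioral preferences over time.
from collections import defaultdict

REWARD = {"happy": 1.0, "neutral": 0.0, "angry": -1.0}   # hypothetical mapping
q_table = defaultdict(float)

def update_q(state, action, expression, next_state, actions, alpha=0.1, gamma=0.9):
    reward = REWARD.get(expression, 0.0)
    best_next = max(q_table[(next_state, a)] for a in actions)
    key = (state, action)
    q_table[key] += alpha * (reward + gamma * best_next - q_table[key])
```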
{"title":"Design and Development of an Intelligent Pet-Type Quadruped Robot","authors":"Feng Gao, Chengjia Lei, Xingguo Long, Jin Wang, Peiheng Song","doi":"10.1109/MIPR51284.2021.00068","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00068","url":null,"abstract":"Inspired by the assistance that artificial intelligence offers to artistic creation, we apply AI technology to create the Open Monster C class number 01 (OM-C01), a quadruped robot dog as lifelike as an artwork. OM-C01 adopts a 2-DoF five-bar parallel mechanism to realize the thigh and shank bionic structure. We combine the visual learning system based on few-shot learning and incremental learning with GPT-2 pre-training language model to endow OM-C01 the same learning ability as a pet. OM-C01 can make decisions based on the facial expression as well as its emotional state, and shape a unique personality by updating the Q-table. Meanwhile, we implement a digital twin simulation environment for OM-C01 based on .NET WPF, which is convenient for designing various actions.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"334 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123183775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Effect of Walkability on Rental Prices in Tokyo
A. Bramson, Megumi Hori
In order to measure the role of walkability in determining the perceived quality of an area, and to determine which kinds of amenities contribute most to enhancing walkability, we perform a hedonic regression of rental prices on 23 categories of establishments within various walking ranges from each station in central Tokyo. Using an integrated walking network, we collect the reachable nodes within various isochrones (<5 min, <10 min, <15 min, 5-10 min, 10-15 min) from each station, and then, by buffering the traversed edges, we identify the reachable stores for each one. We also collect selected similar rental properties within 15 minutes of each station to estimate variations in value for each area. Our regression model aims to uncover how much of the price variation can be explained by walkability, and which kinds of establishments contribute most to walkability's benefit. We find that the number of convenience stores is a reliable indicator of neighborhood quality, but the relationships of other establishments to walkability depend on distance from the station and often have counter-intuitive effects.
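A sketch of the hedonic regression setup, assuming a table in which each row is a rental listing with its rent and the counts of each establishment category reachable within a walking band from its nearest station; the column names, control variables, and use of scikit-learn are illustrative assumptions, not the study's actual pipeline.

```python
# Sketch: regress log rent on per-category establishment counts plus a few
# property controls; the fitted coefficients indicate how each amenity
# category is associated with rent.
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_hedonic(df, category_cols, control_cols=("floor_area", "building_age")):
    """df: pandas DataFrame of listings; category_cols: establishment-count columns."""
    cols = list(category_cols) + list(control_cols)
    X = df[cols].to_numpy()
    y = np.log(df["monthly_rent"].to_numpy())   # log rent as the dependent variable
    model = LinearRegression().fit(X, y)
    return dict(zip(cols, model.coef_))
```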
{"title":"Effect of Walkability on Rental Prices in Tokyo","authors":"A. Bramson, Megumi Hori","doi":"10.1109/MIPR51284.2021.00054","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00054","url":null,"abstract":"In order to measure the role of walkability in determining the perceived quality of an area, and also to determine which kinds of amenities contribute the most to enhancing walkability, we perform a hedonistic regression of rental prices on 23 categories of establishments within various walking ranges from each station in central Tokyo. Using an integrated walking network, we collect the reachable nodes within various isochrones (<5min, <10min, <15min, 5-10min, 10-15min) from each station, and then by buffering the traversed edges we identify reachable stores for each one. We also collect selected similar rental properties within 15 minutes of each station to estimate variations in value for each area. Our regression model aims to uncover how much of the price variations can be explained by walkability, and also which kinds of establishment contribute the most to walkability’s benefit. We find that the number of convenience stores is a reliable indicator of neighborhood quality, but relationships of other establishments to walkability depend on distance from the station and often have counter-intuitive effects.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130051359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Smart Portable Musical Simulation System Based on Unified Temperament
Lin Gan, Li Lv, Cuicui Wang, Mu Zhang
This study builds a digital, portable musical-instrument system based on Unified Temperament. The system uses equal temperament and integrates different playing modes on the Musical Pad. By using this visualized and digitalized system, people without musical training are able to give a musical performance. The Musical Pad simulates different musical instruments, including keyboard, woodwind, string, and other orchestral instruments, so music lovers can cooperate to play the various parts of polyphonic music. The system is suitable for general music education of non-art students in primary and middle schools, and in this new form of music teaching and appreciation, students can participate more actively.
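Equal temperament itself reduces to one formula, f = 440 x 2^((n - 69) / 12) for MIDI-style note number n, which is presumably what lets a single pad layout drive every simulated instrument; the snippet below is a minimal illustration, with the pad-to-note mapping assumed.

```python
# Minimal equal-temperament mapping from a MIDI-style note number to a
# frequency in Hz, with A4 = 440 Hz as the reference pitch.
def note_to_frequency(midi_note: int, a4: float = 440.0) -> float:
    return a4 * 2 ** ((midi_note - 69) / 12)

# e.g. middle C (MIDI 60) is about 261.63 Hz regardless of which simulated
# instrument the pad is currently voicing
print(round(note_to_frequency(60), 2))
```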
{"title":"Smart Portable Musical Simulation System Based on Unified Temperament","authors":"Lin Gan, Li Lv, Cuicui Wang, Mu Zhang","doi":"10.1109/MIPR51284.2021.00069","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00069","url":null,"abstract":"This study builds a digital system of a portable musical instrument based on Unified Temperament. The system utilizes Equal-temperament, which integrates different modes of playing on the Musical Pad. By using the visualized and digitalized system, people without musical training will be able to give a musical performance. The Musical Pad simulates different musical instruments, including keyboard, woodwind, string, and other orchestral instruments. Therefore, music lovers can cooperate to play a variety of parts in polyphonic music. The system is suitable for general music education for non-artistic students in primary and middle schools. In the new form for music teaching and appreciation, students can participate more actively.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"172 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134455874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A probabilistic and random method for the generation of Bai nationality music fragments
Pengcheng Shang, Shan Ni, Li Zhou
Based on the theory of Chinese folk music, this paper analyzes the characteristics of Chinese Bai nationality music works, applies probabilistic and random methods to generate music fragments with Bai nationality style, and conducts expert interviews on the generated melodies. The interview results show that, to some extent, the probability- and randomness-based generation method is effective for creating melodies in the Bai style, consistent with the characteristics of Bai nationality music. This method can also serve as a reference for the intelligent preservation and inheritance of Chinese folk music.
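A hedged sketch of probability-driven fragment generation: notes of a pentatonic-style scale are drawn from a first-order transition table so that frequent intervals of the target style dominate the output; the scale and probabilities below are placeholders, not measured statistics of Bai music.

```python
# Sketch: generate a short melodic fragment by sampling each next note from
# a per-note probability distribution over a five-note scale.
import random

SCALE = ["do", "re", "mi", "sol", "la"]
TRANSITIONS = {                       # hypothetical next-note probabilities
    "do":  [0.1, 0.4, 0.3, 0.1, 0.1],
    "re":  [0.3, 0.1, 0.4, 0.1, 0.1],
    "mi":  [0.2, 0.3, 0.1, 0.3, 0.1],
    "sol": [0.1, 0.1, 0.3, 0.1, 0.4],
    "la":  [0.3, 0.1, 0.1, 0.4, 0.1],
}

def generate_fragment(length=8, start="do"):
    fragment = [start]
    for _ in range(length - 1):
        fragment.append(random.choices(SCALE, weights=TRANSITIONS[fragment[-1]])[0])
    return fragment

print(generate_fragment())
```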
{"title":"A probabilistic and random method for the generation of Bai nationality music fragments","authors":"Pengcheng Shang, Shan Ni, Li Zhou","doi":"10.1109/MIPR51284.2021.00057","DOIUrl":"https://doi.org/10.1109/MIPR51284.2021.00057","url":null,"abstract":"Based on the theory of Chinese folk music, this paper analyzes the characteristics of Chinese Bai nationality music works, applies probabilistic and random methods to generate music fragments -with Bai nationality style, and conducts expert interviews on the generated melodies. The interview results show that, to some extent, the generation method of Bai nationality music fragments based on probability and randomness is effective for the melody creations with Bai nationality style, which is consistent with the characteristics of Bai nationality music. This method can also play a reference role in the intelligent protection and inheritance of Chinese folk music.","PeriodicalId":139543,"journal":{"name":"2021 IEEE 4th International Conference on Multimedia Information Processing and Retrieval (MIPR)","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130827563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1