
Latest Articles in IEICE Transactions on Information and Systems

Social Relation Atmosphere Recognition with Relevant Visual Concepts
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023pcp0008
Ying JI, Yu WANG, Kensaku MORI, Jien KATO
Social relationships (e.g., couples, opponents) are a foundational part of society. The social relation atmosphere describes the overall interaction environment between social relationships. Discovering the social relation atmosphere can help machines better comprehend human behaviors and improve the performance of socially intelligent applications. Most existing research focuses on investigating social relationships while ignoring the social relation atmosphere. Due to the complexity of expressions in video data and the uncertainty of the social relation atmosphere, it is difficult even to define and evaluate. In this paper, we analyze the social relation atmosphere in video data. We introduce a Relevant Visual Concept (RVC) from the social relationship recognition task to facilitate social relation atmosphere recognition, because social relationships contain useful information about human interactions and surrounding environments, which are crucial clues for social relation atmosphere recognition. Our approach consists of two main steps: (1) we first generate a group of visual concepts that preserve the inherent social relationship information by utilizing a 3D explanation module; (2) the extracted relevant visual concepts are used to supplement social relation atmosphere recognition. In addition, we present a new dataset based on the existing Video Social Relation Dataset. Each video is annotated with four kinds of social relation atmosphere attributes and one social relationship. We evaluate the proposed method on our dataset. Experiments with various 3D ConvNets and fusion methods demonstrate that the proposed method effectively improves recognition accuracy compared to end-to-end ConvNets. The visualization results also indicate that essential information in social relationships can be discovered and used to enhance social relation atmosphere recognition.
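As a minimal illustration of the second step, the sketch below fuses pooled 3D ConvNet clip features with pooled RVC features before classifying the four atmosphere attributes. The module name, feature dimensions, and concatenation-based fusion are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AtmosphereWithRVC(nn.Module):
    """Hypothetical late-fusion head: pooled 3D ConvNet clip features are
    concatenated with pooled relevant-visual-concept (RVC) features before
    classifying the social relation atmosphere."""

    def __init__(self, clip_dim=2048, rvc_dim=512, num_atmospheres=4):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(clip_dim + rvc_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_atmospheres),
        )

    def forward(self, clip_feat, rvc_feat):
        # clip_feat: (B, clip_dim) pooled features from a 3D ConvNet
        # rvc_feat:  (B, rvc_dim) pooled features of the extracted RVCs
        fused = torch.cat([clip_feat, rvc_feat], dim=1)
        return self.fusion(fused)

# Example: a batch of 8 clips with pre-extracted features.
logits = AtmosphereWithRVC()(torch.randn(8, 2048), torch.randn(8, 512))
print(logits.shape)  # torch.Size([8, 4])
```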
Citations: 0
Prior Information Based Decomposition and Reconstruction Learning for Micro-Expression Recognition
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2022edl8065
Jinsheng WEI, Haoyu CHEN, Guanming LU, Jingjie YAN, Yue XIE, Guoying ZHAO
Micro-expression recognition (MER) draws intensive research interest because micro-expressions (MEs) can reveal genuine emotions. Prior information can guide the model to learn discriminative ME features effectively. However, most works focus on general models with a stronger representation ability that adaptively aggregate ME movement information in a holistic way, which may ignore the prior information and properties of MEs. To solve this issue, driven by the prior information that the category of an ME can be inferred from the relationship between the actions of different facial components, this work designs a novel model that conforms to this prior information and learns ME movement features in an interpretable way. Specifically, this paper proposes a Decomposition and Reconstruction-based Graph Representation Learning (DeRe-GRL) model to effectively learn high-level ME features. DeRe-GRL includes two modules: an Action Decomposition Module (ADM) and a Relation Reconstruction Module (RRM), where the ADM learns action features of facial key components and the RRM explores the relationships between these action features. Based on facial key components, the ADM divides the geometric movement features extracted by the graph-model-based backbone into several sub-features and learns a map matrix to map these sub-features into multiple action features; then, the RRM learns weights to weight all action features and build the relationships between them. The experimental results demonstrate the effectiveness of the proposed modules, and the proposed method achieves competitive performance.
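The decomposition-then-reconstruction idea can be sketched as follows; the tensor shapes, the per-component map matrix, and the softmax re-weighting are illustrative assumptions rather than the paper's exact DeRe-GRL design.

```python
import torch
import torch.nn as nn

class ADM(nn.Module):
    """Hypothetical Action Decomposition Module: split backbone features
    into per-component sub-features and map each one to an action feature
    with a learnable map matrix."""

    def __init__(self, in_dim=256, num_components=4, action_dim=64):
        super().__init__()
        assert in_dim % num_components == 0
        self.num_components = num_components
        sub_dim = in_dim // num_components
        self.map = nn.Parameter(torch.randn(num_components, sub_dim, action_dim) * 0.02)

    def forward(self, x):                                     # x: (B, in_dim)
        subs = x.view(x.size(0), self.num_components, -1)     # (B, K, sub_dim)
        return torch.einsum('bks,ksa->bka', subs, self.map)   # (B, K, action_dim)

class RRM(nn.Module):
    """Hypothetical Relation Reconstruction Module: learn weights that
    re-weight the action features before aggregation."""

    def __init__(self, num_components=4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_components))

    def forward(self, actions):                               # (B, K, action_dim)
        w = torch.softmax(self.weights, dim=0).view(1, -1, 1)
        return (w * actions).sum(dim=1)                       # (B, action_dim)

feat = torch.randn(8, 256)          # geometric movement features from the backbone
out = RRM()(ADM()(feat))
print(out.shape)                    # torch.Size([8, 64])
```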
Citations: 0
Quantitative Estimation of Video Forgery with Anomaly Analysis of Optical Flow
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2022edl8107
Wan Yeon LEE, Yun-Seok CHOI, Tong Min KIM
We propose a quantitative measurement technique for video forgery that eliminates the burden of deciding the subtle boundary between normal and tampered patterns. We also propose an automatic adjustment scheme for spatial and temporal target zones, which maximizes the abnormality measurement of forged videos. Evaluation shows that the proposed scheme provides clear detection capability against both inter-frame and intra-frame forgeries.
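A minimal sketch of optical-flow-based anomaly scoring is given below, assuming dense Farneback flow and a simple deviation-from-local-median score; the paper's actual quantitative measure and zone-adjustment scheme are not reproduced here.

```python
import cv2
import numpy as np

def flow_anomaly_scores(frames, win=5):
    """Hypothetical per-frame forgery score: mean dense optical-flow
    magnitude compared against a local temporal median. Frames whose flow
    deviates strongly from their neighbourhood are flagged as anomalous."""
    mags = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for f in frames[1:]:
        cur = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev, cur, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mags.append(np.linalg.norm(flow, axis=2).mean())
        prev = cur
    mags = np.array(mags)
    scores = np.zeros_like(mags)
    for i in range(len(mags)):
        lo, hi = max(0, i - win), min(len(mags), i + win + 1)
        ref = np.median(mags[lo:hi])
        scores[i] = abs(mags[i] - ref) / (ref + 1e-6)   # relative deviation
    return scores   # larger values suggest inter-frame tampering
```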
Citations: 0
Local-to-Global Structure-Aware Transformer for Question Answering over Structured Knowledge
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023edp7034
Yingyao WANG, Han WANG, Chaoqun DUAN, Tiejun ZHAO
Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models trained on linear-chain natural language cannot be directly applied to encode tables and graphs. Existing methods adopt pre-trained models in such tasks by flattening structured knowledge into sequences. However, this serialization leads to the loss of the structural information of the knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects the local-to-global structural information of the knowledge into the masks of the different self-attention layers. Specifically, in the lower self-attention layers, SATrans focuses on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects the global information of the structured knowledge to integrate the information among knowledge tokens. In this way, SATrans can effectively learn the semantic representation and structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on the table fact verification task and the knowledge-base question-answering task. Furthermore, we explore two methods to combine symbolic and linguistic reasoning for these tasks, to address the problem that pre-trained models lack symbolic reasoning ability. The experimental results reveal that the methods consistently outperform strong baselines on the two benchmarks.
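One way to picture the local-to-global masking is the sketch below, which builds a sparse, structure-aware attention mask for the lower layers and a dense mask for the upper layers; the adjacency-plus-self-loop rule and the layer split are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def structure_masks(adjacency, num_layers, local_layers):
    """Hypothetical local-to-global masking: lower self-attention layers only
    attend along edges of the knowledge structure (plus self-loops), while
    upper layers attend over the whole linearized sequence."""
    n = adjacency.shape[0]
    local = np.logical_or(adjacency, np.eye(n, dtype=bool))
    global_ = np.ones((n, n), dtype=bool)
    return [local if i < local_layers else global_ for i in range(num_layers)]

# Toy table with 4 linearized tokens: token 0 is a header linked to tokens 1-3.
adj = np.zeros((4, 4), dtype=bool)
adj[0, 1:] = adj[1:, 0] = True
masks = structure_masks(adj, num_layers=6, local_layers=3)
print(masks[0].astype(int))   # sparse, structure-aware mask for lower layers
print(masks[-1].astype(int))  # dense mask for upper layers
```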
Citations: 0
Facial Mask Completion Using StyleGAN2 Preserving Features of the Person
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023pcp0002
Norihiko KAWAI, Hiroaki KOIKE
Due to the global outbreak of coronaviruses, people are increasingly wearing masks even when photographed. As a result, photos uploaded to web pages and social networking services with the lower half of the face hidden are less likely to convey the attractiveness of the photographed persons. In this study, we propose a method to complete facial mask regions using StyleGAN2, a type of Generative Adversarial Network (GAN). In the proposed method, a reference image of the same person without a mask is prepared separately from a target image of the person wearing a mask. After the mask region in the target image is temporarily inpainted, the face orientation and contour of the person in the reference image are changed to match those of the target image using StyleGAN2. The changed image is then composited into the mask region while correcting the color tone, producing a mask-free image that preserves the person's features.
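The final compositing step can be sketched as below, using Poisson blending (cv2.seamlessClone) as a stand-in for the paper's color-tone correction; the function name and its inputs are assumptions for illustration only.

```python
import cv2
import numpy as np

def composite_face(target_bgr, aligned_ref_bgr, mask_region):
    """Hypothetical final step of the pipeline: paste the StyleGAN2-aligned
    reference face into the (previously inpainted) mask region of the target
    image. Poisson blending stands in for the paper's color-tone correction."""
    ys, xs = np.where(mask_region > 0)
    center = (int(xs.mean()), int(ys.mean()))          # centre of the mask region
    blend_mask = (mask_region > 0).astype(np.uint8) * 255
    return cv2.seamlessClone(aligned_ref_bgr, target_bgr, blend_mask,
                             center, cv2.NORMAL_CLONE)
```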
Citations: 0
Regressive Gaussian Process Latent Variable Model for Few-Frame Human Motion Prediction
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023pcp0001
Xin JIN, Jia GUO
Human motion prediction has always been an interesting research topic in computer vision and robotics. It means forecasting future human movements conditioned on historical 3-dimensional human skeleton sequences. Existing prediction algorithms usually rely on extensive annotated or non-annotated motion capture data and are non-adaptive. This paper addresses the problem of few-frame human motion prediction, in the spirit of recent progress on manifold learning. More precisely, our approach is based on the insight that an accurate prediction relies on a sufficiently linear expression in the latent space learned from a few training data in observation space. To accomplish this, we propose the Regressive Gaussian Process Latent Variable Model (RGPLVM), which introduces a novel regressive kernel function for model training. By doing so, our model produces a linear mapping from the training data space to the latent space, effectively transforming the prediction of human motion in physical space into an equivalent linear regression analysis in the latent space. The comparison with two learning-based motion prediction approaches (the state-of-the-art meta learning and the classical LSTM-3LR) demonstrates that our RGPLVM significantly improves the prediction performance on various actions in the small-sample-size regime.
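A toy version of the latent-space regression view is sketched below: with a linear kernel, the Gaussian-process posterior mean reduces to linear regression, which is the property the abstract appeals to. The kernel choice and data shapes are illustrative assumptions, not the proposed regressive kernel.

```python
import numpy as np

def gp_predict(X_train, Y_train, X_test, noise=1e-3):
    """Hypothetical stand-in for the regressive kernel: ordinary GP regression
    with a linear kernel, so the posterior mean is a linear regression in the
    latent space."""
    k = lambda A, B: A @ B.T                       # linear kernel
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = k(X_test, X_train)
    return K_s @ np.linalg.solve(K, Y_train)       # posterior mean

# Toy example: predict the next latent pose from a few observed ones.
rng = np.random.default_rng(0)
Z_obs = rng.normal(size=(5, 3))                    # 5 frames, 3-D latent codes
Z_next = Z_obs @ rng.normal(size=(3, 3))           # assume a linear dynamic
print(gp_predict(Z_obs, Z_next, Z_obs[-1:]))       # forecast for the last frame
```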
Citations: 0
Neural Network-Based Post-Processing Filter on V-PCC Attribute Frames
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023pcl0002
Keiichiro TAKADA, Yasuaki TOKUMO, Tomohiro IKAI, Takeshi CHUJOH
Video-based point cloud compression (V-PCC) utilizes video compression technology to efficiently encode dense point clouds, providing state-of-the-art compression performance with a relatively small computation burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. On the other hand, the quality of these frames may be degraded by video compression. This paper proposes an adaptive neural network-based post-processing filter on attribute frames to alleviate this degradation. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3%, and 22.2% for Y, U, and V, respectively.
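A minimal sketch of such a post-processing filter is shown below: a small residual CNN that takes the decoded attribute frame together with its occupancy frame as an extra input channel. The network depth, channel counts, and occupancy masking are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AttributePostFilter(nn.Module):
    """Hypothetical post-processing filter: a small residual CNN that takes a
    decoded attribute frame plus its occupancy frame (extra channel, echoing
    the occupancy-based training idea) and predicts a correction."""

    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + 1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, attr, occ):
        # attr: (B, 3, H, W) decoded attribute frame in [0, 1]
        # occ:  (B, 1, H, W) occupancy frame marking valid pixels
        return attr + self.net(torch.cat([attr, occ], dim=1)) * occ

filtered = AttributePostFilter()(torch.rand(1, 3, 64, 64), torch.ones(1, 1, 64, 64))
print(filtered.shape)  # torch.Size([1, 3, 64, 64])
```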
Citations: 0
GPU-Accelerated Estimation and Targeted Reduction of Peak IR-Drop during Scan Chain Shifting
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023edp7011
Shiling SHI, Stefan HOLST, Xiaoqing WEN
High power dissipation during scan testing often causes undue yield loss, especially for low-power circuits. One major reason is that the resulting IR-drop in shift mode may corrupt test data. A common approach to solving this problem is partial-shift, in which multiple scan chains are formed and only one group of scan chains is shifted at a time. However, existing partial-shift based methods suffer from two major problems: (1) their IR-drop estimation is not accurate enough, or is computationally too expensive to perform for each shift cycle; (2) partial-shift is hence applied to all shift cycles, resulting in long test times. This paper addresses these two problems with a novel IR-drop-aware scan shift method, featuring: (1) Cycle-based IR-Drop Estimation (CIDE), supported by a GPU-accelerated dynamic power simulator, to quickly find potential shift cycles with excessive peak IR-drop; (2) a scan shift scheduling method that generates a scan chain grouping targeted at each considered shift cycle to reduce the impact on test time. Experiments on ITC'99 benchmark circuits show that: (1) CIDE is computationally feasible; (2) the proposed scan shift schedule can achieve a global peak IR-drop reduction of up to 47%, and its scheduling efficiency is on average 58.4% higher than that of an existing typical method, which means our method requires less test time.
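The targeted, per-cycle application of partial-shift can be sketched as below; the threshold rule and the assumption that peak IR-drop scales with the number of simultaneously shifted chains are illustrative simplifications, not the proposed scheduling algorithm.

```python
import math

def schedule_partial_shift(peak_ir_drop, limit, max_groups=4):
    """Hypothetical scheduler: full (one-group) shift for safe cycles and
    partial-shift grouping only for cycles whose estimated peak IR-drop
    exceeds the limit, so test time grows only where necessary."""
    schedule = []
    for cycle, drop in enumerate(peak_ir_drop):
        if drop <= limit:
            groups = 1                                # shift all chains at once
        else:
            # crude assumption: peak IR-drop scales with the fraction of
            # chains shifted simultaneously
            groups = min(max_groups, math.ceil(drop / limit))
        schedule.append((cycle, groups))
    return schedule                                   # (shift cycle, #chain groups)

# Per-cycle peak estimates, e.g. from a GPU-accelerated power simulator.
print(schedule_partial_shift([0.8, 1.4, 0.9, 2.1], limit=1.0))
# [(0, 1), (1, 2), (2, 1), (3, 3)]
```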
Citations: 0
Multi-Scale Estimation for Omni-Directional Saliency Maps Using Learnable Equator Bias
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023edp7055
Takao YAMANAKA, Tatsuya SUZUKI, Taiki NOBUTSUNE, Chenjunlin WU
Omni-directional images have been used in a wide range of applications including virtual/augmented reality, self-driving cars, robotics simulators, and surveillance systems. For these applications, it would be useful to estimate saliency maps representing probability distributions of gazing points with a head-mounted display, in order to detect important regions in the omni-directional images. This paper proposes a novel saliency-map estimation model for omni-directional images, based on extracting overlapping 2-dimensional (2D) plane images from omni-directional images at various directions and angles of view. While 2D saliency maps tend to have high probability at the center of images (center bias), the high-probability region appears in horizontal directions in omni-directional saliency maps when a head-mounted display is used (equator bias). Therefore, the 2D saliency model with a center-bias layer was fine-tuned with an omni-directional dataset by replacing the center-bias layer with an equator-bias layer conditioned on the elevation angle of the extracted 2D plane image. The limited availability of omni-directional images in saliency datasets can be compensated for by using the well-established 2D saliency model pretrained on a large number of training images with ground-truth 2D saliency maps. In addition, this paper proposes a multi-scale estimation method that extracts 2D images at multiple angles of view to detect objects of various sizes with variable receptive fields. The saliency maps estimated from the multiple angles of view were integrated using pixel-wise attention weights calculated in an integration layer that weights the optimal scale for each object. The proposed method was evaluated using a publicly available dataset with evaluation metrics for omni-directional saliency maps. It was confirmed that the accuracy of the saliency maps was improved by the proposed method.
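The equator-bias idea can be sketched as an elevation-dependent prior over an equirectangular saliency map; the Gaussian form and its width below are illustrative assumptions, since the paper learns the bias rather than fixing it.

```python
import numpy as np

def equator_bias_map(height, width, sigma_deg=20.0):
    """Hypothetical equator-bias prior for an equirectangular saliency map:
    probability mass concentrated around 0 degrees elevation (the equator)
    instead of the image centre, as described for head-mounted-display viewing."""
    elevation = np.linspace(90.0, -90.0, height)           # degrees, top to bottom
    column = np.exp(-0.5 * (elevation / sigma_deg) ** 2)   # Gaussian around the equator
    bias = np.tile(column[:, None], (1, width))
    return bias / bias.sum()                               # normalize to a distribution

bias = equator_bias_map(180, 360)
print(bias.shape, float(bias.sum()))   # (180, 360), sums to 1
```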
Citations: 0
Decentralized Incentive Scheme for Peer-to-Peer Video Streaming using Solana Blockchain
CAS Tier 4 (Computer Science), Q3 (Engineering). Pub Date: 2023-10-01. DOI: 10.1587/transinf.2023edp7027
Yunqi MA, Satoshi FUJITA
Peer-to-peer (P2P) technology has gained popularity as a way to enhance system performance. Nodes in a P2P network work together by providing network resources to one another. In this study, we examine the use of P2P technology for video streaming and develop a distributed incentive mechanism to prevent free-riding. Our proposed solution combines WebTorrent and the Solana blockchain and can be accessed through a web browser. To incentivize uploads, some of the received video chunks are encrypted using AES. Smart contracts on the blockchain are used for third-party verification of uploads and for managing access to the video content. Experimental results on a test network showed that our system can encrypt and decrypt chunks in about 1/40th the time it takes using WebRTC, without affecting the quality of video streaming. Smart contracts were also found to quickly verify uploads in about 860 milliseconds. The paper also explores how to effectively reward virtual points for uploads.
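The chunk-encryption step of the incentive mechanism can be sketched as below, assuming AES in CTR mode via the Python cryptography package; the actual key exchange and the smart-contract verification logic on Solana are not shown.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_chunk(chunk: bytes, key: bytes) -> bytes:
    """Hypothetical upload-incentive step: encrypt a received video chunk with
    AES-CTR so a peer must obtain the key (e.g. after an on-chain payment)
    before the chunk becomes playable."""
    nonce = os.urandom(16)
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    return nonce + encryptor.update(chunk) + encryptor.finalize()

def decrypt_chunk(blob: bytes, key: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    decryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).decryptor()
    return decryptor.update(ciphertext) + decryptor.finalize()

key = os.urandom(32)                       # 256-bit AES key
blob = encrypt_chunk(b"video chunk bytes", key)
assert decrypt_chunk(blob, key) == b"video chunk bytes"
```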
Citations: 0