Social Relation Atmosphere Recognition with Relevant Visual Concepts
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023pcp0008
Ying JI, Yu WANG, Kensaku MORI, Jien KATO
Social relationships (e.g., couples, opponents) are a foundational part of society. The social relation atmosphere describes the overall interaction environment surrounding a social relationship. Discovering the social relation atmosphere can help machines better comprehend human behaviors and improve the performance of socially intelligent applications. Most existing research focuses on recognizing social relationships while ignoring the social relation atmosphere, which, due to the complexity of expressions in video data and its inherent uncertainty, is difficult even to define and evaluate. In this paper, we analyze the social relation atmosphere in video data. We introduce Relevant Visual Concepts (RVC) from the social relationship recognition task to facilitate social relation atmosphere recognition, because social relationships carry useful information about human interactions and the surrounding environment, which are crucial clues for social relation atmosphere recognition. Our approach consists of two main steps: (1) we first generate a group of visual concepts that preserve the inherent social relationship information by utilizing a 3D explanation module; (2) the extracted relevant visual concepts are then used to supplement social relation atmosphere recognition. In addition, we present a new dataset based on the existing Video Social Relation Dataset, in which each video is annotated with four kinds of social relation atmosphere attributes and one social relationship. We evaluate the proposed method on this dataset. Experiments with various 3D ConvNets and fusion methods demonstrate that the proposed method effectively improves recognition accuracy compared to end-to-end ConvNets. Visualization results also indicate that essential information in social relationships can be discovered and used to enhance social relation atmosphere recognition.
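A minimal sketch of the fusion idea in step (2), assuming a simple late-fusion design in PyTorch: pooled 3D ConvNet clip features and relevant-visual-concept features are concatenated and classified into the four atmosphere attributes. Module names and feature dimensions are illustrative placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConceptFusionClassifier(nn.Module):
    """Illustrative two-branch fusion: clip features from a 3D ConvNet are
    concatenated with relevant-visual-concept (RVC) features taken from a
    social-relationship model, then classified into atmosphere attributes."""

    def __init__(self, clip_dim=2048, concept_dim=512, num_atmospheres=4):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(clip_dim + concept_dim, 512),
            nn.ReLU(),
            nn.Linear(512, num_atmospheres),
        )

    def forward(self, clip_feat, concept_feat):
        # clip_feat: (B, clip_dim) pooled 3D ConvNet features
        # concept_feat: (B, concept_dim) RVC features from the relationship branch
        return self.fusion(torch.cat([clip_feat, concept_feat], dim=1))

logits = ConceptFusionClassifier()(torch.randn(2, 2048), torch.randn(2, 512))
print(logits.shape)  # torch.Size([2, 4])
```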
{"title":"Social Relation Atmosphere Recognition with Relevant Visual Concepts","authors":"Ying JI, Yu WANG, Kensaku MORI, Jien KATO","doi":"10.1587/transinf.2023pcp0008","DOIUrl":"https://doi.org/10.1587/transinf.2023pcp0008","url":null,"abstract":"Social relationships (e.g., couples, opponents) are the foundational part of society. Social relation atmosphere describes the overall interaction environment between social relationships. Discovering social relation atmosphere can help machines better comprehend human behaviors and improve the performance of social intelligent applications. Most existing research mainly focuses on investigating social relationships, while ignoring the social relation atmosphere. Due to the complexity of the expressions in video data and the uncertainty of the social relation atmosphere, it is even difficult to define and evaluate. In this paper, we innovatively analyze the social relation atmosphere in video data. We introduce a Relevant Visual Concept (RVC) from the social relationship recognition task to facilitate social relation atmosphere recognition, because social relationships contain useful information about human interactions and surrounding environments, which are crucial clues for social relation atmosphere recognition. Our approach consists of two main steps: (1) we first generate a group of visual concepts that preserve the inherent social relationship information by utilizing a 3D explanation module; (2) the extracted relevant visual concepts are used to supplement the social relation atmosphere recognition. In addition, we present a new dataset based on the existing Video Social Relation Dataset. Each video is annotated with four kinds of social relation atmosphere attributes and one social relationship. We evaluate the proposed method on our dataset. Experiments with various 3D ConvNets and fusion methods demonstrate that the proposed method can effectively improve recognition accuracy compared to end-to-end ConvNets. The visualization results also indicate that essential information in social relationships can be discovered and used to enhance social relation atmosphere recognition.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prior Information Based Decomposition and Reconstruction Learning for Micro-Expression Recognition
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2022edl8065
Jinsheng WEI, Haoyu CHEN, Guanming LU, Jingjie YAN, Yue XIE, Guoying ZHAO
Micro-expression recognition (MER) draws intensive research interest because micro-expressions (MEs) can reveal genuine emotions. Prior information can guide a model to learn discriminative ME features effectively. However, most works focus on designing general models with stronger representation ability that adaptively aggregate ME movement information in a holistic way, which may ignore the prior information and properties of MEs. To address this issue, driven by the prior knowledge that the category of an ME can be inferred from the relationships between the actions of different facial components, this work designs a novel model that conforms to this prior and learns ME movement features in an interpretable way. Specifically, this paper proposes a Decomposition and Reconstruction-based Graph Representation Learning (DeRe-GRL) model to effectively learn high-level ME features. DeRe-GRL includes two modules: an Action Decomposition Module (ADM) and a Relation Reconstruction Module (RRM), where ADM learns action features of key facial components and RRM explores the relationships between these action features. Based on the key facial components, ADM divides the geometric movement features extracted by a graph-based backbone into several sub-features and learns a mapping matrix that maps these sub-features into multiple action features; RRM then learns weights over all action features to build the relationships between them. The experimental results demonstrate the effectiveness of the proposed modules, and the proposed method achieves competitive performance.
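A rough PyTorch sketch of how ADM and RRM could be organized as described: ADM splits a geometric feature vector by facial component and maps each chunk into an action feature with a learned matrix, and RRM learns weights that relate the action features. Component counts and dimensions are made-up placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ActionDecomposition(nn.Module):
    """Sketch of ADM: geometric movement features are split by facial
    component and each chunk is mapped into an action feature."""
    def __init__(self, feat_dim=256, num_components=4, action_dim=64):
        super().__init__()
        assert feat_dim % num_components == 0
        self.num_components = num_components
        self.maps = nn.ModuleList(
            nn.Linear(feat_dim // num_components, action_dim)
            for _ in range(num_components)
        )

    def forward(self, x):                          # x: (B, feat_dim)
        chunks = x.chunk(self.num_components, dim=1)
        return torch.stack([m(c) for m, c in zip(self.maps, chunks)], dim=1)  # (B, C, action_dim)

class RelationReconstruction(nn.Module):
    """Sketch of RRM: learned weights relate the action features before pooling."""
    def __init__(self, num_components=4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_components))

    def forward(self, actions):                    # actions: (B, C, action_dim)
        w = torch.softmax(self.weights, dim=0)
        return (actions * w.view(1, -1, 1)).sum(dim=1)   # (B, action_dim)

actions = ActionDecomposition()(torch.randn(2, 256))
print(RelationReconstruction()(actions).shape)     # torch.Size([2, 64])
```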
{"title":"Prior Information Based Decomposition and Reconstruction Learning for Micro-Expression Recognition","authors":"Jinsheng WEI, Haoyu CHEN, Guanming LU, Jingjie YAN, Yue XIE, Guoying ZHAO","doi":"10.1587/transinf.2022edl8065","DOIUrl":"https://doi.org/10.1587/transinf.2022edl8065","url":null,"abstract":"Micro-expression recognition (MER) draws intensive research interest as micro-expressions (MEs) can infer genuine emotions. Prior information can guide the model to learn discriminative ME features effectively. However, most works focus on researching the general models with a stronger representation ability to adaptively aggregate ME movement information in a holistic way, which may ignore the prior information and properties of MEs. To solve this issue, driven by the prior information that the category of ME can be inferred by the relationship between the actions of facial different components, this work designs a novel model that can conform to this prior information and learn ME movement features in an interpretable way. Specifically, this paper proposes a Decomposition and Reconstruction-based Graph Representation Learning (DeRe-GRL) model to efectively learn high-level ME features. DeRe-GRL includes two modules: Action Decomposition Module (ADM) and Relation Reconstruction Module (RRM), where ADM learns action features of facial key components and RRM explores the relationship between these action features. Based on facial key components, ADM divides the geometric movement features extracted by the graph model-based backbone into several sub-features, and learns the map matrix to map these sub-features into multiple action features; then, RRM learns weights to weight all action features to build the relationship between action features. The experimental results demonstrate the effectiveness of the proposed modules, and the proposed method achieves competitive performance.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantitative Estimation of Video Forgery with Anomaly Analysis of Optical Flow
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2022edl8107
Wan Yeon LEE, Yun-Seok CHOI, Tong Min KIM
We propose a quantitative measurement technique for video forgery that eliminates the burden of deciding the subtle boundary between normal and tampered patterns. We also propose an automatic adjustment scheme for the spatial and temporal target zones, which maximizes the abnormality measurement of forged videos. Evaluation shows that the proposed scheme provides manifest detection capability against both inter-frame and intra-frame forgeries.
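An illustrative quantification of the idea, not the authors' algorithm: given one mean optical-flow magnitude per frame transition, a per-frame anomaly score measures the deviation from a local robust baseline, so a spliced or dropped frame stands out as a large score rather than requiring a hand-tuned normal/tampered boundary. The window size and normalization are assumptions.

```python
import numpy as np

def flow_anomaly_scores(flow_mags, window=5):
    """Per-transition anomaly score for a sequence of mean optical-flow
    magnitudes: deviation from the local median, normalized by the local
    median absolute deviation (a robust scale)."""
    scores = np.zeros_like(flow_mags, dtype=float)
    for i in range(len(flow_mags)):
        lo, hi = max(0, i - window), min(len(flow_mags), i + window + 1)
        local = np.delete(flow_mags[lo:hi], i - lo)   # neighbors, excluding frame i itself
        med = np.median(local)
        mad = np.median(np.abs(local - med)) + 1e-6
        scores[i] = np.abs(flow_mags[i] - med) / mad
    return scores

# An inter-frame forgery (e.g., frame deletion) shows up as a large score:
mags = np.array([1.0, 1.1, 0.9, 1.0, 6.5, 1.0, 1.1])
print(flow_anomaly_scores(mags).round(2))
```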
{"title":"Quantitative Estimation of Video Forgery with Anomaly Analysis of Optical Flow","authors":"Wan Yeon LEE, Yun-Seok CHOI, Tong Min KIM","doi":"10.1587/transinf.2022edl8107","DOIUrl":"https://doi.org/10.1587/transinf.2022edl8107","url":null,"abstract":"We propose a quantitative measurement technique of video forgery that eliminates the decision burden of subtle boundary between normal and tampered patterns. We also propose the automatic adjustment scheme of spatial and temporal target zones, which maximizes the abnormality measurement of forged videos. Evaluation shows that the proposed scheme provides manifest detection capability against both inter-frame and intra-frame forgeries.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135373147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Local-to-Global Structure-Aware Transformer for Question Answering over Structured Knowledge
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023edp7034
Yingyao WANG, Han WANG, Chaoqun DUAN, Tiejun ZHAO
Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models, trained on linear-chain natural language, cannot be directly applied to encode tables and graphs. Existing methods adopt pre-trained models for such tasks by flattening the structured knowledge into sequences. However, this serialization loses the structural information of the knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects local-to-global structural information of the knowledge into the masks of different self-attention layers. Specifically, in the lower self-attention layers, SATrans focuses on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects global information of the structured knowledge to integrate information among knowledge tokens. In this way, SATrans effectively learns the semantic representation and the structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on a table fact verification task and a knowledge base question-answering task. Furthermore, we explore two methods for combining symbolic and linguistic reasoning on these tasks, addressing the lack of symbolic reasoning ability in pre-trained models. The experimental results reveal that the methods consistently outperform strong baselines on the two benchmarks.
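A small sketch of the local-to-global mask schedule, assuming the structure is supplied as a token adjacency matrix: lower layers restrict attention to structurally adjacent knowledge tokens (plus self), while upper layers attend globally. The function name and the layer split are hypothetical, not the paper's exact design.

```python
import torch

def build_local_to_global_masks(adjacency, num_layers, local_layers):
    """Return one boolean attention mask per self-attention layer.
    adjacency: (N, N) 0/1 tensor derived from the table/graph structure."""
    n = adjacency.size(0)
    local_mask = (adjacency + torch.eye(n)).clamp(max=1).bool()   # neighbors + self
    global_mask = torch.ones(n, n, dtype=torch.bool)              # all tokens visible
    return [local_mask if layer < local_layers else global_mask
            for layer in range(num_layers)]

adj = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=torch.float)
masks = build_local_to_global_masks(adj, num_layers=4, local_layers=2)
print(masks[0].int())  # structure-restricted attention in lower layers
print(masks[3].int())  # global attention in upper layers
```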
{"title":"Local-to-Global Structure-Aware Transformer for Question Answering over Structured Knowledge","authors":"Yingyao WANG, Han WANG, Chaoqun DUAN, Tiejun ZHAO","doi":"10.1587/transinf.2023edp7034","DOIUrl":"https://doi.org/10.1587/transinf.2023edp7034","url":null,"abstract":"Question-answering tasks over structured knowledge (i.e., tables and graphs) require the ability to encode structural information. Traditional pre-trained language models trained on linear-chain natural language cannot be directly applied to encode tables and graphs. The existing methods adopt the pre-trained models in such tasks by flattening structured knowledge into sequences. However, the serialization operation will lead to the loss of the structural information of knowledge. To better employ pre-trained transformers for structured knowledge representation, we propose a novel structure-aware transformer (SATrans) that injects the local-to-global structural information of the knowledge into the mask of the different self-attention layers. Specifically, in the lower self-attention layers, SATrans focus on the local structural information of each knowledge token to learn a more robust representation of it. In the upper self-attention layers, SATrans further injects the global information of the structured knowledge to integrate the information among knowledge tokens. In this way, the SATrans can effectively learn the semantic representation and structural information from the knowledge sequence and the attention mask, respectively. We evaluate SATrans on the table fact verification task and the knowledge base question-answering task. Furthermore, we explore two methods to combine symbolic and linguistic reasoning for these tasks to solve the problem that the pre-trained models lack symbolic reasoning ability. The experiment results reveal that the methods consistently outperform strong baselines on the two benchmarks.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135369904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Facial Mask Completion Using StyleGAN2 Preserving Features of the Person
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023pcp0002
Norihiko KAWAI, Hiroaki KOIKE
Due to the global outbreak of coronavirus, people increasingly wear masks even when photographed. As a result, photos uploaded to web pages and social networking services with the lower half of the face hidden are less likely to convey the attractiveness of the photographed persons. In this study, we propose a method to complete facial mask regions using StyleGAN2, a type of Generative Adversarial Network (GAN). In the proposed method, a reference image of the same person without a mask is prepared separately from the target image of the person wearing a mask. After the mask region in the target image is temporarily inpainted, the face orientation and contour of the person in the reference image are changed to match those of the target image using StyleGAN2. The changed image is then composited into the mask region while correcting the color tone, producing a mask-free image that preserves the person's features.
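A minimal sketch of the final compositing step only (the StyleGAN2 editing itself is omitted), assuming float RGB images in [0, 1] and a binary mask: a per-channel tone shift is estimated on the region the two images share outside the mask and applied to the generated face before pasting it into the mask region. This is an illustrative simplification of the color-tone correction, not the authors' procedure.

```python
import numpy as np

def composite_with_color_correction(target, generated, mask):
    """target, generated: (H, W, 3) float RGB; mask: (H, W), 1 inside the
    facial-mask region to be replaced. Returns the composited image."""
    out = target.copy()
    for c in range(3):
        # Estimate the tone shift on pixels both images keep (outside the mask).
        shift = target[..., c][mask == 0].mean() - generated[..., c][mask == 0].mean()
        corrected = np.clip(generated[..., c] + shift, 0.0, 1.0)
        # Paste the tone-corrected generated face only inside the mask region.
        out[..., c] = np.where(mask > 0, corrected, target[..., c])
    return out
```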
{"title":"Facial Mask Completion Using StyleGAN2 Preserving Features of the Person","authors":"Norihiko KAWAI, Hiroaki KOIKE","doi":"10.1587/transinf.2023pcp0002","DOIUrl":"https://doi.org/10.1587/transinf.2023pcp0002","url":null,"abstract":"Due to the global outbreak of coronaviruses, people are increasingly wearing masks even when photographed. As a result, photos uploaded to web pages and social networking services with the lower half of the face hidden are less likely to convey the attractiveness of the photographed persons. In this study, we propose a method to complete facial mask regions using StyleGAN2, a type of Generative Adversarial Networks (GAN). In the proposed method, a reference image of the same person without a mask is prepared separately from a target image of the person wearing a mask. After the mask region in the target image is temporarily inpainted, the face orientation and contour of the person in the reference image are changed to match those of the target image using StyleGAN2. The changed image is then composited into the mask region while correcting the color tone to produce a mask-free image while preserving the person's features.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regressive Gaussian Process Latent Variable Model for Few-Frame Human Motion Prediction
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023pcp0001
Xin JIN, Jia GUO
Human motion prediction has long been an active research topic in computer vision and robotics: the task is to forecast future human movements conditioned on historical 3-dimensional human skeleton sequences. Existing prediction algorithms usually rely on extensive annotated or non-annotated motion capture data and are non-adaptive. This paper addresses the problem of few-frame human motion prediction, in the spirit of recent progress on manifold learning. More precisely, our approach is based on the insight that an accurate prediction relies on a sufficiently linear expression in the latent space learned from a few training samples in the observation space. To accomplish this, we propose the Regressive Gaussian Process Latent Variable Model (RGPLVM), which introduces a novel regressive kernel function for model training. By doing so, our model produces a linear mapping from the training data space to the latent space, effectively transforming the prediction of human motion in physical space into an equivalent linear regression analysis in the latent space. Comparison with two learning-based motion prediction approaches (state-of-the-art meta learning and the classical LSTM-3LR) demonstrates that our RGPLVM significantly improves prediction performance on a variety of actions in the small-sample-size regime.
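A toy illustration of the underlying intuition, deliberately substituting a plain PCA embedding for the paper's GPLVM: a few observed skeleton frames are embedded into a low-dimensional latent space, future latent codes are obtained by linear regression over time, and the result is mapped back to pose space. Dimensions and horizon are arbitrary.

```python
import numpy as np

def few_frame_predict(poses, latent_dim=3, horizon=5):
    """poses: (T, D) flattened 3D joint coordinates for T observed frames
    (T should be at least latent_dim). Returns (horizon, D) predicted poses."""
    mean = poses.mean(axis=0)
    centered = poses - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:latent_dim]                            # (latent_dim, D) principal directions
    latent = centered @ basis.T                        # (T, latent_dim) latent trajectory
    t = np.arange(len(poses))
    slope, intercept = np.polyfit(t, latent, deg=1)    # per-dimension linear trend in time
    t_future = np.arange(len(poses), len(poses) + horizon)[:, None]
    latent_future = slope * t_future + intercept       # extrapolate in latent space
    return latent_future @ basis + mean                # map back to pose space

poses = np.cumsum(np.random.randn(8, 51) * 0.01, axis=0)   # 8 frames, 17 joints x 3
print(few_frame_predict(poses).shape)                       # (5, 51)
```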
{"title":"Regressive Gaussian Process Latent Variable Model for Few-Frame Human Motion Prediction","authors":"Xin JIN, Jia GUO","doi":"10.1587/transinf.2023pcp0001","DOIUrl":"https://doi.org/10.1587/transinf.2023pcp0001","url":null,"abstract":"Human motion prediction has always been an interesting research topic in computer vision and robotics. It means forecasting human movements in the future conditioning on historical 3-dimensional human skeleton sequences. Existing predicting algorithms usually rely on extensive annotated or non-annotated motion capture data and are non-adaptive. This paper addresses the problem of few-frame human motion prediction, in the spirit of the recent progress on manifold learning. More precisely, our approach is based on the insight that achieving an accurate prediction relies on a sufficiently linear expression in the latent space from a few training data in observation space. To accomplish this, we propose Regressive Gaussian Process Latent Variable Model (RGPLVM) that introduces a novel regressive kernel function for the model training. By doing so, our model produces a linear mapping from the training data space to the latent space, while effectively transforming the prediction of human motion in physical space to the linear regression analysis in the latent space equivalent. The comparison with two learning motion prediction approaches (the state-of-the-art meta learning and the classical LSTM-3LR) demonstrate that our GPLVM significantly improves the prediction performance on various of actions in the small-sample size regime.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Network-Based Post-Processing Filter on V-PCC Attribute Frames
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023pcl0002
Keiichiro TAKADA, Yasuaki TOKUMO, Tomohiro IKAI, Takeshi CHUJOH
Video-based point cloud compression (V-PCC) utilizes video compression technology to efficiently encode dense point clouds, providing state-of-the-art compression performance with a relatively small computational burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. However, the quality of these frames may be degraded by video compression. This paper proposes an adaptive neural network-based post-processing filter on attribute frames to alleviate this degradation. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3% and 22.2% for Y, U and V, respectively.
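A minimal PyTorch sketch of the two ingredients described: a small residual CNN that post-filters a decoded attribute frame, and a training loss masked by the occupancy frame so that only occupied pixels contribute. The layer widths and the squared-error loss are illustrative choices, not the paper's network or objective.

```python
import torch
import torch.nn as nn

class AttributePostFilter(nn.Module):
    """Sketch of a residual CNN post-filter for decoded V-PCC attribute frames."""
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)          # predict a correction residual

def occupancy_masked_loss(pred, target, occupancy):
    """Only occupied pixels carry valid attribute samples, so unoccupied
    pixels are excluded from the training loss."""
    mask = occupancy.float()
    return ((pred - target) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)

frame = torch.rand(1, 3, 64, 64)                       # decoded attribute frame
occ = (torch.rand(1, 1, 64, 64) > 0.5)                 # occupancy frame
loss = occupancy_masked_loss(AttributePostFilter()(frame), frame, occ)
print(loss.item())
```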
{"title":"Neural Network-Based Post-Processing Filter on V-PCC Attribute Frames","authors":"Keiichiro TAKADA, Yasuaki TOKUMO, Tomohiro IKAI, Takeshi CHUJOH","doi":"10.1587/transinf.2023pcl0002","DOIUrl":"https://doi.org/10.1587/transinf.2023pcl0002","url":null,"abstract":"Video-based point cloud compression (V-PCC) utilizes video compression technology to efficiently encode dense point clouds providing state-of-the-art compression performance with a relatively small computation burden. V-PCC converts 3-dimensional point cloud data into three types of 2-dimensional frames, i.e., occupancy, geometry, and attribute frames, and encodes them via video compression. On the other hand, the quality of these frames may be degraded due to video compression. This paper proposes an adaptive neural network-based post-processing filter on attribute frames to alleviate the degradation problem. Furthermore, a novel training method using occupancy frames is studied. The experimental results show average BD-rate gains of 3.0%, 29.3% and 22.2% for Y, U and V respectively.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GPU-Accelerated Estimation and Targeted Reduction of Peak IR-Drop during Scan Chain Shifting
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023edp7011
Shiling SHI, Stefan HOLST, Xiaoqing WEN
High power dissipation during scan testing often causes undue yield loss, especially for low-power circuits. One major reason is that the resulting IR-drop in shift mode may corrupt test data. A common approach to this problem is partial shift, in which multiple scan chains are formed and only one group of scan chains is shifted at a time. However, existing partial-shift based methods suffer from two major problems: (1) their IR-drop estimation is either not accurate enough or computationally too expensive to be performed for each shift cycle; (2) partial shift is therefore applied to all shift cycles, resulting in long test time. This paper addresses these two problems with a novel IR-drop-aware scan shift method featuring: (1) Cycle-based IR-Drop Estimation (CIDE), supported by a GPU-accelerated dynamic power simulator, to quickly find potential shift cycles with excessive peak IR-drop; (2) a scan shift scheduling method that generates a scan chain grouping targeted for each considered shift cycle to reduce the impact on test time. Experiments on ITC'99 benchmark circuits show that: (1) CIDE is computationally feasible; (2) the proposed scan shift schedule achieves a global peak IR-drop reduction of up to 47%, and its scheduling efficiency is on average 58.4% higher than that of an existing typical method, which translates into shorter test time.
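A toy sketch of the scheduling intuition (not the paper's algorithm): cycles flagged by the IR-drop estimator fall back to partial shift with several chain groups, while safe cycles keep a single full shift, so the test-time overhead scales with the number of risky cycles rather than with all cycles. The group count and the IR-drop limit are hypothetical values.

```python
def schedule_scan_shifts(cycle_peak_ir, ir_limit, num_groups=4):
    """cycle_peak_ir: estimated peak IR-drop per shift cycle (e.g., from CIDE).
    Returns (schedule, total_sub_shifts), where schedule[i] is the number of
    sub-shifts spent on original shift cycle i."""
    schedule = [1 if peak <= ir_limit else num_groups for peak in cycle_peak_ir]
    return schedule, sum(schedule)

# Only two of six cycles exceed the limit, so test time grows far less than
# applying partial shift to every cycle (which would cost 6 * 4 = 24 sub-shifts).
sched, total = schedule_scan_shifts([0.7, 0.6, 1.3, 0.5, 1.1, 0.6], ir_limit=1.0)
print(sched, total)   # [1, 1, 4, 1, 4, 1] 12
```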
{"title":"GPU-Accelerated Estimation and Targeted Reduction of Peak IR-Drop during Scan Chain Shifting","authors":"Shiling SHI, Stefan HOLST, Xiaoqing WEN","doi":"10.1587/transinf.2023edp7011","DOIUrl":"https://doi.org/10.1587/transinf.2023edp7011","url":null,"abstract":"High power dissipation during scan test often causes undue yield loss, especially for low-power circuits. One major reason is that the resulting IR-drop in shift mode may corrupt test data. A common approach to solving this problem is partial-shift, in which multiple scan chains are formed and only one group of scan chains is shifted at a time. However, existing partial-shift based methods suffer from two major problems: (1) their IR-drop estimation is not accurate enough or computationally too expensive to be done for each shift cycle; (2) partial-shift is hence applied to all shift cycles, resulting in long test time. This paper addresses these two problems with a novel IR-drop-aware scan shift method, featuring: (1) Cycle-based IR-Drop Estimation (CIDE) supported by a GPU-accelerated dynamic power simulator to quickly find potential shift cycles with excessive peak IR-drop; (2) a scan shift scheduling method that generates a scan chain grouping targeted for each considered shift cycle to reduce the impact on test time. Experiments on ITC'99 benchmark circuits show that: (1) the CIDE is computationally feasible; (2) the proposed scan shift schedule can achieve a global peak IR-drop reduction of up to 47%. Its scheduling efficiency is 58.4% higher than that of an existing typical method on average, which means our method has less test time.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Scale Estimation for Omni-Directional Saliency Maps Using Learnable Equator Bias
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023edp7055
Takao YAMANAKA, Tatsuya SUZUKI, Taiki NOBUTSUNE, Chenjunlin WU
Omni-directional images have been used in a wide range of applications, including virtual/augmented reality, self-driving cars, robotics simulators, and surveillance systems. For these applications, it is useful to estimate saliency maps representing the probability distribution of gazing points with a head-mounted display, in order to detect important regions in the omni-directional images. This paper proposes a novel saliency-map estimation model for omni-directional images that extracts overlapping 2-dimensional (2D) plane images from the omni-directional images at various directions and angles of view. While 2D saliency maps tend to have high probability at the center of images (center bias), the high-probability region of omni-directional saliency maps appears in horizontal directions when a head-mounted display is used (equator bias). Therefore, a 2D saliency model with a center-bias layer was fine-tuned on an omni-directional dataset by replacing the center-bias layer with an equator-bias layer conditioned on the elevation angle at which the 2D plane image was extracted. The limited availability of omni-directional images in saliency datasets is compensated by using a well-established 2D saliency model pretrained on a large number of training images with ground-truth 2D saliency maps. In addition, this paper proposes a multi-scale estimation method that extracts 2D images at multiple angles of view to detect objects of various sizes with variable receptive fields. The saliency maps estimated from the multiple angles of view are integrated using pixel-wise attention weights calculated in an integration layer, which weights the optimal scale for each object. The proposed method was evaluated using a publicly available dataset with evaluation metrics for omni-directional saliency maps, and the results confirmed that the proposed method improves the accuracy of the saliency maps.
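A minimal sketch of an equator-bias prior conditioned on the elevation angle, under the simplifying assumptions of a fixed vertical field of view and a Gaussian fall-off with absolute elevation; the learnable equator-bias layer in the paper would train such parameters rather than fix them.

```python
import numpy as np

def equator_bias_map(height, width, elevation_deg, sigma_deg=25.0, fov_deg=90.0):
    """Bias map for a 2D plane image extracted from an omni-directional image:
    each pixel row is weighted by a Gaussian in its elevation, so rows near the
    equator (0 deg elevation) receive higher prior probability.
    elevation_deg is the elevation of the extracted patch center."""
    rows = np.linspace(elevation_deg + fov_deg / 2, elevation_deg - fov_deg / 2, height)
    bias_rows = np.exp(-(rows ** 2) / (2 * sigma_deg ** 2))
    return np.tile(bias_rows[:, None], (1, width))

bias = equator_bias_map(8, 4, elevation_deg=0.0)
print(bias[:, 0].round(2))   # peaks at the rows closest to 0 deg elevation
```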
{"title":"Multi-Scale Estimation for Omni-Directional Saliency Maps Using Learnable Equator Bias","authors":"Takao YAMANAKA, Tatsuya SUZUKI, Taiki NOBUTSUNE, Chenjunlin WU","doi":"10.1587/transinf.2023edp7055","DOIUrl":"https://doi.org/10.1587/transinf.2023edp7055","url":null,"abstract":"Omni-directional images have been used in wide range of applications including virtual/augmented realities, self-driving cars, robotics simulators, and surveillance systems. For these applications, it would be useful to estimate saliency maps representing probability distributions of gazing points with a head-mounted display, to detect important regions in the omni-directional images. This paper proposes a novel saliency-map estimation model for the omni-directional images by extracting overlapping 2-dimensional (2D) plane images from omni-directional images at various directions and angles of view. While 2D saliency maps tend to have high probability at the center of images (center bias), the high-probability region appears at horizontal directions in omni-directional saliency maps when a head-mounted display is used (equator bias). Therefore, the 2D saliency model with a center-bias layer was fine-tuned with an omni-directional dataset by replacing the center-bias layer to an equator-bias layer conditioned on the elevation angle for the extraction of the 2D plane image. The limited availability of omni-directional images in saliency datasets can be compensated by using the well-established 2D saliency model pretrained by a large number of training images with the ground truth of 2D saliency maps. In addition, this paper proposes a multi-scale estimation method by extracting 2D images in multiple angles of view to detect objects of various sizes with variable receptive fields. The saliency maps estimated from the multiple angles of view were integrated by using pixel-wise attention weights calculated in an integration layer for weighting the optimal scale to each object. The proposed method was evaluated using a publicly available dataset with evaluation metrics for omni-directional saliency maps. It was confirmed that the accuracy of the saliency maps was improved by the proposed method.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Decentralized Incentive Scheme for Peer-to-Peer Video Streaming using Solana Blockchain
Pub Date : 2023-10-01  DOI: 10.1587/transinf.2023edp7027
Yunqi MA, Satoshi FUJITA
Peer-to-peer (P2P) technology has gained popularity as a way to enhance system performance. Nodes in a P2P network work together by providing network resources to one another. In this study, we examine the use of P2P technology for video streaming and develop a distributed incentive mechanism to prevent free-riding. Our proposed solution combines WebTorrent and the Solana blockchain and can be accessed through a web browser. To incentivize uploads, some of the received video chunks are encrypted using AES. Smart contracts on the blockchain are used for third-party verification of uploads and for managing access to the video content. Experimental results on a test network showed that our system can encrypt and decrypt chunks in about 1/40th the time it takes using WebRTC, without affecting the quality of video streaming. Smart contracts were also found to quickly verify uploads in about 860 milliseconds. The paper also explores how to effectively reward virtual points for uploads.
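A small sketch of the chunk-encryption step, assuming Python's cryptography package and AES-GCM (the abstract specifies AES but not the mode, so the mode and key handling here are assumptions): a seeder encrypts selected chunks and withholds the key until the upload has been credited, e.g., via the on-chain verification described above.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(chunk: bytes, key: bytes) -> bytes:
    """Encrypt a video chunk; the nonce is prepended so the receiver can
    decrypt once the key is released after the upload is credited."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, chunk, None)

def decrypt_chunk(blob: bytes, key: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=128)
blob = encrypt_chunk(b"video-chunk-bytes", key)
assert decrypt_chunk(blob, key) == b"video-chunk-bytes"
```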
{"title":"Decentralized Incentive Scheme for Peer-to-Peer Video Streaming using Solana Blockchain","authors":"Yunqi MA, Satoshi FUJITA","doi":"10.1587/transinf.2023edp7027","DOIUrl":"https://doi.org/10.1587/transinf.2023edp7027","url":null,"abstract":"Peer-to-peer (P2P) technology has gained popularity as a way to enhance system performance. Nodes in a P2P network work together by providing network resources to one another. In this study, we examine the use of P2P technology for video streaming and develop a distributed incentive mechanism to prevent free-riding. Our proposed solution combines WebTorrent and the Solana blockchain and can be accessed through a web browser. To incentivize uploads, some of the received video chunks are encrypted using AES. Smart contracts on the blockchain are used for third-party verification of uploads and for managing access to the video content. Experimental results on a test network showed that our system can encrypt and decrypt chunks in about 1/40th the time it takes using WebRTC, without affecting the quality of video streaming. Smart contracts were also found to quickly verify uploads in about 860 milliseconds. The paper also explores how to effectively reward virtual points for uploads.","PeriodicalId":55002,"journal":{"name":"IEICE Transactions on Information and Systems","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135372797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}