IEEE Transactions on Big Data: Latest Articles

Enabling Homogeneous GNNs to Handle Heterogeneous Graphs via Relation Embedding
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-09-08 | DOI: 10.1109/TBDATA.2023.3313031
Junfu Wang;Yuanfang Guo;Liang Yang;Yunhong Wang
Graph Neural Networks (GNNs) have been generalized to process the heterogeneous graphs by various approaches. Unfortunately, these approaches usually model the heterogeneity via various complicated modules. This article aims to propose a simple yet effective framework to assign adequate ability to the homogeneous GNNs to handle the heterogeneous graphs. Specifically, we propose Relation Embedding based Graph Neural Network (RE-GNN), which employs only one parameter per relation to embed the importance of distinct types of relations and node-type-specific self-loop connections. To optimize these relation embeddings and the model parameters simultaneously, a gradient scaling factor is proposed to constrain the embeddings to converge to suitable values. Besides, we interpret the proposed RE-GNN from two perspectives, and theoretically demonstrate that our RE-GCN possesses more expressive power than GTN (which is a typical heterogeneous GNN, and it can generate meta-paths adaptively). Extensive experiments demonstrate that our RE-GNN can effectively and efficiently handle the heterogeneous graphs and can be applied to various homogeneous GNNs.
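The abstract above describes RE-GNN as using one learnable parameter per relation, plus a self-loop weight per node type, so that an ordinary homogeneous GNN can run on the resulting weighted graph. The following is a minimal sketch of that idea only. The function names, the dictionary layout, the positive example weights, and the row-normalised aggregation step are assumptions made for illustration; the abstract does not specify the paper's normalisation, its gradient scaling factor, or how the parameters are trained.

```python
def combine_relations(adjs, rel_weight, node_type, loop_weight):
    """Collapse per-relation adjacency matrices into one weighted matrix.

    adjs:        {relation name: n x n 0/1 adjacency matrix (list of lists)}
    rel_weight:  {relation name: scalar importance}, one parameter per relation
    node_type:   list of length n giving each node's type
    loop_weight: {node type: scalar weight of that type's self-loop}
    All names here are hypothetical, chosen for this sketch.
    """
    n = len(node_type)
    merged = [[0.0] * n for _ in range(n)]
    for rel, adj in adjs.items():
        w = rel_weight[rel]
        for i in range(n):
            for j in range(n):
                merged[i][j] += w * adj[i][j]
    # Node-type-specific self-loops go on the diagonal.
    for i in range(n):
        merged[i][i] += loop_weight[node_type[i]]
    return merged


def propagate(merged, features):
    """One aggregation step: row-normalise the merged matrix, then multiply.

    Assumes non-negative weights so that each row sum is a valid normaliser.
    """
    dim = len(features[0])
    out = []
    for row in merged:
        total = sum(row)
        if total == 0:
            out.append([0.0] * dim)
            continue
        out.append([
            sum(row[j] * features[j][k] for j in range(len(row))) / total
            for k in range(dim)
        ])
    return out


# Toy graph: node 0 is an author, node 1 is a paper.
adjs = {"writes": [[0, 1], [0, 0]], "cites": [[0, 0], [1, 0]]}
merged = combine_relations(
    adjs,
    rel_weight={"writes": 2.0, "cites": 0.5},
    node_type=["author", "paper"],
    loop_weight={"author": 1.0, "paper": 3.0},
)
hidden = propagate(merged, [[1.0], [4.0]])
```

In a trainable version the scalars in `rel_weight` and `loop_weight` would be optimised jointly with the GNN weights; here they are fixed constants so the arithmetic is easy to follow.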
Journal Article. IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1697-1710.
Citations: 0
Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-09-07 | DOI: 10.1109/TBDATA.2023.3313029
Si-Yu Yi;Zhengyang Mao;Wei Ju;Yong-Dao Zhou;Luchen Liu;Xiao Luo;Ming Zhang
Graph classification, aiming at learning the graph-level representations for effective class assignments, has received outstanding achievements, which heavily relies on high-quality datasets that have balanced class distribution. In fact, most real-world graph data naturally presents a long-tailed form, where the head classes occupy much more samples than the tail classes, it thus is essential to study the graph-level classification over long-tailed data while still remaining largely unexplored. However, most existing long-tailed learning methods in visions fail to jointly optimize the representation learning and classifier training, as well as neglect the mining of the hard-to-classify classes. Directly applying existing methods to graphs may lead to sub-optimal performance, since the model trained on graphs would be more sensitive to the long-tailed distribution due to the complex topological characteristics. Hence, in this paper, we propose a novel long-tailed graph-level classification framework via Collaborative Multi-expert Learning (CoMe) to tackle the problem. To equilibrate the contributions of head and tail classes, we first develop balanced contrastive learning from the view of representation learning, and then design an individual-expert classifier training based on hard class mining. In addition, we execute gated fusion and disentangled knowledge distillation among the multiple experts to promote the collaboration in a multi-expert framework. Comprehensive experiments are performed on seven widely-used benchmark datasets to demonstrate the superiority of our method CoMe over state-of-the-art baselines.
Journal Article. IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1683-1696.
Citations: 0
Seq2CASE: Weakly Supervised Sequence to Commentary Aspect Score Estimation for Recommendation
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-09-07 | DOI: 10.1109/TBDATA.2023.3313028
Chien-Tse Cheng;Yu-Hsun Lin;Chung-Shou Liao
Online users’ feedback has numerous text comments to enrich the review quality on mainstream platforms, such as Yelp and Google Maps. Reading through numerous review comments to speculate the important aspects is tedious and time-consuming. Apparently, there is a huge gap between the numerous commentary text and the crucial aspects for users’ preferences. In this study, we proposed a weakly supervised framework called Sequence to Commentary Aspect Score Estimation (Seq2CASE) to estimate the vital aspect scores from the review comments, since the ground truth of the aspect score is seldom available. The aspect score estimation from Seq2CASE is close to the actual aspect scoring; precisely, the average Mean Absolute Error (MAE) is less than 0.4 for a 5-point grading scale. The performance of Seq2CASE is comparable to or even better than the state-of-the-art supervised approaches in recommendation tasks. We expect this work to be a stepping stone that can inspire more unsupervised studies working on this important but relatively underexploited research.
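The accuracy claim in the abstract above (average MAE below 0.4 on a 5-point scale) uses the standard Mean Absolute Error, which can be computed as in the short sketch below. The function name and the example scores are made up for illustration and are not taken from the paper.

```python
def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over paired scores."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Hypothetical aspect scores on a 5-point scale.
estimated = [4.0, 3.5, 2.0]
reference = [4.5, 3.0, 2.0]
mae = mean_absolute_error(estimated, reference)
```

An MAE below 0.4 on this scale means the estimated aspect score is, on average, off by less than half a grading point.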
Journal Article. IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1670-1682.
Citations: 0
Streaming Local Community Detection Through Approximate Conductance
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-08-31 | DOI: 10.1109/TBDATA.2023.3310251
Meng Wang;Yanhao Yang;David Bindel;Kun He
Community is a universal structure in various complex networks, and community detection is a fundamental task for network analysis. With the rapid growth of network scale, networks are massive, changing rapidly, and could naturally be modeled as graph streams. Due to the limited memory and access constraint in graph streams, existing non-streaming community detection methods are no longer applicable. This raises an emerging need for online approaches. In this work, we consider the problem of uncovering the local community containing a few query nodes in graph streams, termed streaming local community detection. This new problem raised recently is more challenging for community detection, and only a few works address this online setting. Correspondingly, we design an online single-pass streaming local community detection approach. Inspired by the local property of communities, our method samples the local structure around the query nodes in graph streams and extracts the target community on the sampled subgraph using our proposed metric called approximate conductance. Comprehensive experiments show that our method remarkably outperforms the streaming baseline on both effectiveness and efficiency, and even achieves similar accuracy compared to the state-of-the-art non-streaming local community detection methods that use static and complete graphs.
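The abstract above names a metric called approximate conductance but does not define it. As background, the sketch below computes the standard (exact) conductance of a node set in an undirected graph, reading the edge list once as a stream would be read. It is not the paper's approximate metric, and the function name and toy graph are assumptions for illustration.

```python
def conductance(edges, community):
    """Standard conductance: cut(S, rest) / min(vol(S), vol(rest)).

    edges:     iterable of undirected (u, v) pairs, each listed once
    community: iterable of node ids forming the candidate set S
    The volume of a set is the sum of the degrees of its nodes.
    """
    community = set(community)
    cut = 0
    vol_in = 0
    vol_out = 0
    for u, v in edges:  # single pass over the edges
        u_in = u in community
        v_in = v in community
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
        if u_in != v_in:
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Two triangles joined by the single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
score = conductance(edges, {0, 1, 2})
```

Lower values indicate a set that is densely connected internally and only weakly connected to the rest of the graph, which is why conductance is a common quality score for local communities.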
Journal Article. IEEE Transactions on Big Data, vol. 10, no. 1, pp. 12-22.
Citations: 0
Transfer Learning With Document-Level Data Augmentation for Aspect-Level Sentiment Classification
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-08-30 | DOI: 10.1109/TBDATA.2023.3310267
Xiaosai Huang;Jing Li;Jia Wu;Jun Chang;Donghua Liu
Aspect-level sentiment classification (ASC) seeks to reveal the emotional tendency of a designated aspect of a text. Some researchers have recently tried to exploit large amounts of document-level sentiment classification (DSC) data available to help improve the performance of ASC models through transfer learning. However, these studies often ignore the difference in sentiment distribution between document-level and aspect-level data without preprocessing the document-level knowledge. Our study provides a transfer learning with document-level data augmentation (TL-DDA) framework to transfer more accurate document-level knowledge to the ASC model by means of document-level data augmentation and attention fusion. First, we use document data selection and text concatenation to produce document-level data with various sentiment distributions. The augmented document data is then utilized for pre-training a well-designed DSC model. Finally, after attention adjustment, we fuse the word attention obtained from this DSC model into the ASC model. Results of experiments utilizing two publicly available datasets suggest that TL-DDA is reliable.
Journal Article. IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1643-1657.
Citations: 0
TS-RTPM-Net: Data-Driven Tensor Sketching for Efficient CP Decomposition
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-08-30 | DOI: 10.1109/TBDATA.2023.3310254
Xingyu Cao;Xiangtao Zhang;Ce Zhu;Jiani Liu;Yipeng Liu
Tensor decomposition is widely used in feature extraction, data analysis, and other fields. As a means of tensor decomposition, the robust tensor power method based on tensor sketch (TS-RTPM) can quickly mine the potential features of tensor, but in some cases, its approximation performance is limited. In this paper, we propose a data-driven framework called TS-RTPM-Net, which improves the estimation accuracy of TS-RTPM by jointly training the TS value matrices with the RTPM initial matrices. It also uses two greedy initialization algorithms to optimize the TS location matrices. In addition, TS-RTPM-Net accelerates TS-RTPM by using fast power iteration modules. Comparative experiments on real-world datasets verify that TS-RTPM-Net outperforms TS-RTPM in terms of estimation accuracy, running speed, and memory consumption.
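The abstract above builds on the tensor power method. As background, the sketch below shows a plain power iteration on a small symmetric third-order tensor, without any sketching or learned components. The sketch-based approximation of the contraction, the trained TS matrices, and the fast power iteration modules mentioned in the abstract are not reproduced here; the function names and the rank-1 example are assumptions for illustration.

```python
import math


def contract(tensor, u):
    """Compute T(I, u, u): contract modes 2 and 3 of a cubic tensor with u."""
    n = len(u)
    return [
        sum(tensor[i][j][k] * u[j] * u[k] for j in range(n) for k in range(n))
        for i in range(n)
    ]


def power_iteration(tensor, u, steps=20):
    """Repeat u <- T(I, u, u) / ||T(I, u, u)|| and return (eigenvalue, u)."""
    for _ in range(steps):
        w = contract(tensor, u)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        u = [x / norm for x in w]
    eigenvalue = sum(ui * wi for ui, wi in zip(u, contract(tensor, u)))
    return eigenvalue, u


# Rank-1 symmetric tensor 2 * (v outer v outer v) with unit vector v.
v = [0.6, 0.8]
tensor = [[[2.0 * a * b * c for c in v] for b in v] for a in v]
eigenvalue, component = power_iteration(tensor, [1.0, 0.0])
```

The contraction in `contract` costs on the order of n^3 operations per step for an n x n x n tensor, which is the cost that sketch-based variants aim to reduce.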
Journal Article. IEEE Transactions on Big Data, vol. 10, no. 1, pp. 1-11.
Citations: 0
Improved Box Embeddings for Fine-Grained Entity Typing
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-08-30 | DOI: 10.1109/TBDATA.2023.3310239
Yixiu Qin;Yizhao Wang;Jiawei Li;Shun Mao;He Wang;Yuncheng Jiang
Different from traditional vector-based fine-grained entity typing methods, the box-based method is more effective in capturing the complex relationships between entity mentions and entity types. The box-based fine-grained entity typing method projects entity types and entity mentions into high-dimensional box space, where entity types and entity mentions are embedded as d-dimensional hyperrectangles. However, the impacts of entity types are not considered during classification in high-dimensional box space, and the model cannot be optimized precisely when two boxes are completely separated or overlapped in high-dimensional box space. Based on the above shortcomings, an Improved Box Embeddings (IBE) method for fine-grained entity typing is proposed in this work. The IBE not only introduces the impacts of entity types during classification in high-dimensional box space, but also proposes a distance based module to optimize the model precisely when two boxes are completely separated or overlapped in high-dimensional box space. Experimental results on four fine-grained entity typing datasets verify the effectiveness of the proposed IBE, demonstrating that IBE is a state-of-the-art method for fine-grained entity typing.
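The abstract above embeds mentions and types as d-dimensional hyperrectangles. The sketch below illustrates the basic geometry with hard (unsmoothed) box volumes: a box is a pair of corner vectors, and a mention is scored against a type by the fraction of the mention box that falls inside the type box. The function names, the scoring rule, and the example boxes are assumptions for illustration; the abstract does not give the paper's volume function or its distance-based module.

```python
def volume(lo, hi):
    """Volume of an axis-aligned box; zero if any side is empty."""
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= max(b - a, 0.0)
    return vol


def intersection(box_a, box_b):
    """Corners of the overlap of two boxes, each given as (lo, hi)."""
    lo = [max(a, b) for a, b in zip(box_a[0], box_b[0])]
    hi = [min(a, b) for a, b in zip(box_a[1], box_b[1])]
    return lo, hi


def containment(mention, entity_type):
    """Fraction of the mention box that lies inside the type box."""
    vol_mention = volume(*mention)
    if vol_mention == 0.0:
        return 0.0
    return volume(*intersection(mention, entity_type)) / vol_mention


mention = ([0.0, 0.0], [2.0, 2.0])
overlapping_type = ([1.0, 1.0], [3.0, 3.0])
disjoint_type = ([5.0, 5.0], [6.0, 6.0])
partial = containment(mention, overlapping_type)
separated = containment(mention, disjoint_type)
```

With hard volumes the score is exactly zero for every pair of disjoint boxes, regardless of how far apart they are, so it carries no signal about which direction to move them. That is consistent with the abstract's remark that the model cannot be optimized precisely when two boxes are completely separated.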
Journal Article. IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1631-1642.
Citations: 0
PredLife: Predicting Fine-Grained Future Activity Patterns
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-08-30 | DOI: 10.1109/TBDATA.2023.3310241
Wenjing Li;Xiaodan Shi;Dou Huang;Xudong Shen;Jinyu Chen;Hill Hiroki Kobayashi;Haoran Zhang;Xuan Song;Ryosuke Shibasaki
Activity pattern prediction is a critical part of urban computing, urban planning, intelligent transportation, and so on. Based on a dataset with more than 10 million GPS trajectory records collected by mobile sensors, this research proposed a CNN-BiLSTM-VAE-ATT-based encoder-decoder model for fine-grained individual activity sequence prediction. The model combines the long-term and short-term dependencies crosswise and also considers randomness, diversity, and uncertainty of individual activity patterns. The proposed results show higher accuracy compared to the ten baselines. The model can generate high diversity results while approximating the original activity patterns distribution. Moreover, the model also has interpretability in revealing the time dependency importance of the activity pattern prediction.
Journal Article. IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1658-1669.
Citations: 0
Cosine Multilinear Principal Component Analysis for Recognition
IF 7.2 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2023-08-02 | DOI: 10.1109/TBDATA.2023.3301389
Feng Han;Chengcai Leng;Bing Li;Anup Basu;Licheng Jiao
Existing two-dimensional principal component analysis methods can only handle second-order tensors (i.e., matrices). However, with the advancement of technology, tensors of order three and higher are gradually increasing. This brings new challenges to dimensionality reduction. Thus, a multilinear method called MPCA was proposed. Although MPCA can be applied to all tensors, using the square of the F-norm makes it very sensitive to outliers. Several two-dimensional methods, such as Angle 2DPCA, have good robustness but cannot be applied to all tensors. We extend the robust Angle 2DPCA method to a multilinear method and propose Cosine Multilinear Principal Component Analysis (CosMPCA) for tensor representation. Our CosMPCA method considers the relationship between the reconstruction error and projection scatter and selects the cosine metric. In addition, our method naturally uses the F-norm to reduce the impact of outliers. We introduce an iterative algorithm to solve CosMPCA. We provide detailed theoretical analysis in both the proposed method and the analysis of the algorithm. Experiments show that our method is robust to outliers and is suitable for tensors of any order.
Published in IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1620-1630.
Citations: 0
Adaptive Powerball Stochastic Conjugate Gradient for Large-Scale Learning
IF 7.2; Zone 3 (Computer Science); Q1 Computer Science, Information Systems; Pub Date: 2023-08-01; DOI: 10.1109/TBDATA.2023.3300546
Zhuang Yang
The success of stochastic optimization (SO) in large-scale machine learning, information retrieval, bioinformatics, and other fields has been widely reported, especially in recent years. As an effective tactic, conjugate gradient (CG) has been gaining popularity for accelerating SO algorithms. This paper develops a novel class of stochastic conjugate gradient descent (SCG) algorithms from the perspective of the Powerball strategy and the hypergradient descent (HD) technique. The crucial idea behind the resulting methods is inspired by pursuing the equilibrium of ordinary differential equations (ODEs). We elucidate the effect of the Powerball strategy in SCG algorithms. The introduction of HD, on the other hand, lets the resulting methods work with an online learning rate. We also provide theoretical results for the resulting algorithms under non-convex assumptions. As a byproduct, we bridge the gap between the learning rate and powered stochastic optimization (PSO) algorithms, which is still an open problem. Through numerical experiments on numerous benchmark datasets, we test the parameter sensitivity of the proposed methods and demonstrate the superior performance of our new algorithms over state-of-the-art algorithms.
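The two ingredients named in the abstract can be sketched in a few lines of NumPy. The Powerball transform is commonly written as sign(z)|z|^γ applied elementwise to the gradient, and a hypergradient step adjusts the learning rate online from the inner product of the current gradient with the previous update direction. The sketch below combines them in plain full-gradient descent on a toy quadratic. It is not the paper's SCG algorithm: there is no conjugate direction and no stochastic sampling, and the function names, the default values (`lr`, `hyper_lr`, `gamma`, `steps`), and the exact form of the learning-rate update are assumptions made for illustration.

```python
import numpy as np


def powerball(z, gamma):
    """Elementwise Powerball transform: sign(z) * |z|**gamma, with 0 < gamma <= 1."""
    return np.sign(z) * np.abs(z) ** gamma


def powerball_gd_with_hypergradient(grad_fn, theta0, lr=0.1, hyper_lr=1e-3,
                                    gamma=0.5, steps=200):
    """Gradient descent with a Powerball direction and an online learning rate."""
    theta = np.asarray(theta0, dtype=float)
    prev_direction = np.zeros_like(theta)
    for _ in range(steps):
        grad = grad_fn(theta)
        # Assumed hypergradient step: since theta_t = theta_{t-1} - lr * d_{t-1},
        # d f(theta_t) / d lr = -grad . d_{t-1}, so descending on lr adds this term.
        lr = lr + hyper_lr * float(grad @ prev_direction)
        direction = powerball(grad, gamma)
        theta = theta - lr * direction
        prev_direction = direction
    return theta, lr


def quadratic_loss(theta):
    """Toy objective f(theta) = 0.5 * ||theta||^2, whose gradient is theta."""
    return 0.5 * float(theta @ theta)


start = np.array([3.0, -2.0])
final_theta, final_lr = powerball_gd_with_hypergradient(lambda th: th, start)
print(quadratic_loss(start), quadratic_loss(final_theta), final_lr)
```

Setting `gamma=1.0` makes the transform the identity, which recovers ordinary gradient descent with a hypergradient-adapted learning rate and gives a simple baseline for comparison.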
Published in IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1598-1606.
Citations: 1