
IEEE Transactions on Big Data: Latest Articles

Natural Language Processing for Arabic Sentiment Analysis: A Systematic Literature Review
IF 7.5; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-14; DOI: 10.1109/TBDATA.2024.3366083
Souha Al Katat;Chamseddine Zaki;Hussein Hazimeh;Ibrahim El Bitar;Rafael Angarita;Lionel Trojman
Sentiment analysis uses computational methods to identify and classify opinions expressed in text, with the goal of determining whether the writer's stance towards a particular topic, product, or idea is positive, negative, or neutral. Sentiment analysis in Arabic presents unique challenges, however, because the complexity of Arabic morphology and the variety of dialects make classification even more difficult. To address these challenges, we investigated and surveyed the techniques used in the last five years for embedding and classification in Arabic sentiment analysis (ASA). We collected data from 100 publications, resulting in a representative dataset of 2,300 detailed records with attributes covering the dataset, feature extraction, approach, parameters, and performance measures. By analyzing the collected data to find the parameters that significantly influence performance, our study aimed to identify the most powerful approaches and the best model settings. The results showed that Deep Learning and Machine Learning were the most commonly used techniques, followed by lexicon-based and transformer-based techniques. Deep Learning models were found to be more accurate for sentiment classification than other Machine Learning models, and multi-level embedding was found to be a significant step in improving model accuracy.
IEEE Transactions on Big Data, vol. 10, no. 5, pp. 576-594. Journal Article.
Citations: 0
Using App Usage Data From Mobile Devices to Improve Activity-Based Travel Demand Models
IF 7.5; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-14; DOI: 10.1109/TBDATA.2024.3366088
Ana Belén Rodríguez González;Javier Burrieza-Galán;Juan José Vinagre Díaz;Inés Peirats de Castro;Mark Richard Wilby;Oliva Garcia Cantú-Ros
In recent years, several studies have shown the potential of mobile network data to reconstruct the activity and mobility patterns of the population. These data sources allow continuous monitoring of the population with higher spatial and temporal resolution and at a lower cost than traditional methods. However, for certain applications, their spatial resolution is still insufficient: it is typically hundreds of meters in urban areas and a few kilometers in rural areas. In this article, we fill this gap by proposing a methodology that utilises GPS data from the usage of different applications on mobile devices. This approach improves the spatial precision of the location of activities previously identified with the mobile network data.
IEEE Transactions on Big Data, vol. 10, no. 5, pp. 633-643. Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10436340
Citations: 0
HGN2T: A Simple but Plug-and-Play Framework Extending HGNNs on Heterogeneous Temporal Graphs
IF 7.5; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-14; DOI: 10.1109/TBDATA.2024.3366085
Huan Liu;Pengfei Jiao;Xuan Guo;Huaming Wu;Mengzhou Gao;Jilin Zhang
Heterogeneous graphs (HGs) with multiple entity and relation types are common in real-world networks. Heterogeneous graph neural networks (HGNNs) have shown promise for learning HG representations. However, most HGNNs are designed for static HGs and are not compatible with heterogeneous temporal graphs (HTGs). A few existing works have focused on HTG representation learning, but they care more about capturing dynamic evolution and less about compatibility with well-designed static HGNNs. They also handle graph structure learning and temporal dependency learning separately, ignoring that HTG evolution is influenced by both nodes and relationships. To address this, we propose HGN2T, a simple and general framework that makes static HGNNs compatible with HTGs. HGN2T is plug-and-play, enabling static HGNNs to leverage their graph structure learning strengths. To capture relationship-influenced evolution, we design a special mechanism that couples the HGNN with a sequential model. Finally, through joint optimization on both detection and prediction tasks, the learned representations can fully capture temporal dependencies from historical information. We conduct several empirical evaluation tasks, and the results show that HGN2T can adapt static HGNNs to HTGs and outperform existing methods for HTGs.
IEEE Transactions on Big Data, vol. 10, no. 5, pp. 620-632. Journal Article.
Citations: 0
Unsupervised Cross-View Subspace Clustering via Adaptive Contrastive Learning
IF 7.5; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-14; DOI: 10.1109/TBDATA.2024.3366084
Zihao Zhang;Qianqian Wang;Quanxue Gao;Chengquan Pei;Wei Feng
Cross-view subspace clustering has become a popular unsupervised method for cross-view data analysis because it can extract both the consistent and the complementary features of data from different views. Nonetheless, existing methods usually ignore discriminative features due to a lack of label supervision, which limits further improvement in their clustering performance. To address this issue, we design a novel model that leverages the self-supervision information embedded in the data itself by combining contrastive learning and self-expression learning, i.e., unsupervised cross-view subspace clustering via adaptive contrastive learning (CVCL). Specifically, CVCL employs an encoder to learn a latent subspace from the cross-view data and converts it to a consistent subspace with a self-expression layer. In this way, contrastive learning provides more discriminative features for the self-expression layer, and the self-expression layer in turn supervises contrastive learning. In addition, CVCL adaptively chooses positive and negative samples for contrastive learning to reduce the noise introduced by improper negative sample pairs. Finally, a decoder operating on the output of the self-expression layer is trained to reconstruct the original data as faithfully as possible, which helps ensure that the encoded features are effective. Extensive experiments on multiple cross-view datasets demonstrate the superior performance of our model.
IEEE Transactions on Big Data, vol. 10, no. 5, pp. 609-619. Journal Article.
Citations: 0
Decentralized Federated Learning: A Survey on Security and Privacy
IF 7.2; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-05; DOI: 10.1109/TBDATA.2024.3362191
Ehsan Hallaji;Roozbeh Razavi-Far;Mehrdad Saif;Boyu Wang;Qiang Yang
Federated learning has been rapidly evolving and gaining popularity in recent years due to its privacy-preserving features, among other advantages. Nevertheless, the exchange of model updates and gradients in this architecture provides new attack surfaces for malicious users of the network which may jeopardize the model performance and user and data privacy. For this reason, one of the main motivations for decentralized federated learning is to eliminate server-related threats by removing the server from the network and compensating for it through technologies such as blockchain. However, this advantage comes at the cost of challenging the system with new privacy threats. Thus, performing a thorough security analysis in this new paradigm is necessary. This survey studies possible variations of threats and adversaries in decentralized federated learning and overviews the potential defense mechanisms. Trustability and verifiability of decentralized federated learning are also considered in this study.
IEEE Transactions on Big Data, vol. 10, no. 2, pp. 194-213. Journal Article.
Citations: 0
Dynamic Hypergraph Structure Learning for Multivariate Time Series Forecasting
IF 7.5; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-05; DOI: 10.1109/TBDATA.2024.3362188
Shun Wang;Yong Zhang;Xuanqi Lin;Yongli Hu;Qingming Huang;Baocai Yin
Multivariate time series forecasting plays an important role in many domain applications, such as air pollution forecasting and traffic forecasting. Modeling the complex dependencies among time series is a key challenging task in multivariate time series forecasting. Many previous works have used graph structures to learn inter-series correlations, which have achieved remarkable performance. However, graph networks can only capture spatio-temporal dependencies between pairs of nodes, which cannot handle high-order correlations among time series. We propose a Dynamic Hypergraph Structure Learning model (DHSL) to solve the above problems. We generate dynamic hypergraph structures from time series data using the K-Nearest Neighbors method. Then a dynamic hypergraph structure learning module is used to optimize the hypergraph structure to obtain more accurate high-order correlations among nodes. Finally, the hypergraph structures dynamically learned are used in the spatio-temporal hypergraph neural network. We conduct experiments on six real-world datasets. The prediction performance of our model surpasses existing graph network-based prediction models. The experimental results demonstrate the effectiveness and competitiveness of the DHSL model for multivariate time series forecasting.
IEEE Transactions on Big Data, vol. 10, no. 4, pp. 556-567. Journal Article.
Citations: 0
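The DHSL abstract above mentions generating hypergraph structures from time series with the K-Nearest Neighbors method. The following is a minimal sketch of one common KNN hypergraph construction, assumed here for illustration and not taken from the paper: each series is a node, and each node contributes one hyperedge containing itself and its k nearest series under Euclidean distance. The function names `euclidean`, `knn_hyperedges`, and `incidence_matrix` and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_hyperedges(series, k):
    """Build one hyperedge per node: the node plus its k nearest neighbors.

    Assumption: a generic KNN hypergraph construction, not necessarily
    the exact procedure used by DHSL.
    """
    n = len(series)
    edges = []
    for i in range(n):
        # Rank all other nodes by distance to node i (ties broken by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (euclidean(series[i], series[j]), j),
        )
        edges.append(sorted([i] + others[:k]))
    return edges


def incidence_matrix(n_nodes, edges):
    """H[v][e] is 1 if node v belongs to hyperedge e, else 0."""
    return [[1 if v in e else 0 for e in edges] for v in range(n_nodes)]


# Toy data: four short series forming two loose groups.
series = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.1],
    [9.0, 8.0, 7.0],
    [9.2, 8.1, 7.3],
]
edges = knn_hyperedges(series, k=2)
H = incidence_matrix(len(series), edges)
print(edges)
print(H)
```

With k = 2 every hyperedge joins three nodes at once, which is the kind of higher-than-pairwise grouping that an ordinary graph edge cannot express. A learned model would then refine this initial structure; that refinement step is not sketched here.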
ALTRUIST: A Python Package to Emulate a Virtual Digital Cohort Study Using Social Media Data
IF 7.5; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-02-05; DOI: 10.1109/TBDATA.2024.3362193
Charline Bour;Abir Elbeji;Luigi De Giovanni;Adrian Ahne;Guy Fagherazzi
Epidemiological cohort studies play a crucial role in identifying risk factors for various outcomes among participants. These studies are often time-consuming and costly due to recruitment and long-term follow-up. Social media (SM) data have emerged as a valuable complementary source for digital epidemiology and health research, as online communities of patients regularly share information about their illnesses. Unlike traditional clinical questionnaires, SM offers unstructured but insightful information about patients’ disease burden. Yet, there is limited guidance on analyzing SM data as a prospective cohort. We presented the concept of virtual digital cohort studies (VDCS) as an approach to replicate cohort studies using SM data. In this paper, we introduce ALTRUIST, an open-source Python package enabling standardized generation of VDCS on SM. ALTRUIST facilitates data collection, preprocessing, and analysis steps that mimic a traditional cohort study. We provide a practical use case focusing on diabetes to illustrate the methodology. By leveraging SM data, which offer large-scale and cost-effective information on users’ health, we demonstrate the potential of VDCS as an essential tool for specific research questions. ALTRUIST is customizable and can be applied to data from various online communities of patients, complementing traditional epidemiological methods and promoting minimally disruptive health research.
IEEE Transactions on Big Data, vol. 10, no. 4, pp. 568-575. Journal Article. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10420428
Citations: 0
Fine-Tuned Personality Federated Learning for Graph Data
IF 7.2; Zone 3 (Computer Science); Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS); Pub Date: 2024-01-19; DOI: 10.1109/TBDATA.2024.3356388
Meiting Xue;Zian Zhou;Pengfei Jiao;Huijun Tang
Federated Learning (FL) empowers multiple clients to collaboratively learn a global generalization model without the need to share their local data, thus reducing privacy risks and expanding the scope of AI applications. However, current works pay little attention to highly non-identically distributed data such as graph data, which are common in practice, and ignore the problem of model personalization between clients when training on graph data in federated learning. In this paper, we propose a novel personality graph federated learning framework based on variational graph autoencoders, called FedVGAE, that incorporates model contrastive learning and local fine-tuning to achieve personalized federated training on graph data for each client. We then introduce an encoder-sharing strategy that shares the parameters of the encoder layer to further improve personality performance. Node classification and link prediction experiments demonstrate that our method achieves better performance than other federated learning methods on most graph datasets in the non-iid setting. Finally, ablation experiments demonstrate the effectiveness of our proposed method.
IEEE Transactions on Big Data, vol. 10, no. 3, pp. 313-319. Journal Article.
Citations: 0
LGRL: Local-Global Representation Learning for On-the-Fly FG-SBIR
IF 7.5 Zone 3, Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-01-19 DOI: 10.1109/TBDATA.2024.3356393
Dawei Dai;Yingge Liu;Yutang Li;Shiyu Fu;Shuyin Xia;Guoyin Wang
On-the-fly fine-grained sketch-based image retrieval (On-the-fly FG-SBIR) frameworks aim to remove two barriers: sketch drawing requires considerable skill and is time-consuming. However, a partial sketch with few strokes contains little local information, and the drawing process may differ greatly among users, resulting in poor performance in early retrieval. In this study, we developed a local-global representation learning (LGRL) method, in which we learn representations for both the local and global regions of the partial sketch and its target photos. Specifically, we first designed a triplet network to learn the joint embedding space shared between the local and global regions of the entire sketch and the corresponding regions of the photo. Then, we divided each partial sketch in the sketch-drawing episode into several local regions. Another learnable module following the triplet network was designed to learn the representations for the local regions of the partial sketch. Finally, the final distance was determined by combining the local and global regions of the sketches and photos. In the experiments, our method outperformed state-of-the-art baseline methods in terms of early retrieval efficiency on two publicly available sketch-retrieval datasets and in a practice test.
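Two ingredients named in the abstract, a triplet objective and a final distance that combines local and global regions, can be sketched as follows. The abstract gives neither the margin nor the combination rule, so the hinge-style triplet loss with Euclidean distance and the weighted sum controlled by `alpha` are assumptions made only for illustration; all function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Standard hinge triplet loss: pull the sketch (anchor) toward its
    matching photo (positive) and push it away from a non-matching one."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


def combined_distance(global_d: float, local_ds: Sequence[float],
                      alpha: float = 0.5) -> float:
    """Assumed combination rule: weighted sum of the global distance and
    the mean of the per-region local distances."""
    local_mean = sum(local_ds) / len(local_ds)
    return alpha * global_d + (1.0 - alpha) * local_mean


# The matching photo is farther away than the non-matching one: loss is positive.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
# The matching photo is much closer: the margin is satisfied and the loss is zero.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Final score from one global distance and two local-region distances.
score = combined_distance(2.0, [1.0, 3.0])
```

At retrieval time, candidate photos would be ranked by `score` in ascending order; how the paper actually weights the two terms is not stated in the abstract.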
IEEE Transactions on Big Data, vol. 10, no. 4, pp. 543-555. Published 2024-01-19. https://doi.org/10.1109/TBDATA.2024.3356393
Citations: 0
GAT-COBO: Cost-Sensitive Graph Neural Network for Telecom Fraud Detection
IF 7.5 Zone 3, Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-01-11 DOI: 10.1109/TBDATA.2024.3352978
Xinxin Hu;Haotian Chen;Junjie Zhang;Hongchang Chen;Shuxin Liu;Xing Li;Yahui Wang;Xiangyang Xue
Along with the rapid evolution of mobile communication technologies such as 5G, there has been a significant increase in telecom fraud, which severely erodes individual and social wealth. In recent years, graph mining techniques have gradually become a mainstream solution for detecting telecom fraud. However, the graph imbalance problem, caused by the Pareto principle, poses severe challenges to graph data mining. This emerging and complex issue has received limited attention in prior research. In this paper, we propose a Graph ATtention network with COst-sensitive BOosting (GAT-COBO) for the graph imbalance problem. First, we design a GAT-based base classifier to learn the embeddings of all nodes in the graph. Then, we feed the embeddings into a well-designed cost-sensitive learner for imbalanced learning. Next, we update the weights according to the misclassification cost to make the model focus more on the minority class. Finally, we sum the node embeddings obtained by multiple cost-sensitive learners to obtain a comprehensive node representation, which is used for the downstream anomaly detection task. Extensive experiments on two real-world telecom fraud detection datasets demonstrate that our proposed method is effective for the graph imbalance problem, outperforming state-of-the-art GNNs and GNN-based fraud detectors. In addition, our model also helps to alleviate the widespread over-smoothing problem in GNNs.
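The step "update the weights according to the misclassification cost" can be illustrated with a small sketch. The abstract does not give the update formula, so the AdaBoost-style exponential rule below is an assumption, and the cost matrix values and the name `update_weights` are hypothetical.

```python
import math
from typing import List, Sequence


def update_weights(weights: Sequence[float], y_true: Sequence[int],
                   y_pred: Sequence[int],
                   cost: Sequence[Sequence[float]]) -> List[float]:
    """Re-weight samples so that costly mistakes get more attention.

    cost[t][p] is the cost of predicting class p when the truth is t;
    correct predictions have cost 0 and keep their relative weight.
    Assumption: exponential (AdaBoost-like) scaling followed by normalization.
    """
    raw = [w * math.exp(cost[t][p])
           for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return [r / total for r in raw]


# Class 1 is the minority (fraud) class; missing it costs 5, a false alarm costs 1.
cost = [[0.0, 1.0],
        [5.0, 0.0]]
weights = [0.25, 0.25, 0.25, 0.25]
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 0]  # one false alarm (index 2), one missed fraud (index 3)

new_weights = update_weights(weights, y_true, y_pred, cost)
```

In this toy run the missed fraud sample ends up with the largest weight, the false alarm with the next largest, and the correctly classified samples with the smallest, so the next learner in the ensemble would concentrate on the minority class.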
IEEE Transactions on Big Data, vol. 10, no. 4, pp. 528-542. Published 2024-01-11. https://doi.org/10.1109/TBDATA.2024.3352978
Citations: 0