Node and relevant data selection in distributed predictive analytics: A query-centric approach

IF 7.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE | Journal of Network and Computer Applications | Pub Date: 2024-09-19 | DOI: 10.1016/j.jnca.2024.104029
Tahani Aladwani, Christos Anagnostopoulos, Kostas Kolomvatsos
Journal of Network and Computer Applications, volume 232, Article 104029. Full text: https://www.sciencedirect.com/science/article/pii/S1084804524002066

Abstract

Distributed Predictive Analytics (DPA) refers to constructing predictive models from data distributed across nodes. DPA reduces the need for data centralization, thereby alleviating concerns about data privacy, decreasing the load on central servers, and minimizing communication overhead. However, data collected by nodes are inherently different: each node can have different distributions, volumes, access patterns, and feature spaces. This heterogeneity hinders the development of accurate models in a distributed fashion. Many state-of-the-art methods adopt random node selection as a straightforward approach. Such a method is particularly ineffective under data and access pattern heterogeneity, as it increases the likelihood of selecting nodes with low-quality or irrelevant data for DPA. Consequently, the most suitable nodes can be identified only after models have been trained over randomly selected nodes, based on their predictive performance. This costs additional time and resources and increases network load. Holistic knowledge of nodes' data characteristics and access patterns is therefore crucial: it enables the selection of a subset of suitable nodes for each DPA task (query) before model training. Our method engages the most suitable nodes by predicting their relevant distributed data and learning predictive models per query. We introduce a novel query-centric DPA mechanism for node and relevant data selection. Our contributions are (i) predictive selection mechanisms based on the availability and relevance of data per DPA query and (ii) various distributed machine learning mechanisms that engage the most suitable nodes for model training. We evaluate the efficiency of our mechanism and provide a comparative assessment against other methods found in the literature. Our experiments show that our mechanism significantly outperforms other approaches applicable to DPA.
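The abstract does not spell out how availability and relevance are computed, so the following is only a minimal sketch of the general idea of choosing nodes per query before any training. It assumes, hypothetically, that each node publishes the value range of each of its features and that a query is a set of per-feature ranges; `NodeSummary`, `overlap_fraction`, `relevance`, `select_nodes`, and the 0.5 threshold are illustrative names and values, not the authors' mechanism.

```python
from dataclasses import dataclass

Range = tuple[float, float]


@dataclass
class NodeSummary:
    """Hypothetical per-node summary: the value range covered by each feature."""
    name: str
    ranges: dict[str, Range]


def overlap_fraction(query: Range, data: Range) -> float:
    """Fraction of the query interval that the node's data interval covers."""
    q_lo, q_hi = query
    d_lo, d_hi = data
    width = q_hi - q_lo
    if width <= 0:
        # Point query: covered only if the point lies inside the data range.
        return 1.0 if d_lo <= q_lo <= d_hi else 0.0
    covered = min(q_hi, d_hi) - max(q_lo, d_lo)
    return max(0.0, covered) / width


def relevance(query: dict[str, Range], node: NodeSummary) -> float:
    """Product of per-feature overlaps; 0.0 if the node lacks a queried feature."""
    score = 1.0
    for feature, q_range in query.items():
        if feature not in node.ranges:
            return 0.0  # the data needed by the query is not available here
        score *= overlap_fraction(q_range, node.ranges[feature])
    return score


def select_nodes(query: dict[str, Range], nodes: list[NodeSummary],
                 threshold: float = 0.5) -> list[str]:
    """Names of nodes whose relevance meets the threshold, most relevant first."""
    scored = [(relevance(query, n), n.name) for n in nodes]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Illustrative data only: three nodes with different feature ranges.
nodes = [
    NodeSummary("n1", {"temp": (0, 20), "hum": (30, 60)}),
    NodeSummary("n2", {"temp": (15, 40), "hum": (40, 90)}),
    NodeSummary("n3", {"temp": (50, 80)}),
]
query = {"temp": (10, 20), "hum": (40, 60)}
selected = select_nodes(query, nodes)
print(selected)
```

With this toy data, `n1` covers the whole query, `n2` covers half of the `temp` range, and `n3` lacks the `hum` feature, so only `n1` and `n2` would be engaged for training. A random selector could just as easily pick `n3`, which is the failure mode the abstract attributes to random node selection.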
Source journal
Journal of Network and Computer Applications (Engineering & Technology: Computer Science, Interdisciplinary Applications)
CiteScore: 21.50
Self-citation rate: 3.40%
Articles published: 142
Review time: 37 days
Journal description: The Journal of Network and Computer Applications welcomes research contributions, surveys, and notes in all areas relating to computer networks and applications thereof. Sample topics include new design techniques, interesting or novel applications, components or standards; computer networks with tools such as WWW; emerging standards for internet protocols; Wireless networks; Mobile Computing; emerging computing models such as cloud computing, grid computing; applications of networked systems for remote collaboration and telemedicine, etc. The journal is abstracted and indexed in Scopus, Engineering Index, Web of Science, Science Citation Index Expanded and INSPEC.
Latest articles from the journal
On and off the manifold: Generation and Detection of adversarial attacks in IIoT networks
Light up that Droid! On the effectiveness of static analysis features against app obfuscation for Android malware detection
Clusters in chaos: A deep unsupervised learning paradigm for network anomaly detection
Consensus hybrid ensemble machine learning for intrusion detection with explainable AI
Adaptive differential privacy in asynchronous federated learning for aerial-aided edge computing