Automating localized learning for cardinality estimation based on XGBoost
Pub Date: 2024-06-01 | DOI: 10.1007/s10115-024-02142-2
Jieming Feng, Zhanhuai Li, Qun Chen, Hailong Liu
For cardinality estimation in a DBMS, building multiple local models instead of one global model can usually improve estimation accuracy and reduce the effort of labeling large amounts of training data. Unfortunately, the existing approach to localized learning requires users to explicitly specify which query patterns a local model can handle. Making these decisions is arduous and error-prone for users; worse still, it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost, which can automatically build an optimal combination of local models given a query workload. It consists of two phases: 1) model initialization; 2) model evolution. In the first phase, it clusters training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters, reconstructing local models to identify an optimal combination. We formulate the identification of the optimal combination of local models as a combinatorial optimization problem; since solving it exactly has exponential complexity, we present an efficient heuristic algorithm named MMS (Models Merging and Splitting). Finally, we validate its performance superiority over existing learning alternatives through extensive experiments on real datasets.
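The abstract names the two phases but does not spell out the MMS heuristic; the following minimal sketch is therefore only an illustration under stated assumptions: a greedy pairwise-merge loop driven by a q-error metric, with toy features and hypothetical identifiers throughout (splitting would be handled analogously).

```python
# A minimal sketch, not the paper's algorithm: the merge criterion, the
# q-error metric, and all data here are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from xgboost import XGBRegressor

def fit_local(X, y):
    """Train one local model; return its training q-error (a real system
    would measure error on a held-out validation split)."""
    model = XGBRegressor(n_estimators=50, max_depth=4)
    model.fit(X, y)
    pred = np.maximum(model.predict(X), 1.0)
    true = np.maximum(y, 1.0)
    return np.mean(np.maximum(pred / true, true / pred))

# Phase 1 (model initialization): cluster encoded query patterns coarsely.
X = np.random.rand(1000, 8)                           # toy query-pattern features
y = np.random.randint(1, 10_000, 1000).astype(float)  # toy true cardinalities
labels = KMeans(n_clusters=6, n_init=10).fit_predict(X)
clusters = {c: np.where(labels == c)[0] for c in np.unique(labels)}

# Phase 2 (model evolution): greedily merge two clusters whenever one model
# over their union beats both separate models.
improved = True
while improved and len(clusters) > 1:
    improved = False
    keys = list(clusters)
    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            merged = np.concatenate([clusters[a], clusters[b]])
            if fit_local(X[merged], y[merged]) < min(
                    fit_local(X[clusters[a]], y[clusters[a]]),
                    fit_local(X[clusters[b]], y[clusters[b]])):
                clusters[a] = merged
                del clusters[b]
                improved = True
                break
        if improved:
            break
print(f"{len(clusters)} local models remain after merging")
```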
{"title":"Automating localized learning for cardinality estimation based on XGBoost","authors":"Jieming Feng, Zhanhuai Li, Qun Chen, Hailong Liu","doi":"10.1007/s10115-024-02142-2","DOIUrl":"https://doi.org/10.1007/s10115-024-02142-2","url":null,"abstract":"<p>For cardinality estimation in DBMS, building multiple local models instead of one global model can usually improve estimation accuracy as well as reducing the effort to label large amounts of training data. Unfortunately, the existing approach of localized learning requires users to explicitly specify which query patterns a local model can handle. Making these decisions is very arduous and error-prone for users; to make things worse, it limits the usability of local models. In this paper, we propose a localized learning solution for cardinality estimation based on XGBoost, which can automatically build an optimal combination of local models given a query workload. It consists of two phases: 1) model initialization; 2) model evolution. In the first phase, it clusters training data into a set of coarse-grained query pattern groups based on pattern similarity and constructs a separate local model for each group. In the second phase, it iteratively merges and splits clusters to identify an optimal combination by reconstructing local models. We formulate the problem of identifying the optimal combination of local models as a combinatorial optimization problem and present an efficient heuristic algorithm, named <b>MMS</b> (<b>M</b>odels <b>M</b>erging and <b>S</b>plitting), for its solution due to its exponential complexity. Finally, we validate its performance superiority over the existing learning alternatives by extensive experiments on real datasets.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The analysis of diversification properties of stablecoins through the Shannon entropy measure
Pub Date: 2024-05-30 | DOI: 10.1007/s10115-024-02133-3
Mohavia Ben Amid Sinon, Jules Clement Mba
The common goal of investors is to minimise the risk and maximise the returns on their investments. This is often achieved through diversification, where investors spread their investments across various assets. This study uses the MAD-entropy model to minimise the absolute deviation, maximise the mean return, and maximise the Shannon entropy of the portfolio. The MAD model is used because it is a linear programming model, allowing it to handle large-scale problems and non-normally distributed data. Entropy is added to the MAD model because it better diversifies the asset weights in the portfolios. The analysed portfolios consist of cryptocurrencies, stablecoins, and selected world indices such as the S&P 500 and FTSE obtained from Yahoo Finance. The models found that stablecoins pegged to the US dollar, followed by stablecoins pegged to gold, are better diversifiers for traditional cryptocurrencies and stocks. These results are probably due to their low volatility compared to the other assets. Findings from this study may assist investors, since the MAD-entropy model outperforms the MAD model by providing higher portfolio mean returns with minimal risk. Therefore, crypto investors can design a well-diversified portfolio using MAD-entropy to reduce unsystematic risk. Further research integrating MAD-entropy with machine learning techniques may improve accuracy and risk management.
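A minimal sketch of what a MAD-entropy optimization can look like, assuming the three objectives are combined into one weighted scalarization; the trade-off coefficients lam_ret and lam_ent and the synthetic returns are hypothetical, not the authors' calibration.

```python
# Scalarized MAD-entropy portfolio: minimise MAD, maximise mean return and
# weight entropy, subject to a long-only budget constraint.
import cvxpy as cp
import numpy as np

T, n = 250, 5                                     # observations, assets
R = np.random.normal(0.0005, 0.01, size=(T, n))   # synthetic daily returns
mu = R.mean(axis=0)

w = cp.Variable(n, nonneg=True)                   # long-only weights
mad = cp.sum(cp.abs(R @ w - mu @ w)) / T          # mean absolute deviation
ret = mu @ w                                      # mean portfolio return
ent = cp.sum(cp.entr(w))                          # Shannon entropy of weights

lam_ret, lam_ent = 1.0, 0.01                      # hypothetical trade-offs
problem = cp.Problem(cp.Minimize(mad - lam_ret * ret - lam_ent * ent),
                     [cp.sum(w) == 1])
problem.solve()
print(np.round(w.value, 3))                       # diversified weight vector
```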
{"title":"The analysis of diversification properties of stablecoins through the Shannon entropy measure","authors":"Mohavia Ben Amid Sinon, Jules Clement Mba","doi":"10.1007/s10115-024-02133-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02133-3","url":null,"abstract":"<p>The common goal for investors is to minimise the risk and maximise the returns on their investments. This is often achieved through diversification, where investors spread their investments across various assets. This study aims to use the MAD-entropy model to minimise the absolute deviation, maximise the mean return, and maximise the Shannon entropy of the portfolio. The MAD model is used because it is a linear programming model, allowing it to resolve large-scale problems and nonnormally distributed data. Entropy is added to the MAD model because it can better diversify the weight of assets in the portfolios. The analysed portfolios consist of cryptocurrencies, stablecoins, and selected world indices such as the SP500 and FTSE obtained from Yahoo Finance. The models found that stablecoins pegged to the US dollar, followed by stablecoins pegged to gold, are better diversifiers for traditional cryptocurrencies and stocks. These results are probably due to their low volatility compared to the other assets. Findings from this study may assist investors since the MAD-Entropy model outperforms the MAD model by providing more significant portfolio mean returns with minimal risk. Therefore, crypto investors can design a well-diversified portfolio using MAD entropy to reduce unsystematic risk. Further research integrating mad entropy with machine learning techniques may improve accuracy and risk management.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Methods for concept analysis and multi-relational data mining: a systematic literature review
Pub Date: 2024-05-30 | DOI: 10.1007/s10115-024-02139-x
Nicolás Leutwyler, Mario Lezoche, Chiara Franciosi, Hervé Panetto, Laurent Teste, Diego Torres
The massive adoption of the Internet of Things in many industrial areas, together with the requirements of modern services, is posing huge challenges to the field of data mining. Moreover, the semantic interoperability of systems and enterprises requires operating across many different formats, such as ontologies, knowledge graphs, or relational databases, as well as different contexts, such as static, dynamic, or real time. Consequently, supporting this semantic interoperability requires a wide range of knowledge discovery methods with different capabilities suited to the context of distributed architectures (DA). However, to the best of our knowledge, there has been no recent general review of the state of the art of Concept Analysis (CA) and multi-relational data mining (MRDM) methods for knowledge discovery in DA with respect to semantic interoperability. In this work, a systematic literature review on CA and MRDM is conducted, providing a discussion of the characteristics reported in the reviewed papers, supported by a clustering technique based on association rules. Moreover, the review allowed the identification of three research gaps toward a more scalable set of methods in the context of DA and heterogeneous sources.
{"title":"Methods for concept analysis and multi-relational data mining: a systematic literature review","authors":"Nicolás Leutwyler, Mario Lezoche, Chiara Franciosi, Hervé Panetto, Laurent Teste, Diego Torres","doi":"10.1007/s10115-024-02139-x","DOIUrl":"https://doi.org/10.1007/s10115-024-02139-x","url":null,"abstract":"<p>The Internet of Things massive adoption in many industrial areas in addition to the requirement of modern services is posing huge challenges to the field of data mining. Moreover, the semantic interoperability of systems and enterprises requires to operate between many different formats such as ontologies, knowledge graphs, or relational databases, as well as different contexts such as static, dynamic, or real time. Consequently, supporting this semantic interoperability requires a wide range of knowledge discovery methods with different capabilities that answer to the context of <i>distributed architectures</i> (DA). However, to the best of our knowledge there is no general review in recent time about the state of the art of Concept Analysis (CA) and multi-relational data mining (MRDM) methods regarding knowledge discovery in DA considering semantic interoperability. In this work, a systematic literature review on CA and MRDM is conducted, providing a discussion on the characteristics they have according to the papers reviewed, supported by a clusterization technique based on association rules. Moreover, the review allowed the identification of three research gaps toward a more scalable set of methods in the context of DA and heterogeneous sources.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Twain-GCN: twain-syntax graph convolutional networks for aspect-based sentiment analysis
Pub Date: 2024-05-30 | DOI: 10.1007/s10115-024-02135-1
Ying Hou, Fang’ai Liu, Xuqiang Zhuang, Yuling Zhang
The goal of aspect-based sentiment analysis is to recognize the aspect information in a text and the corresponding sentiment polarity. A variety of robust methods, including attention mechanisms and convolutional neural networks, have been extensively utilized to tackle this complex task. Previous studies obtained better experimental results by using graph convolutional networks (GCN) based on semantic dependency trees, so many methods have begun to use sentence structure information for this task. However, because sentences may contain complex relations, some approaches capture only loose connections between aspect words and their contexts. To solve this problem, the Twain-Syntax graph convolutional network model is proposed, which can utilize multiple kinds of syntactic structure information simultaneously. Guided by the constituent tree and the dependency tree, rich syntactic information is fully used in the model to build a sentiment-aware context for each aspect. In particular, a multilayer attention mechanism and GCNs are employed to learn to capture the correlations between words. By integrating syntactic information, this approach significantly refines the model's performance. Extensive testing on four benchmark datasets shows that the model delineated in this paper exhibits high levels of efficiency, comparable to several cutting-edge models.
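A toy sketch of the core idea of fusing two syntactic views with plain GCN layers; the sum-based fusion, the dimensions, and the self-loop-only adjacency matrices are assumptions, not the exact Twain-Syntax architecture.

```python
# One GCN per syntactic view (dependency tree, constituent tree), fused by
# summation to build an aspect-aware token representation.
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.lin = nn.Linear(dim, dim)

    def forward(self, x, adj):
        deg = adj.sum(-1, keepdim=True).clamp(min=1)   # mean aggregation
        return torch.relu(self.lin(adj @ x / deg))

n_tokens, dim = 7, 32
x = torch.randn(n_tokens, dim)       # contextual word representations
adj_dep = torch.eye(n_tokens)        # dependency-tree edges (toy: self-loops)
adj_con = torch.eye(n_tokens)        # constituent-tree edges (toy)

gcn_dep, gcn_con = SimpleGCN(dim), SimpleGCN(dim)
h = gcn_dep(x, adj_dep) + gcn_con(x, adj_con)   # fuse the two syntax views
aspect_repr = h[2]                   # sentiment-aware vector of an aspect token
print(aspect_repr.shape)
```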
{"title":"Twain-GCN: twain-syntax graph convolutional networks for aspect-based sentiment analysis","authors":"Ying Hou, Fang’ai Liu, Xuqiang Zhuang, Yuling Zhang","doi":"10.1007/s10115-024-02135-1","DOIUrl":"https://doi.org/10.1007/s10115-024-02135-1","url":null,"abstract":"<p>The goal of aspect-based sentiment analysis is to recognize the aspect information in the text and the corresponding sentiment polarity. A variety of robust methods, including attention mechanisms and convolutional neural networks, have been extensively utilized to tackle this complex task. Better experimental results are obtained by using graph convolutional networks (GCN) based on semantic dependency trees in previous studies. Therefore, abundant methods begin to use sentence structure information to complete this task. However, only the loose connection between aspect words and contexts is realized in some practices due to sentences may contain complex relations. To solve this problem, Twain-Syntax graph convolutional network model is proposed, which can utilize multiple syntactic structure information simultaneously. Guided by the constituent tree and dependency tree, rich syntactic information is fully used in the model to build the sentiment-aware context for each aspect. In special, the multilayer attention mechanism and GCN are employed for learning to capture the correlation between words. By integrating syntactic information, this approach significantly refines the model’s technical performance. Extensive testing on four benchmark datasets shows that the model delineated in this paper exhibits high levels of efficiency, comparable to several cutting-edge models.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PatchMix: patch-level mixup for data augmentation in convolutional neural networks
Pub Date: 2024-05-30 | DOI: 10.1007/s10115-024-02141-3
Yichao Hong, Yuanyuan Chen
Convolutional neural networks (CNNs) have demonstrated impressive performance in fitting data distributions. However, due to the complexity of learning intricate features from data, networks usually experience overfitting during training. To address this issue, many data augmentation techniques have been proposed to expand the representation of the training data, thereby improving the generalization ability of CNNs. Inspired by jigsaw puzzles, we propose PatchMix, a novel mixup-based augmentation method that applies mixup to patches within an image to extract abundant and varied information from it. At the input level of CNNs, PatchMix can generate a multitude of reliable training samples through an integrated and controllable approach that encompasses cropping, combining, blurring, and more. Additionally, we propose PatchMix-R, which enhances the robustness of the model against perturbations by processing adjacent pixels. Easy to implement, our methods can be integrated with most CNN-based classification models and combined with various data augmentation techniques. The experiments show that PatchMix and PatchMix-R consistently outperform other state-of-the-art methods in terms of accuracy and robustness. Class activation mappings of the trained model are also investigated to visualize the effectiveness of our approach.
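A minimal sketch of a patch-level mixup of the kind the abstract describes: grid cells of one image are swapped into another and the labels are mixed by the retained area. The grid size, swap probability, and label rule are assumptions, not the authors' exact procedure.

```python
# Patch-level mixup: per grid cell, take the patch from image 2 with
# probability p; the soft label follows the area actually kept from image 1.
import numpy as np

def patchmix(img1, img2, y1, y2, grid=4, p=0.5, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    h, w = img1.shape[:2]
    ph, pw = h // grid, w // grid
    out = img1.copy()
    mask = rng.random((grid, grid)) < p           # cells taken from img2
    for i in range(grid):
        for j in range(grid):
            if mask[i, j]:
                out[i*ph:(i+1)*ph, j*pw:(j+1)*pw] = \
                    img2[i*ph:(i+1)*ph, j*pw:(j+1)*pw]
    lam = 1.0 - mask.mean()                       # fraction of img1 kept
    return out, lam * y1 + (1 - lam) * y2         # mixup-style soft label

a, b = np.zeros((32, 32, 3)), np.ones((32, 32, 3))
mixed, label = patchmix(a, b, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
print(label)                                      # e.g. [0.56 0.44]
```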
{"title":"PatchMix: patch-level mixup for data augmentation in convolutional neural networks","authors":"Yichao Hong, Yuanyuan Chen","doi":"10.1007/s10115-024-02141-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02141-3","url":null,"abstract":"<p>Convolutional neural networks (CNNs) have demonstrated impressive performance in fitting data distribution. However, due to the complexity in learning intricate features from data, networks usually experience overfitting during the training. To address this issue, many data augmentation techniques have been proposed to expand the representation of the training data, thereby improving the generalization ability of CNNs. Inspired by jigsaw puzzles, we propose PatchMix, a novel mixup-based augmentation method that applies mixup to patches within an image to extract abundant and varied information from it. At the input level of CNNs, PatchMix can generate a multitude of reliable training samples through an integrated and controllable approach that encompasses cropping, combining, blurring, and more. Additionally, we propose PatchMix-R to enhance the robustness of the model against perturbations by processing adjacent pixels. Easy to implement, our methods can be integrated with most CNN-based classification models and combined with varying data augmentation techniques. The experiments show that PatchMix and PatchMix-R consistently outperform other state-of-the-art methods in terms of accuracy and robustness. Class activation mappings of the trained model are also investigated to visualize the effectiveness of our approach.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-scale knowledge graph representation learning
Pub Date: 2024-05-29 | DOI: 10.1007/s10115-024-02131-5
Marwa Badrouni, Chaker Katar, Wissem Inoubli
Knowledge graphs have emerged as powerful data structures that provide a deep representation and understanding of the knowledge present in networks. In representation learning of knowledge graphs, entities and relationships undergo an embedding process in which they are mapped onto a vector space of reduced dimension. These embeddings are increasingly used to feed a multitude of machine learning tasks. Nevertheless, the growth of knowledge graph data has introduced a challenge: knowledge graph embeddings now encompass millions of nodes and billions of edges, surpassing the capacities of existing knowledge representation learning systems. In response to this challenge, this paper presents DistKGE, a distributed learning approach to knowledge graph embedding based on a new partitioning technique. In our experimental evaluation, we show that, in terms of runtime on the link prediction task (identifying new links between entities within the knowledge graph), the proposed approach improves the scalability of distributed knowledge graph learning with respect to graph size compared to existing methods.
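The abstract does not describe DistKGE's new partitioning technique, so the sketch below only illustrates the general setting, with a generic hash-by-head-entity shard assignment as a stand-in for where partitioning sits in distributed embedding training.

```python
# Generic triple partitioner: assign each (head, relation, tail) triple to a
# worker shard before the workers train embeddings on their shards.
from collections import defaultdict

triples = [("alice", "knows", "bob"), ("bob", "worksAt", "acme"),
           ("carol", "knows", "alice"), ("acme", "locatedIn", "paris")]

def partition(triples, n_workers):
    shards = defaultdict(list)
    for h, r, t in triples:
        # a real system would use a stable hash; Python's hash() varies per run
        shards[hash(h) % n_workers].append((h, r, t))
    return shards

for worker, shard in sorted(partition(triples, 2).items()):
    print(f"worker {worker} trains embeddings on {shard}")
```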
{"title":"Large-scale knowledge graph representation learning","authors":"Marwa Badrouni, Chaker Katar, Wissem Inoubli","doi":"10.1007/s10115-024-02131-5","DOIUrl":"https://doi.org/10.1007/s10115-024-02131-5","url":null,"abstract":"<p>The knowledge graph emerges as powerful data structures that provide a deep representation and understanding of the knowledge presented in networks. In the pursuit of representation learning of the knowledge graph, entities and relationships undergo an embedding process, where they are mapped onto a vector space with reduced dimensions. These embeddings are progressively used to extract their information for a multitude of tasks in machine learning. Nevertheless, the increase data in knowledge graph has introduced a challenge, especially as knowledge graph embedding now encompass millions of nodes and billions of edges, surpassing the capacities of existing knowledge representation learning systems. In response to these challenge, this paper presents DistKGE, a distributed learning approach of knowledge graph embedding based on a new partitioning technique. In our experimental evaluation, we illustrate that the proposed approach improves the scalability of distributed knowledge graph learning with respect to graph size compared to existing methods in terms of runtime performances in the link prediction task aimed at identifying new links between entities within the knowledge graph.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141193176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Markov enhanced graph attention network for spammer detection in online social network
Pub Date: 2024-05-29 | DOI: 10.1007/s10115-024-02137-z
Ashutosh Tripathi, Mohona Ghosh, Kusum Kumari Bharti
Online social networks (OSNs) are an indispensable part of social communication where people connect and share information. Spammers and other malicious actors exploit the OSN's reach to propagate spam content. In an OSN with mutual relations between nodes, two kinds of spammer detection methods can be employed: feature based and propagation based. However, each is incomplete by itself: feature-based methods cannot exploit the mutual connections between nodes, while propagation-based methods cannot utilize the rich, discriminating node features. We propose MEGAT (Markov enhanced graph attention network), a hybrid model using graph attention networks (GAT) and pairwise Markov random fields (pMRF) for the spammer detection task. It efficiently utilizes node features as well as propagation information. We equip our GAT model with the smoother Swish activation function, which has a non-monotonic derivative, instead of the leakyReLU function. Experiments performed on a real-world Twitter Social Honeypot (TwitterSH) benchmark dataset and a subsequent comparative analysis reveal that our proposed MEGAT model outperforms state-of-the-art models in accuracy, precision-recall area under curve (PRAUC), and F1-score.
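The concrete change the abstract highlights is replacing leakyReLU with the smoother Swish activation inside the attention computation; a quick sketch of both (Swish with beta = 1 coincides with SiLU):

```python
# Swish is smooth everywhere and its derivative is non-monotonic, unlike
# leakyReLU, which is kinked at zero.
import torch

def swish(x, beta=1.0):
    return x * torch.sigmoid(beta * x)

x = torch.linspace(-3, 3, 7)
print(torch.nn.functional.leaky_relu(x, 0.2))   # piecewise linear
print(swish(x))                                 # equals torch.nn.functional.silu(x)
```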
{"title":"Markov enhanced graph attention network for spammer detection in online social network","authors":"Ashutosh Tripathi, Mohona Ghosh, Kusum Kumari Bharti","doi":"10.1007/s10115-024-02137-z","DOIUrl":"https://doi.org/10.1007/s10115-024-02137-z","url":null,"abstract":"<p>Online social networks (OSNs) are an indispensable part of social communication where people connect and share information. Spammers and other malicious actors use the OSN’s power to propagate spam content. In an OSN with mutual relations between nodes, two kinds of spammer detection methods can be employed: feature based and propagation based. However, both of these are incomplete in themselves. The feature-based methods cannot exploit mutual connections between nodes, and propagation-based methods cannot utilize the rich discriminating node features. We propose a hybrid model—Markov enhanced graph attention network (MEGAT)—using graph attention networks (GAT) and pairwise Markov random fields (pMRF) for the spammer detection task. It efficiently utilizes node features as well as propagation information. We experiment our GAT model with a smoother <i>Swish</i> activation function having non-monotonic derivatives, instead of the <i>leakyReLU</i> function. The experiments performed on a real-world Twitter Social Honeypot (TwitterSH) benchmark dataset and subsequent comparative analysis reveal that our proposed MEGAT model outperforms the state-of-the-art models in accuracy, precision–recall area under curve (PRAUC), and F1-score performance measures.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Constraining acyclicity of differentiable Bayesian structure learning with topological ordering
Pub Date: 2024-05-29 | DOI: 10.1007/s10115-024-02140-4
Quang-Duy Tran, Phuoc Nguyen, Bao Duong, Thin Nguyen
In Bayesian approaches to structure learning, distributional estimates have advantages over point estimates when handling epistemic uncertainty. Differentiable methods for Bayesian structure learning have been developed to enhance the scalability of the inference process and are achieving promising outcomes. However, in the differentiable continuous setting, constraining the acyclicity of learned graphs emerges as another challenge. Various works utilize post-hoc penalization scores to impose this constraint, which cannot assure acyclicity. The topological ordering of the variables is one type of prior knowledge that contains valuable information about the acyclicity of a directed graph. In this work, we propose a framework that guarantees the acyclicity of inferred graphs by integrating the information from the topological ordering into the inference process. Our integration framework does not interfere with the differentiable inference process, while strictly assuring the acyclicity of learned graphs and reducing the inference complexity. Extensive empirical experiments on both synthetic and real data demonstrate the effectiveness of our approach, with preferable results compared to related Bayesian approaches.
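The guarantee rests on a simple fact: if edges may only point from earlier to later variables in a topological ordering, the adjacency matrix is strictly upper triangular under that permutation, so no cycle can form. A minimal sketch with toy scores and a fixed ordering (the scores and threshold are hypothetical):

```python
# Masking edges to respect a topological ordering makes the learned graph
# acyclic by construction.
import numpy as np

order = [2, 0, 3, 1]                   # given topological ordering
pos = np.argsort(order)                # pos[v] = position of variable v
scores = np.random.rand(4, 4)          # learned edge scores (toy)

mask = pos[:, None] < pos[None, :]     # allow i -> j only if i precedes j
adj = (scores * mask) > 0.5            # thresholded, acyclic by construction

perm = adj[np.ix_(order, order)]       # reorder rows/cols by the ordering
assert np.array_equal(perm, np.triu(perm, k=1))   # strictly upper triangular
```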
{"title":"Constraining acyclicity of differentiable Bayesian structure learning with topological ordering","authors":"Quang-Duy Tran, Phuoc Nguyen, Bao Duong, Thin Nguyen","doi":"10.1007/s10115-024-02140-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02140-4","url":null,"abstract":"<p>Distributional estimates in Bayesian approaches in structure learning have advantages compared to the ones performing point estimates when handling epistemic uncertainty. Differentiable methods for Bayesian structure learning have been developed to enhance the scalability of the inference process and are achieving optimistic outcomes. However, in the differentiable continuous setting, constraining the acyclicity of learned graphs emerges as another challenge. Various works utilize post-hoc penalization scores to impose this constraint which cannot assure acyclicity. The topological ordering of the variables is one type of prior knowledge that contains valuable information about the acyclicity of a directed graph. In this work, we propose a framework to guarantee the acyclicity of inferred graphs by integrating the information from the topological ordering into the inference process. Our integration framework does not interfere with the differentiable inference process while being able to strictly assure the acyclicity of learned graphs and reduce the inference complexity. Our extensive empirical experiments on both synthetic and real data have demonstrated the effectiveness of our approach with preferable results compared to related Bayesian approaches.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ensemble multi-view feature set partitioning method for effective multi-view learning
Pub Date: 2024-05-27 | DOI: 10.1007/s10115-024-02114-6
Ritika Singh, Vipin Kumar
Multi-view learning consistently outperforms traditional single-view learning by leveraging multiple perspectives of the data. However, the effectiveness of multi-view learning heavily relies on how the data are partitioned into feature sets. In many cases, different datasets may require different partitioning methods to capture their unique characteristics, making any single partitioning method insufficient. Finding an optimal feature set partitioning (FSP) for each dataset may be time-consuming, and the optimal FSP may still not suffice for all types of datasets. Therefore, this paper presents a novel approach called ensemble multi-view feature set partitioning (EMvFSP) to improve the performance of multi-view learning, a technique that uses multiple data sources to make predictions. The proposed EMvFSP method combines the different views produced by multiple partitioning methods to achieve better classification performance than any single partitioning method alone. Experiments were conducted on 15 structured datasets with varying ratios of samples, features, and labels, and the results show that the proposed EMvFSP method effectively improves classification performance. The paper also includes statistical analyses using Friedman ranking and Holm's procedure to demonstrate the effectiveness of the proposed method. This approach provides a robust solution for multi-view learning that can adapt to different types of datasets and partitioning methods, making it suitable for a wide range of applications.
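A small sketch of the ensemble idea under stated assumptions: two hypothetical partitionings of the feature space, one classifier per view, and soft-voting fusion; the paper's concrete partitioners and combination rule may differ.

```python
# Train one model per view under several feature-set partitionings, then
# average the predicted probabilities across all views.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

partitionings = [
    [list(range(0, 15)), list(range(15, 30))],       # contiguous halves
    [list(range(0, 30, 2)), list(range(1, 30, 2))],  # interleaved features
]

probs = []
for partitioning in partitionings:
    for view in partitioning:                        # one model per view
        clf = LogisticRegression(max_iter=5000).fit(Xtr[:, view], ytr)
        probs.append(clf.predict_proba(Xte[:, view]))

pred = np.mean(probs, axis=0).argmax(axis=1)         # soft-vote across views
print("ensemble accuracy:", (pred == yte).mean())
```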
{"title":"Ensemble multi-view feature set partitioning method for effective multi-view learning","authors":"Ritika Singh, Vipin Kumar","doi":"10.1007/s10115-024-02114-6","DOIUrl":"https://doi.org/10.1007/s10115-024-02114-6","url":null,"abstract":"<p>Multi-view learning consistently outperforms traditional single-view learning by leveraging multiple perspectives of data. However, the effectiveness of multi-view learning heavily relies on how the data are partitioned into feature sets. In many cases, different datasets may require different partitioning methods to capture their unique characteristics, making a single partitioning method insufficient. Finding an optimal feature set partitioning (FSP) for each dataset may be a time-consuming process, and the optimal FSP may still not be sufficient for all types of datasets. Therefore, the paper presents a novel approach called ensemble multi-view feature set partitioning (EMvFSP) to improve the performance of multi-view learning, a technique that uses multiple data sources to make predictions. The proposed EMvFSP method combines the different views produced by multiple partitioning methods to achieve better classification performance than any single partitioning method alone. The experiments were conducted on 15 structured datasets with varying ratios of samples, features, and labels, and the results showed that the proposed EMvFSP method effectively improved classification performance. The paper also includes statistical analyses using Friedman ranking and Holms procedure to demonstrate the effectiveness of the proposed method. This approach provides a robust solution for multi-view learning that can adapt to different types of datasets and partitioning methods, making it suitable for a wide range of applications.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
How to personalize and whether to personalize? Candidate documents decide
Pub Date: 2024-05-27 | DOI: 10.1007/s10115-024-02138-y
Wenhan Liu, Yujia Zhou, Yutao Zhu, Zhicheng Dou
Personalized search plays an important role in satisfying users' information needs owing to its ability to build user profiles based on users' search histories. Most existing personalized methods build dynamic user profiles by emphasizing query-related historical behaviors rather than treating each historical behavior equally. Sometimes, the ambiguity and shortness of the query make it difficult to understand the underlying query intent exactly, and the query-centric user profiles built in these cases will be biased and inaccurate. In this work, we propose to leverage candidate documents, which contain richer information than the short query text, to help understand the query intent more accurately and thereby improve the quality of user profiles. Specifically, we use candidate documents to better understand the query intent, so that more relevant user behaviors can be selected from the history to build more accurate user profiles. Moreover, by analyzing the differences between candidate documents, we can better control the degree of personalization in the ranking of results. This controlled personalization is also expected to further improve the stability of personalized search, as blind personalization may harm the ranking results. We conduct extensive experiments on two datasets, and the results show that our model significantly outperforms competitive baselines, which confirms the benefit of utilizing candidate documents for personalized web search.
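A toy stand-in for the central idea: rank history behaviors by their similarity to the candidate documents rather than to the short query alone, and keep the most relevant ones; TF-IDF and top-k selection substitute for the paper's learned components.

```python
# Select history behaviors whose text is closest to any candidate document.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

history = ["python list sort", "apple pie recipe", "java sort array"]
candidates = ["How to sort a list in Python using sorted()",
              "Sorting arrays in Java with Arrays.sort"]

vec = TfidfVectorizer().fit(history + candidates)
sims = cosine_similarity(vec.transform(history),
                         vec.transform(candidates)).max(axis=1)
top_history = [h for _, h in sorted(zip(sims, history), reverse=True)[:2]]
print(top_history)    # the behaviors most relevant to the inferred intent
```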
{"title":"How to personalize and whether to personalize? Candidate documents decide","authors":"Wenhan Liu, Yujia Zhou, Yutao Zhu, Zhicheng Dou","doi":"10.1007/s10115-024-02138-y","DOIUrl":"https://doi.org/10.1007/s10115-024-02138-y","url":null,"abstract":"<p>Personalized search plays an important role in satisfying users’ information needs owing to its ability to build user profiles based on users’ search histories. Most of the existing personalized methods built dynamic user profiles by emphasizing query-related historical behaviors rather than treating each historical behavior equally. Sometimes, the ambiguity and short nature of the query make it difficult to understand the potential query intent exactly, and the query-centric user profiles built in these cases will be biased and inaccurate. In this work, we propose to leverage candidate documents, which contain richer information than the short query text, to help understand the query intent more accurately and improve the quality of user profiles afterward. Specifically, we intend to better understand the query intent through candidate documents, so that more relevant user behaviors from history can be selected to build more accurate user profiles. Moreover, by analyzing the differences between candidate documents, we can better control the degree of personalization on the ranking of results. This controlled personalization approach is also expected to further improve the stability of personalized search as blind personalization may harm the ranking results. We conduct extensive experiments on two datasets, and the results show that our model significantly outperforms competitive baselines, which confirms the benefit of utilizing candidate documents for personalized web search.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":null,"pages":null},"PeriodicalIF":2.7,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141167361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}