Pub Date: 2024-04-26 | DOI: 10.1007/s10115-024-02108-4
Nicolò Fassina, Francesco Ranzato, Marco Zanella
We study the certification of stability properties, such as robustness and individual fairness, of the k-nearest neighbor algorithm (kNN). Our approach leverages abstract interpretation, a well-established program analysis technique that has been proven successful in verifying several machine learning algorithms, notably, neural networks, decision trees, and support vector machines. In this work, we put forward an abstract interpretation-based framework for designing a sound approximate version of the kNN algorithm, which is instantiated to the interval and zonotope abstractions for approximating the range of numerical features. We show how this abstraction-based method can be used for stability, robustness, and individual fairness certification of kNN. Our certification technique has been implemented and experimentally evaluated on several benchmark datasets. These experimental results show that our tool can formally prove the stability of kNN classifiers in a precise and efficient way, thus expanding the range of machine learning models amenable to robustness certification.
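To make the interval abstraction concrete, below is a minimal sketch of how such a sound kNN stability check can work, assuming an L-infinity perturbation ball of radius eps and a conservative unanimity test; the paper's actual abstract transformers (and their zonotope counterparts) are more precise, and all names here are illustrative.

```python
import numpy as np

def interval_sq_dist(lo, hi, t):
    """Sound lower/upper bounds on the squared L2 distance between
    any point of the box [lo, hi] and a training point t."""
    gap = np.maximum(np.maximum(lo - t, t - hi), 0.0)  # nearest per-feature gap
    far = np.maximum(np.abs(t - lo), np.abs(hi - t))   # farthest per-feature gap
    return np.sum(gap ** 2), np.sum(far ** 2)

def certify_knn_stable(X, y, x, eps, k):
    """True only if every input in the box provably gets a unanimous
    kNN label -- sound but incomplete, like any abstraction-based check."""
    lo, hi = x - eps, x + eps
    d_lo, d_hi = map(np.array, zip(*(interval_sq_dist(lo, hi, t) for t in X)))
    # Point i may still be among the k nearest unless at least k others
    # certainly beat it (their upper bound is below i's lower bound).
    candidates = [i for i in range(len(X)) if np.sum(d_hi < d_lo[i]) < k]
    return len({y[i] for i in candidates}) == 1
```

A True answer certifies stability over the whole perturbation ball; a False answer is inconclusive, which is exactly the one-sided guarantee abstract interpretation provides.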
{"title":"Robustness verification of k-nearest neighbors by abstract interpretation","authors":"Nicolò Fassina, Francesco Ranzato, Marco Zanella","doi":"10.1007/s10115-024-02108-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02108-4","url":null,"abstract":"<p>We study the certification of stability properties, such as robustness and individual fairness, of the <i>k</i>-nearest neighbor algorithm (<i>k</i>NN). Our approach leverages abstract interpretation, a well-established program analysis technique that has been proven successful in verifying several machine learning algorithms, notably, neural networks, decision trees, and support vector machines. In this work, we put forward an abstract interpretation-based framework for designing a sound approximate version of the <i>k</i>NN algorithm, which is instantiated to the interval and zonotope abstractions for approximating the range of numerical features. We show how this abstraction-based method can be used for stability, robustness, and individual fairness certification of <i>k</i>NN. Our certification technique has been implemented and experimentally evaluated on several benchmark datasets. These experimental results show that our tool can formally prove the stability of <i>k</i>NN classifiers in a precise and efficient way, thus expanding the range of machine learning models amenable to robustness certification.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"2015 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140799331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-26 | DOI: 10.1007/s10115-024-02116-4
Yan Li, Zhenyu Li, Daofu Gong, Qian Hu, Haoyu Lu
The proliferation of social bots on social networks presents significant challenges to network security due to their malicious activities. While graph neural network models have shown promise in detecting social bots, acquiring a large number of high-quality labeled accounts remains challenging, impacting bot detection performance. To address this issue, we introduce BotCL, a social bot detection model that employs contrastive learning through data augmentation. Initially, we build a directed graph based on following/follower relationships, utilizing semantic, attribute, and structural features of accounts as initial node features. We then simulate account behaviors within the social network and apply two data augmentation techniques to generate multiple views of the directed graph. Subsequently, we encode the generated views using relational graph convolutional networks, achieving maximum homogeneity in node representations by minimizing the contrastive loss. Finally, node labels are predicted using Softmax. The proposed method augments data based on its distribution, showcasing robustness to noise. Extensive experimental results on Cresci-2015, Twibot-20, and Twibot-22 datasets demonstrate that our approach surpasses the state-of-the-art methods in terms of performance.
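For readers unfamiliar with the contrastive step, the following is a sketch of the standard NT-Xent (InfoNCE) objective commonly minimized between two augmented graph views; BotCL's exact loss and its RGCN encoder are specified in the paper, so treat this only as the generic technique.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """z1, z2: (n, d) node embeddings from two views; row i of z1 and
    row i of z2 form the positive pair, all other rows are negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each positive
    log_prob = sim[np.arange(2 * n), pos] - np.log(np.exp(sim).sum(axis=1))
    return -log_prob.mean()                            # minimize to align the two views
```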
{"title":"BotCL: a social bot detection model based on graph contrastive learning","authors":"Yan Li, Zhenyu Li, Daofu Gong, Qian Hu, Haoyu Lu","doi":"10.1007/s10115-024-02116-4","DOIUrl":"https://doi.org/10.1007/s10115-024-02116-4","url":null,"abstract":"<p>The proliferation of social bots on social networks presents significant challenges to network security due to their malicious activities. While graph neural network models have shown promise in detecting social bots, acquiring a large number of high-quality labeled accounts remains challenging, impacting bot detection performance. To address this issue, we introduce BotCL, a social bot detection model that employs contrastive learning through data augmentation. Initially, we build a directed graph based on following/follower relationships, utilizing semantic, attribute, and structural features of accounts as initial node features. We then simulate account behaviors within the social network and apply two data augmentation techniques to generate multiple views of the directed graph. Subsequently, we encode the generated views using relational graph convolutional networks, achieving maximum homogeneity in node representations by minimizing the contrastive loss. Finally, node labels are predicted using Softmax. The proposed method augments data based on its distribution, showcasing robustness to noise. Extensive experimental results on Cresci-2015, Twibot-20, and Twibot-22 datasets demonstrate that our approach surpasses the state-of-the-art methods in terms of performance.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"15 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140806391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-25 | DOI: 10.1007/s10115-024-02117-3
Ji Liu, Chunlu Chen, Yu Li, Lin Sun, Yulun Song, Jingbo Zhou, Bo Jing, Dejing Dou
While centralized servers pose a risk of being a single point of failure, decentralized approaches like blockchain offer a compelling solution by implementing a consensus mechanism among multiple entities. Merging distributed computing with cryptographic techniques, decentralized technologies introduce a novel computing paradigm. Blockchain ensures secure, transparent, and tamper-proof data management by validating and recording transactions via consensus across network nodes. Federated Learning (FL), as a distributed machine learning framework, enables participants to collaboratively train models while safeguarding data privacy by avoiding direct raw data exchange. Despite the growing interest in decentralized methods, their application in FL remains underexplored. This paper presents a thorough investigation into blockchain-based FL (BCFL), spotlighting the synergy between blockchain’s security features and FL’s privacy-preserving model training capabilities. First, we present the taxonomy of BCFL from three aspects, including decentralized, separate networks, and reputation-based architectures. Then, we summarize the general architecture of BCFL systems, providing a comprehensive perspective on FL architectures informed by blockchain. Afterward, we analyze the application of BCFL in healthcare, IoT, and other privacy-sensitive areas. Finally, we identify future research directions of BCFL.
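To ground the FL half of BCFL, here is a minimal federated-averaging round in which clients exchange model weights rather than raw data; hashing the aggregate is shown purely as a stand-in for where a ledger entry could attach, an assumption of this sketch rather than any architecture from the survey.

```python
import hashlib
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step of least-squares regression on a client's private data."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg_round(weights, clients):
    updates = [local_update(weights, X, y) for X, y in clients]  # raw data never leaves
    new_w = np.mean(updates, axis=0)                  # aggregator sees only weight vectors
    digest = hashlib.sha256(new_w.tobytes()).hexdigest()  # what a block could record
    return new_w, digest
```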
{"title":"Enhancing trust and privacy in distributed networks: a comprehensive survey on blockchain-based federated learning","authors":"Ji Liu, Chunlu Chen, Yu Li, Lin Sun, Yulun Song, Jingbo Zhou, Bo Jing, Dejing Dou","doi":"10.1007/s10115-024-02117-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02117-3","url":null,"abstract":"<p>While centralized servers pose a risk of being a single point of failure, decentralized approaches like blockchain offer a compelling solution by implementing a consensus mechanism among multiple entities. Merging distributed computing with cryptographic techniques, decentralized technologies introduce a novel computing paradigm. Blockchain ensures secure, transparent, and tamper-proof data management by validating and recording transactions via consensus across network nodes. Federated Learning (FL), as a distributed machine learning framework, enables participants to collaboratively train models while safeguarding data privacy by avoiding direct raw data exchange. Despite the growing interest in decentralized methods, their application in FL remains underexplored. This paper presents a thorough investigation into blockchain-based FL (BCFL), spotlighting the synergy between blockchain’s security features and FL’s privacy-preserving model training capabilities. First, we present the taxonomy of BCFL from three aspects, including decentralized, separate networks, and reputation-based architectures. Then, we summarize the general architecture of BCFL systems, providing a comprehensive perspective on FL architectures informed by blockchain. Afterward, we analyze the application of BCFL in healthcare, IoT, and other privacy-sensitive areas. Finally, we identify future research directions of BCFL.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"26 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140799329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-22 | DOI: 10.1007/s10115-024-02095-6
Douglas Castilho, Thársis T. P. Souza, Soong Moon Kang, João Gama, André C. P. L. F. de Carvalho
We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. To this end, market structure is modeled as a dynamic asset network by quantifying the time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure, namely Dynamic Asset Graph, Dynamic Minimal Spanning Tree and Dynamic Threshold Networks. Experimental results show that the proposed model can forecast market structure with high predictive performance, with up to 40% improvement over a time-invariant correlation-based benchmark. Non-pairwise correlation features proved important compared to the traditionally used pairwise correlation measures for all markets studied, particularly in long-term forecasting of stock market structure. Evidence is provided for stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100 and NIFTY50 market indices. The findings can be used to improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.
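Of the three filters, the Dynamic Threshold Network is the simplest to sketch: correlate rolling windows of returns and keep the edges whose correlation clears a cutoff. The window length and threshold below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def threshold_networks(returns, window=60, threshold=0.6):
    """returns: (T, n) matrix of asset returns; yields one boolean
    adjacency matrix per rolling window."""
    T, _ = returns.shape
    for start in range(T - window + 1):
        corr = np.corrcoef(returns[start:start + window].T)  # (n, n) correlations
        adj = np.abs(corr) >= threshold
        np.fill_diagonal(adj, False)                         # drop self-loops
        yield adj
```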
{"title":"Forecasting financial market structure from network features using machine learning","authors":"Douglas Castilho, Thársis T. P. Souza, Soong Moon Kang, João Gama, André C. P. L. F. de Carvalho","doi":"10.1007/s10115-024-02095-6","DOIUrl":"https://doi.org/10.1007/s10115-024-02095-6","url":null,"abstract":"<p>We propose a model that forecasts market correlation structure from link- and node-based financial network features using machine learning. For such, market structure is modeled as a dynamic asset network by quantifying time-dependent co-movement of asset price returns across company constituents of major global market indices. We provide empirical evidence using three different network filtering methods to estimate market structure, namely Dynamic Asset Graph, Dynamic Minimal Spanning Tree and Dynamic Threshold Networks. Experimental results show that the proposed model can forecast market structure with high predictive performance with up to <span>(40%)</span> improvement over a time-invariant correlation-based benchmark. Non-pair-wise correlation features showed to be important compared to traditionally used pair-wise correlation measures for all markets studied, particularly in the long-term forecasting of stock market structure. Evidence is provided for stock constituents of the DAX30, EUROSTOXX50, FTSE100, HANGSENG50, NASDAQ100 and NIFTY50 market indices. Findings can be useful to improve portfolio selection and risk management methods, which commonly rely on a backward-looking covariance matrix to estimate portfolio risk.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"44 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140799367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-20 | DOI: 10.1007/s10115-024-02110-w
Sotiris Angelis, Efthymia Moraitou, George Caridakis, Konstantinos Kotis
Ontologies constitute the semantic model of Knowledge Graphs (KGs). This structural association indicates the potential existence of methodological analogies in the development of ontologies and KGs. The deployment of fully and well-defined methodologies for KG development based on existing ontology engineering methodologies (OEMs) has been suggested and efficiently applied. However, most of the modern/recent OEMs may not include tasks that (i) empower knowledge workers and domain experts to closely collaborate with ontology engineers and KG specialists for the development and maintenance of KGs, (ii) satisfy special requirements of KG development, such as (a) ensuring modularity and agility of KGs, (b) assessing and mitigating bias at schema and data levels. Toward this aim, the paper presents a methodology for the Collaborative and Hybrid Engineering of Knowledge Graphs (CHEKG), which constitutes a hybrid (schema-centric/top-down and data-driven/bottom-up), collaborative, agile, and iterative approach for developing modular and fair domain-specific KGs. CHEKG contributes to all phases of the KG engineering lifecycle: from the specification of a KG to its exploitation, evaluation, and refinement. The CHEKG methodology is based on the main phases of the extended Human-Centered Collaborative Ontology Engineering Methodology (ext-HCOME), while it adjusts and expands the individual processes and tasks of each phase according to the specialized requirements of KG development. Apart from the presentation of the methodology per se, the paper presents recent work regarding the deployment and evaluation of the CHEKG methodology for the engineering of semantic trajectories as KGs generated from unmanned aerial vehicles (UAVs) data during real cultural heritage documentation scenarios.
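As a toy illustration of what a semantic trajectory looks like when expressed as KG triples, the snippet below uses rdflib with a hypothetical namespace and property names; the CHEKG vocabulary itself is defined in the paper.

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/trajectory#")  # hypothetical vocabulary
g = Graph()
fix = EX["fix42"]                                 # one UAV position fix
g.add((fix, RDF.type, EX.TrajectoryPoint))
g.add((fix, EX.capturedBy, EX.uav1))
g.add((fix, EX.lat, Literal("37.44", datatype=XSD.decimal)))
g.add((fix, EX.annotatedSite, EX.ancientTheatre))  # the "semantic" layer
print(g.serialize(format="turtle"))
```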
{"title":"CHEKG: a collaborative and hybrid methodology for engineering modular and fair domain-specific knowledge graphs","authors":"Sotiris Angelis, Efthymia Moraitou, George Caridakis, Konstantinos Kotis","doi":"10.1007/s10115-024-02110-w","DOIUrl":"https://doi.org/10.1007/s10115-024-02110-w","url":null,"abstract":"<p>Ontologies constitute the semantic model of Knowledge Graphs (KGs). This structural association indicates the potential existence of methodological analogies in the development of ontologies and KGs. The deployment of fully and well-defined methodologies for KG development based on existing ontology engineering methodologies (OEMs) has been suggested and efficiently applied. However, most of the modern/recent OEMs may not include tasks that (i) empower knowledge workers and domain experts to closely collaborate with ontology engineers and KG specialists for the development and maintenance of KGs, (ii) satisfy special requirements of KG development, such as (a) ensuring modularity and agility of KGs, (b) assessing and mitigating bias at schema and data levels. Toward this aim, the paper presents a methodology for the Collaborative and Hybrid Engineering of Knowledge Graphs (CHEKG), which constitutes a hybrid (schema-centric/top-down and data-driven/bottom-up), collaborative, agile, and iterative approach for developing modular and fair domain-specific KGs. CHEKG contributes to all phases of the KG engineering lifecycle: from the specification of a KG to its exploitation, evaluation, and refinement. The CHEKG methodology is based on the main phases of the extended Human-Centered Collaborative Ontology Engineering Methodology (ext-HCOME), while it adjusts and expands the individual processes and tasks of each phase according to the specialized requirements of KG development. Apart from the presentation of the methodology per se, the paper presents recent work regarding the deployment and evaluation of the CHEKG methodology for the engineering of semantic trajectories as KGs generated from unmanned aerial vehicles (UAVs) data during real cultural heritage documentation scenarios.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"20 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-18 | DOI: 10.1007/s10115-024-02094-7
Vaios Stergiopoulos, Michael Vassilakopoulos, Eleni Tousidou, Antonio Corral
Recommendation (recommender) systems (RS) have played a significant role in both research and industry in recent years. In academia, there is a need to help researchers discover the most appropriate and relevant scientific information through recommendations. Nevertheless, we argue that there is a major gap between academic state-of-the-art RS and real-world problems. In this paper, we present a novel multi-staged RS based on clustering, graph modeling and deep learning that manages to run on a full dataset (a scientific digital library) on the order of millions of users and items (papers). We run several experiments to find the best tuning of our system and present and compare three versions of our RS in terms of recall and NDCG metrics. The results show that a multi-staged RS that utilizes a variety of techniques and algorithms is able to face real-world problems and large academic datasets. In this way, we suggest a way to close, or at least narrow, the gap between research RS and industry-grade RS.
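Since the three versions are compared on recall and NDCG, a minimal sketch of both metrics for a single user's ranked list (binary relevance assumed) may help:

```python
import numpy as np

def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant set retrieved in the top k positions."""
    return len(set(ranked[:k]) & relevant) / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Discounted cumulative gain at k, normalized by the ideal ranking."""
    gains = np.array([1.0 if item in relevant else 0.0 for item in ranked[:k]])
    dcg = np.sum(gains / np.log2(np.arange(2, len(gains) + 2)))   # ranks start at 1
    ideal = np.sum(1.0 / np.log2(np.arange(2, min(k, len(relevant)) + 2)))
    return dcg / ideal if ideal > 0 else 0.0
```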
{"title":"An academic recommender system on large citation data based on clustering, graph modeling and deep learning","authors":"Vaios Stergiopoulos, Michael Vassilakopoulos, Eleni Tousidou, Antonio Corral","doi":"10.1007/s10115-024-02094-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02094-7","url":null,"abstract":"<p>Recommendation (recommender) systems (RS) have played a significant role in both research and industry in recent years. In the area of academia, there is a need to help researchers discover the most appropriate and relevant scientific information through recommendations. Nevertheless, we argue that there is a major gap between academic state-of-the-art RS and real-world problems. In this paper, we present a novel multi-staged RS based on clustering, graph modeling and deep learning that manages to run on a full dataset (scientific digital library) in the magnitude of millions users and items (papers). We run several tests (experiments/evaluation) as a means to find the best approach regarding the tuning of our system; so, we present and compare three versions of our RS regarding recall and NDCG metrics. The results show that a multi-staged RS that utilizes a variety of techniques and algorithms is able to face real-world problems and large academic datasets. In this way, we suggest a way to close or minimize the gap between research and industry value RS.\u0000</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"207 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-18 | DOI: 10.1007/s10115-024-02104-8
Irfan Ali Kandhro, Fayyaz Ali, Mueen Uddin, Asadullah Kehar, Selvakumar Manickam
Aspect-based sentiment analysis (ABSA) is a natural language processing technique that seeks to recognize and extract the sentiment connected to various qualities or aspects of a specific good, service, or entity. It entails dissecting a text into its component pieces, determining the elements or aspects being examined, and then examining the attitude stated about each feature or aspect. The main objective of this research is to present a comprehensive understanding of ABSA, including its potential, ongoing trends and advancements, structure, practical applications, real-world implementation, and open issues. Current sentiment analysis aims to enhance granularity at the aspect level with two main objectives: extracting aspects and classifying sentiment polarity. Three main families of methods are used for aspect extraction: pattern-based, machine learning and deep learning. These methods can capture both syntactic and semantic features of text without relying heavily on high-level feature engineering, which was a requirement in earlier approaches. Beyond reviewing prior surveys, this article also comprehensively surveys the procedure for carrying out this task and the applications of ABSA. Each strategy is evaluated, compared, and investigated to fully comprehend its benefits and drawbacks. Finally, the difficulties of ABSA are reviewed to identify future directions.
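The pattern-based family mentioned above can be made concrete with a toy dependency-pattern extractor; this sketch assumes spaCy and its small English model, and real ABSA pipelines are considerably richer.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_aspect_opinions(text):
    pairs = []
    for token in nlp(text):
        # adjectival modifier: "great battery" -> (battery, great)
        if token.dep_ == "amod" and token.head.pos_ == "NOUN":
            pairs.append((token.head.text, token.text))
        # copular pattern: "the screen is dim" -> (screen, dim)
        if token.pos_ == "ADJ" and token.dep_ == "acomp":
            subjects = [c for c in token.head.children if c.dep_ == "nsubj"]
            pairs.extend((s.text, token.text) for s in subjects)
    return pairs

print(extract_aspect_opinions("The battery life is great but the screen is dim."))
```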
{"title":"Exploring aspect-based sentiment analysis: an in-depth review of current methods and prospects for advancement","authors":"Irfan Ali Kandhro, Fayyaz Ali, Mueen Uddin, Asadullah Kehar, Selvakumar Manickam","doi":"10.1007/s10115-024-02104-8","DOIUrl":"https://doi.org/10.1007/s10115-024-02104-8","url":null,"abstract":"<p>Aspect-based sentiment analysis (ABSA) is a natural language processing technique that seeks to recognize and extract the sentiment connected to various qualities or aspects of a specific good, service, or entity. It entails dissecting a text into its component pieces, determining the elements or aspects being examined, and then examining the attitude stated about each feature or aspect. The main objective of this research is to present a comprehensive understanding of aspect-based sentiment analysis (ABSA), such as its potential, ongoing trends and advancements, structure, practical applications, real-world implementation, and open issues. The current sentiment analysis aims to enhance granularity at the aspect level with two main objectives, including extracting aspects and polarity sentiment classification. Three main methods are designed for aspect extractions: pattern-based, machine learning and deep learning. These methods can capture both syntactic and semantic features of text without relying heavily on high-level feature engineering, which was a requirement in earlier approaches. Despite bringing traditional surveys, a comprehensive survey of the procedure for carrying out this task and the applications of ABSA are also included in this article. To fully comprehend each strategy's benefits and drawbacks, it is evaluated, compared, and investigated. To determine future directions, the ABSA’s difficulties are finally reviewed.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"100 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140623303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-17 | DOI: 10.1007/s10115-024-02098-3
Chao Huang, Xiaoyue Wu, Mingwei Lin, Zeshui Xu
In some actual decision-making problems, experts may be hesitant to judge the performances of alternatives, which leads to experts providing decision matrices with incomplete information. However, most existing estimation methods for incomplete information in group decision-making (GDM) neglect the hesitant judgments of experts, possibly making the group decision outcomes unreasonable. Considering the hesitation degrees of experts in decision judgments, an approach is proposed based on triangular intuitionistic fuzzy numbers (TIFNs) and the TODIM (interactive and multiple criteria decision-making) method for GDM and consensus measure. First, TIFNs are applied to handle incomplete information due to the hesitant judgments of experts. Second, considering the risk attitudes of experts, a decision-making model is proposed to rank alternatives for GDM with incomplete information. Subsequently, based on measuring the concordance between solutions, a consensus model is presented to measure the group's and individuals' consensus degrees. Finally, an illustrative example is presented to show the detailed implementation procedure of the proposed approach. Comparisons with some existing estimation methods verify the effectiveness of the proposed approach for handling incomplete information. The impacts and necessity of experts' hesitation degrees are discussed through a sensitivity analysis.
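For orientation, the classical (crisp) TODIM dominance computation is sketched below on a normalized performance matrix; the paper's contribution is its extension to triangular intuitionistic fuzzy numbers, which this sketch does not attempt.

```python
import numpy as np

def todim_ranking(P, w, theta=1.0):
    """P: (m, n) matrix of alternatives x criteria; w: criteria weights;
    theta: attenuation factor for losses (the expert's risk attitude)."""
    wr = w / w.max()                       # weights relative to the reference criterion
    m, n = P.shape
    delta = np.zeros((m, m))
    for c in range(n):
        for i in range(m):
            for j in range(m):
                diff = P[i, c] - P[j, c]
                if diff > 0:               # gain of i over j on criterion c
                    delta[i, j] += np.sqrt(wr[c] * diff / wr.sum())
                elif diff < 0:             # loss, attenuated by theta
                    delta[i, j] -= np.sqrt(wr.sum() * (-diff) / wr[c]) / theta
    xi = delta.sum(axis=1)                 # overall dominance of each alternative
    return (xi - xi.min()) / (xi.max() - xi.min())   # global values in [0, 1]
```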
{"title":"An approach for fuzzy group decision making and consensus measure with hesitant judgments of experts","authors":"Chao Huang, Xiaoyue Wu, Mingwei Lin, Zeshui Xu","doi":"10.1007/s10115-024-02098-3","DOIUrl":"https://doi.org/10.1007/s10115-024-02098-3","url":null,"abstract":"<p>In some actual decision-making problems, experts may be hesitant to judge the performances of alternatives, which leads to experts providing decision matrices with incomplete information. However, most existing estimation methods for incomplete information in group decision-making (GDM) neglect the hesitant judgments of experts, possibly making the group decision outcomes unreasonable. Considering the hesitation degrees of experts in decision judgments, an approach is proposed based on the triangular intuitionistic fuzzy numbers (TIFNs) and TODIM (interactive and multiple criteria decision-making) method for GDM and consensus measure. First, TIFNs are applied to handle incomplete information due to the hesitant judgments of experts. Second, considering the risk attitudes of experts, a decision-making model is proposed to rank alternatives for GDM with incomplete information. Subsequently, based on measuring the concordance between solutions, a consensus model is presented to measure the group’s and individual’s consensus degrees. Finally, an illustrative example is presented to show the detailed implementation procedure of the proposed approach. The comparisons with some existing estimation methods verify the effectiveness of the proposed approach for handling incomplete information. The impacts and necessities of experts’ hesitation degrees are discussed by a sensitivity analysis.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"7 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617302","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-17 | DOI: 10.1007/s10115-024-02115-5
Amir Dehaki Toroghi, Javad Hamidzadeh
Today, the rapid development of online social networks, together with their low cost, easy communication, and quick access with minimal facilities, has made social networks an attractive and very influential phenomenon among people. The users of these networks tend to share their sensitive and private information with friends and acquaintances. This has made the data of these networks a very important source of information about users, their interests, feelings, and activities. Analyzing this information can be very useful in predicting the behavior of users in dealing with various issues. But publishing this data for data mining can violate the privacy of users. As a result, data privacy protection of social networks has become an important and attractive research topic. In this context, various algorithms have been proposed, all of which meet privacy requirements by making changes to the information as well as the graph structure. But due to high processing costs and long execution times, these algorithms are not well suited to anonymizing big data. In this research, we improve the speed of data anonymization by using a number factorization technique to select and delete the best edges in the graph correction stage. We also use the chaotic krill herd algorithm to add edges; by considering the joint effect of all edges on the structure of the graph, we select edges whose addition preserves the graph's utility. Evaluation results on real-world datasets show the efficiency of the proposed algorithm, compared with state-of-the-art methods, in reducing execution time while maintaining the utility of the anonymized graph.
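The edge deletions and additions above target a privacy property of the published graph; one common such property, k-degree anonymity, can be checked as below (the paper's exact privacy model may differ, so this is only an illustrative check).

```python
from collections import Counter

def is_k_degree_anonymous(adjacency, k):
    """adjacency: dict mapping each node to its set of neighbors. The graph
    is k-degree anonymous if every occurring degree is shared by at least
    k nodes, so a degree alone cannot narrow an identity below k candidates."""
    degree_counts = Counter(len(nbrs) for nbrs in adjacency.values())
    return all(count >= k for count in degree_counts.values())

g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"e"}, "e": {"d"}}
print(is_k_degree_anonymous(g, 2))   # True: degree 2 occurs three times, degree 1 twice
```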
{"title":"Protecting the privacy of social network data using graph correction","authors":"Amir Dehaki Toroghi, Javad Hamidzadeh","doi":"10.1007/s10115-024-02115-5","DOIUrl":"https://doi.org/10.1007/s10115-024-02115-5","url":null,"abstract":"<p>Today, the rapid development of online social networks, as well as low costs, easy communication, and quick access with minimal facilities have made social networks an attractive and very influential phenomenon among people. The users of these networks tend to share their sensitive and private information with friends and acquaintances. This has caused the data of these networks to become a very important source of information about users, their interests, feelings, and activities. Analyzing this information can be very useful in predicting the behavior of users in dealing with various issues. But publishing this data for data mining can violate the privacy of users. As a result, data privacy protection of social networks has become an important and attractive research topic. In this context, various algorithms have been proposed, all of which meet privacy requirements by making changes in the information as well as the graph structure. But due to high processing costs and long execution times, these algorithms are not very appropriate for anonymizing big data. In this research, we improved the speed of data anonymization by using the number factorization technique to select and delete the best edges in the graph correction stage. We also used the chaotic krill herd algorithm to add edges, and considering the effect of all edges together on the structure of the graph, we selected edges and added them to the graph so that it preserved the graph’s utility. The evaluation results on the real-world datasets, show the efficiency of the proposed algorithm in comparison with the state-of-the-art methods to reduce the execution time and maintain the utility of the anonymous graph.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"28 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-04-16 | DOI: 10.1007/s10115-024-02105-7
D. Sudha, M. Krishnamurthy
A large number of association rules often reduces the reliability of data mining results; hence, a dimensionality reduction technique is crucial for data analysis. When analyzing massive datasets, existing models take more time to scan the entire database because they discover items and transactions that are not necessary for data analysis. For this purpose, the Fuzzy Rough Set-based Horse Herd Optimization (FRS-HHO) algorithm is proposed and integrated with the MapReduce algorithm to minimize query retrieval time and improve performance. The HHO algorithm minimizes the number of unnecessary items and transactions with minimal support value from the dataset to maximize fitness based on multiple objectives such as support, confidence, interestingness, and lift, which evaluate the quality of association rules. The feature value of each item in the population is obtained by a MapReduce-based fitness function to generate optimal frequent itemsets in minimum time. Horse Herd Optimization (HHO) is employed to solve high-dimensional optimization problems. The proposed FRS-HHO approach takes less execution time across dimensions and has a space complexity of 38% for a total of 10k transactions. Also, the FRS-HHO approach offers a 17% speedup and a 12% decrease in input–output communication cost compared with other approaches. The proposed FRS-HHO model enhances performance in terms of execution time, space complexity, and speed.
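The rule-quality measures that feed the FRS-HHO fitness (support, confidence, lift) are standard and easy to sketch over a toy transaction table; their multi-objective combination is the paper's contribution and is not reproduced here.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent,
    with transactions given as a list of item sets."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent <= t and consequent <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    cons = sum(1 for t in transactions if consequent <= t)
    support = both / n
    confidence = both / ante if ante else 0.0
    lift = confidence / (cons / n) if cons else 0.0
    return support, confidence, lift

baskets = [{"milk", "bread"}, {"milk", "eggs"}, {"bread", "eggs"}, {"milk", "bread", "eggs"}]
print(rule_metrics(baskets, {"milk"}, {"bread"}))   # approximately (0.5, 0.667, 0.889)
```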
{"title":"A fuzzy rough set-based horse herd optimization algorithm for map reduce framework for customer behavior data","authors":"D. Sudha, M. Krishnamurthy","doi":"10.1007/s10115-024-02105-7","DOIUrl":"https://doi.org/10.1007/s10115-024-02105-7","url":null,"abstract":"<p>A large number of association rules often minimizes the reliability of data mining results; hence, a dimensionality reduction technique is crucial for data analysis. When analyzing massive datasets, existing models take more time to scan the entire database because they discover unnecessary items and transactions that are not necessary for data analysis. For this purpose, the Fuzzy Rough Set-based Horse Herd Optimization (FRS-HHO) algorithm is proposed to be integrated with the Map Reduce algorithm to minimize query retrieval time and improve performance. The HHO algorithm minimizes the number of unnecessary items and transactions with minimal support value from the dataset to maximize fitness based on multiple objectives such as support, confidence, interestingness, and lift to evaluate the quality of association rules. The feature value of each item in the population is obtained by a Map Reduce-based fitness function to generate optimal frequent itemsets with minimum time. The Horse Herd Optimization (HHO) is employed to solve the high-dimensional optimization problems. The proposed FRS-HHO approach takes less time to execute for dimensions and has a space complexity of 38% for a total of 10 k transactions. Also, the FRS-HHO approach offers a speedup rate of 17% and a 12% decrease in input–output communication cost when compared to other approaches. The proposed FRS-HHO model enhances performance in terms of execution time, space complexity, and speed.</p>","PeriodicalId":54749,"journal":{"name":"Knowledge and Information Systems","volume":"35 1","pages":""},"PeriodicalIF":2.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}