Journal of Intelligent Information Systems: Latest Articles, Page 3

Heuristic approaches for non-exhaustive pattern-based change detection in dynamic networks

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-07-02 DOI: 10.1007/s10844-024-00866-9

Corrado Loglisci, Angelo Impedovo, Toon Calders, Michelangelo Ceci

Dynamic networks are ubiquitous in many domains for modelling evolving graph-structured data, and detecting changes allows us to understand the dynamics of the domain represented. One category of computational solutions is the pattern-based change detectors (PBCDs), which are non-parametric unsupervised change detection methods based on observed changes in sets of frequent patterns over time. Patterns can depict the structural information of the sub-graphs, which makes them a useful tool for interpreting the changes. Existing PBCDs often rely on exhaustive mining, which has worst-case exponential time complexity, making this category of algorithms inefficient in practice. In fact, the pattern mining process is even more time-consuming and inefficient in this case because of the combinatorial explosion of the sub-graph pattern space caused by the inherent complexity of the graph structure. Non-exhaustive search strategies are a possible approach to this problem, also because not all possible frequent patterns contribute to changes in the time-evolving data. In this paper, we investigate the viability of different heuristic approaches that prevent the complete exploration of the search space by returning a concise set of sub-graph patterns (compared to the exhaustive case). The heuristics differ in the criterion used to select representative patterns. The results obtained on real-world and synthetic dynamic networks show that these solutions are effective when mining patterns, and even more accurate when detecting changes.
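
To make the idea of detecting change from sets of frequent patterns concrete, the following is a minimal sketch and not the authors' method. It assumes the simplest possible pattern language (single edges rather than mined sub-graphs), non-overlapping windows of snapshots, and a Jaccard distance between the frequent-pattern sets of consecutive windows. The function names and the parameters `window_size`, `min_support`, and `threshold` are hypothetical.

```python
from collections import Counter


def frequent_patterns(window, min_support):
    """Edges present in at least a min_support fraction of the window's snapshots.

    Assumption: a "pattern" is a single edge; a real PBCD would mine sub-graphs.
    """
    counts = Counter(edge for snapshot in window for edge in set(snapshot))
    needed = min_support * len(window)
    return {edge for edge, n in counts.items() if n >= needed}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def detect_changes(snapshots, window_size, min_support, threshold):
    """Return the start index of every window whose frequent patterns
    differ from those of the previous window by more than threshold."""
    starts = range(0, len(snapshots) - window_size + 1, window_size)
    changes = []
    previous = None
    for start in starts:
        current = frequent_patterns(snapshots[start:start + window_size], min_support)
        if previous is not None and jaccard_distance(previous, current) > threshold:
            changes.append(start)
        previous = current
    return changes


if __name__ == "__main__":
    stable = {("a", "b"), ("b", "c")}
    shifted = {("x", "y"), ("b", "c")}
    snapshots = [stable] * 4 + [shifted] * 4
    print(detect_changes(snapshots, window_size=2, min_support=1.0, threshold=0.5))
```

In this toy run the edge set changes at the fifth snapshot, so the only reported change point is index 4. A heuristic, non-exhaustive miner would replace `frequent_patterns` with a search that returns only a concise subset of patterns.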

Citations: 0

Improving graph collaborative filtering with view explorer for social recommendation

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-06-26 DOI: 10.1007/s10844-024-00865-w

Yongrui Duan, Yijun Tu, Yusheng Lu, Xiaofeng Wang

Social recommender systems (SRS) have garnered considerable attention due to the supplementary information provided by social networks, which aids in making recommendations. However, social network information contains noise, which can be detrimental to recommendation performance. Current social recommendation models are deficient in feature validation and extraction of social data. To fill that gap, we propose a novel model called Social View Explorer Collaborative Filtering (SVE-CF), which aims to extract significant consistent signals from the noisy social network. First, SVE-CF correlates users’ social and interaction behaviors, creating follow, joint, and interaction views to represent all interaction patterns. Second, it samples unlabeled examples from users to assess consistency across the three views, assigning pseudo-labels as evidence of social homophily. Third, it selects top-k pseudo-labels to amplify significant consistent signals and minimize noise through tri-view joint learning. Extensive experiments are conducted to demonstrate the effectiveness of the proposed model over the commonly used state-of-the-art (SOTA) methods.

Citations: 0

Improving search and rescue planning and resource allocation through case-based and concept-based retrieval

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-06-01 DOI: 10.1007/s10844-024-00861-0

Wajeeha Nasar, Ricardo da Silva Torres, Odd Erik Gundersen, Anniken Susanne Thoresen Karlsen

The need for effective and efficient search and rescue operations is more important than ever as the frequency and severity of disasters increase due to the escalating effects of climate change. Recognizing the value of personal knowledge and past experiences of experts, in this paper, we present findings of an investigation of how past knowledge and experts’ experiences can be effectively integrated with current search and rescue practices to improve rescue planning and resource allocation. A special focus is on investigating and demonstrating the potential associated with integrating knowledge graphs and case-based reasoning as a viable approach for search and rescue decision support. As part of our investigation, we have implemented a demonstrator system using a Norwegian search and rescue dataset and case-based and concept-based similarity retrieval. The main contribution of the paper is insight into how case-based and concept-based retrieval services can be designed to improve the effectiveness of search and rescue planning. To evaluate the validity of ranked cases in terms of how they align with the existing knowledge and insights of search and rescue experts, we use evaluation measures such as precision and recall. In our evaluation, we observed that attributes, such as the rescue operation type, have high precision, while the precision associated with the objects involved is relatively low. Central findings from our evaluation process are that knowledge-based creation, as well as case- and concept-based similarity retrieval services, can be beneficial in optimizing search and rescue planning time and allocating appropriate resources according to search and rescue incident descriptions.
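
The precision and recall measures mentioned above can be illustrated with a short sketch for a ranked list of retrieved cases. This is a generic formulation, not the paper's evaluation code; the function name, the cut-off `k`, and the case identifiers are hypothetical.

```python
def precision_recall_at_k(ranked_cases, relevant_cases, k):
    """Precision and recall of the top-k retrieved cases.

    ranked_cases: case identifiers ordered by similarity, best first.
    relevant_cases: set of identifiers judged relevant (e.g. by experts).
    """
    top_k = ranked_cases[:k]
    hits = sum(1 for case in top_k if case in relevant_cases)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant_cases) if relevant_cases else 0.0
    return precision, recall


if __name__ == "__main__":
    ranked = ["c1", "c2", "c3", "c4"]
    relevant = {"c1", "c4", "c9"}
    print(precision_recall_at_k(ranked, relevant, k=2))
```

With `k=2`, one of the two retrieved cases is relevant (precision 0.5) and one of the three relevant cases has been found (recall 1/3). Computing the same measures per attribute, such as operation type or objects involved, would give the kind of attribute-level comparison the abstract describes.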

Citations: 0

Generative adversarial meta-learning knowledge graph completion for large-scale complex knowledge graphs

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-05-28 DOI: 10.1007/s10844-024-00860-1

Weiming Tong, Xu Chu, Zhongwei Li, Liguo Tan, Jinxiao Zhao, Feng Pan

In the study of large-scale complex knowledge graphs, due to the incompleteness of knowledge and the existence of low-frequency knowledge samples, existing knowledge graph completion methods are often limited by the amount of data and ignore the complex semantic information. To solve this problem, this paper proposes a knowledge graph completion method, CGAML, based on the combination of a Conditional Generative Adversarial Network and Meta-Learning. It uses hierarchical background knowledge as the basis and introduces conditional variables in the Generative Adversarial Network to represent the required semantic information, which constrains the semantic attributes of the generated knowledge. In addition, we design a meta-learning multi-task framework to embed Conditional Generative Adversarial Networks into the meta-learning process and propose local constraints and global gradient optimization strategies to quickly adapt to new tasks and improve computational efficiency. Empirically, our method demonstrates superior performance in few-shot link prediction when compared to existing representative methods.

Citations: 0

Improving the clarity of questions in Community Question Answering networks

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-05-02 DOI: 10.1007/s10844-024-00847-y

Alireza Khabbazan, Ahmad Ali Abin, Viet-Vu Vu

Every day, thousands of questions are asked on the Community Question Answering network, making these questions and answers extremely valuable for information seekers around the world. However, a significant proportion of these questions do not elicit proper answers. There are several reasons for this, with the lack of clarity in questions being one of the most crucial factors. In this study, our primary focus is on enhancing the clarity of unclear questions in Community Question Answering networks. In the first step, DistilBERT, which uses Siamese and triplet network structures for meaningful sentence embeddings, is combined with HDBSCAN, effective in diverse noise datasets and less sensitive to density variations, to extract unique features from each question. Questions were then categorized as clear or unclear using an Extremely Randomized Trees ensemble model, known for its robust resistance to class imbalance, with more than 90% accuracy. Next, efforts were made to extract information that could enhance the clarity of unclear questions by comparing them with similar, clearer questions using Dynamic Time Warping, a versatile technique suitable for time series analyses in information systems and applicable across various domains. Finally, the extracted information was incorporated into the feature vector of unclear questions based on histogram-coverage methods to enhance their clarity. When a question is made clearer, the missing information and its importance are shown to the questioner. This enables the questioner to be aware of the missing information and facilitates them in clarifying the question.
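
Dynamic Time Warping, used above to compare unclear questions with similar clearer ones, can be sketched with the classic dynamic-programming recurrence. The abstract does not say how a question is turned into a sequence, so the numeric sequences here are an assumption made purely for illustration, as are the function name and the absolute-difference local cost.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of a[:i] with b[:j]; each step
    may advance in a, in b, or in both (the standard step pattern).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
    print(dtw_distance([1, 2, 3], [2, 3, 4]))
```

The first call returns 0.0 because the repeated value can be absorbed by warping, while the second returns 2.0. This tolerance to stretching is what makes the measure usable on sequences of different lengths.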

Citations: 0

Relation representation based on private and shared features for adaptive few-shot link prediction

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-04-10 DOI: 10.1007/s10844-024-00856-x

Weiwen Zhang, Canqun Yang

Although Knowledge Graphs (KGs) provide great value in many applications, they are often incomplete, with many missing facts. KG Completion (KGC) is a popular technique for supplementing knowledge. However, there are two fundamental challenges for KGC. One challenge is that only a few entity pairs are available for most relations, and the other is that there exist complex relations, including one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N). In this paper, we propose a new model to accomplish Few-shot KG Completion (FKGC) under complex relations, which is called Relation representation based on Private and Shared features for Adaptive few-shot link prediction (RPSA). In this model, we use a hierarchical attention mechanism to extract the essential hidden information regarding the entity’s neighborhood so as to improve its representation. To enhance the representation of few-shot relations, we extract the private features (i.e., the unique feature of each entity pair that represents the few-shot relation) and shared features (i.e., one or more commonalities among a few entity pairs that represent the few-shot relation). Specifically, a private feature extractor is used to extract the private semantic feature of the few-shot relation in the entity pair. After that, we design a shared feature extractor to extract the shared semantic features among a few reference entity pairs in the few-shot relation. Moreover, an adaptive aggregator aggregates several representations of the few-shot relation about the query. We conduct experiments on three datasets: NELL-One, CoDEx-S-One, and CoDEx-M-One. According to the experimental results, RPSA performs better than the existing FKGC models. In addition, the RPSA model can also handle complex relations well, even in the few-shot scenario.

Citations: 0

ERABQS: entity resolution based on active machine learning and balancing query strategy

IF 3.4; Zone 3 (Computer Science); Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Journal of Intelligent Information Systems

Pub Date : 2024-03-26 DOI: 10.1007/s10844-024-00853-0

Jabrane Mourad, Tabbaa Hiba, Rochd Yassir, Hafidi Imad

Entity Resolution (ER) is a crucial process in the field of data management and integration. The primary goal of ER is to identify different profiles (or records) that refer to the same real-world entity across databases. The challenging problem is that labeling a large sample of profiles can be very expensive and time-consuming. Active Machine Learning (ActiveML) addresses this issue by selecting the most representative or informative profile pairs to be labeled. Informativeness is determined by the capacity to diminish the uncertainty of the model. Conversely, representativeness evaluates whether a selected instance effectively reflects the overall input patterns of unlabeled data. Traditional ActiveML techniques typically rely on one strategy, which may severely restrict the performance of the ActiveML process and lead to slow convergence, especially in ER problems with a lack of initial training data. In this paper, we overcome this issue with an approach for balancing the two strategies above. The implemented solution, named EBEES (Epsilon-based Balancing Exploration and Exploitation Strategy), has two variations: Adaptive-ε and ε-decreasing. We evaluated EBEES on twelve datasets. Comparing the EBEES strategy against state-of-the-art methods, without initial training data, showed enhanced performance in terms of F1-score, model stability, and rapid convergence.
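
One way to picture an ε-based balance between the two query strategies is the sketch below. It is not the EBEES implementation: it assumes that exploration means querying the most representative pair and exploitation means querying the most uncertain one, and it assumes a geometric decay for the ε-decreasing variant. The function names, the `decay` parameter, and the example scores are all hypothetical.

```python
import random


def epsilon_decreasing(initial_epsilon, decay, step):
    """Assumed schedule: epsilon shrinks geometrically with each labeling round."""
    return initial_epsilon * (decay ** step)


def select_pair(uncertainty, representativeness, epsilon, rng):
    """Return the index of the next candidate pair to send for labeling.

    With probability epsilon, explore (highest representativeness);
    otherwise exploit (highest model uncertainty).
    """
    if rng.random() < epsilon:
        scores = representativeness
    else:
        scores = uncertainty
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    uncertainty = [0.1, 0.9, 0.5]
    representativeness = [0.8, 0.2, 0.3]
    for step in range(5):
        eps = epsilon_decreasing(1.0, 0.5, step)
        print(step, eps, select_pair(uncertainty, representativeness, eps, rng))
```

Early rounds, where ε is close to 1, favor representative pairs, which suits a cold start without initial training data. Later rounds increasingly query the pairs the model is least sure about.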

Citations: 0

Exploring and mitigating gender bias in book recommender systems with explicit feedback

IF 3.4 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2024-03-25 DOI: 10.1007/s10844-023-00827-8


Recommender systems are indispensable because they influence our day-to-day behavior and decisions by giving us personalized suggestions. Services like Kindle, YouTube, and Netflix depend heavily on the performance of their recommender systems to ensure that their users have a good experience and to increase revenues. Despite their popularity, it has been shown that recommender systems reproduce and amplify the bias present in the real world. The resulting feedback creates a self-perpetuating loop that degrades the user experience and homogenizes recommendations over time. Further, biased recommendations can also reinforce stereotypes based on gender or ethnicity, thus reinforcing the filter bubbles that we live in. In this paper, we address the problem of gender bias in recommender systems with explicit feedback. We propose a model to quantify the gender bias present in book rating datasets and in the recommendations produced by the recommender systems. Our main contribution is a principled approach to mitigating the bias produced in the recommendations. We show theoretically that the proposed approach provides unbiased recommendations despite biased data. Through empirical evaluation on publicly available book rating datasets, we further show that the proposed model can significantly reduce bias without significant impact on accuracy and outperforms the existing model in terms of bias. Our method is model-agnostic and can be applied to any recommender system. To demonstrate the performance of our model, we present results on four recommender algorithms: two from the K-nearest neighbors family, UserKNN and ItemKNN, and two from the matrix factorization family, Alternating Least Squares and Singular Value Decomposition. The extensive simulations of various recommender algorithms show the generality of the proposed approach.
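
The abstract does not define how bias is quantified, so the sketch below is only a stand-in, not the paper's model: it measures the gap between the mean explicit ratings of two groups. The function names, the `(group, rating)` input shape, and the group labels are hypothetical, and the sketch deliberately does not assume which entity (user, author, or item) carries the gender attribute.

```python
from collections import defaultdict


def group_means(labeled_ratings):
    """Mean explicit rating per group from (group_label, rating) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, rating in labeled_ratings:
        totals[group] += rating
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def rating_gap(labeled_ratings, group_a, group_b):
    """Signed difference of group means; 0.0 means no gap on this measure."""
    means = group_means(labeled_ratings)
    return means[group_a] - means[group_b]
```

Computing the same gap on the input ratings and again on the scores a recommender predicts would give a rough view of whether the model amplifies a gap already present in the data.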



Citations: 0

Leveraging distant supervision and deep learning for twitter sentiment and emotion classification

IF 3.4 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2024-03-22 DOI: 10.1007/s10844-024-00845-0

Muhamet Kastrati, Zenun Kastrati, Ali Shariq Imran, Marenglen Biba

Applications across industry, healthcare, and security have begun adopting automatic sentiment analysis and emotion detection in short texts, such as social media posts. Twitter stands out as one of the most popular online social media platforms, in part because its content is easy to access through the API. Supervised learning is the most widely used paradigm for tasks involving sentiment polarity and fine-grained emotion detection in short and informal texts, such as Twitter posts. However, supervised learning models are data-hungry and rely heavily on abundant labeled data, which remains a challenge. This study aims to address this challenge by creating a large-scale real-world dataset of 17.5 million tweets. A distant supervision approach relying on the emojis available in tweets is applied to label tweets according to Ekman's six basic emotions. Additionally, we conducted a series of experiments on our dataset using various conventional machine learning and deep learning models, including transformer-based models, to establish baseline results. The experimental results and an extensive ablation analysis on the dataset showed that BiLSTM with FastText and an attention mechanism outperforms other models in both classification tasks, achieving an F1-score of 70.92% for sentiment classification and 54.85% for emotion detection.
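
Emoji-based distant supervision can be sketched in a few lines. The abstract does not list the emojis used or the filtering rules, so the mapping `EMOJI_TO_EMOTION`, the rule of discarding tweets with zero or conflicting emotion signals, and the function name `label_tweet` are assumptions made for illustration only.

```python
# Hypothetical emoji-to-emotion map covering Ekman's six basic emotions.
EMOJI_TO_EMOTION = {
    "😠": "anger",
    "🤢": "disgust",
    "😨": "fear",
    "😂": "happiness",
    "😢": "sadness",
    "😲": "surprise",
}


def label_tweet(text):
    """Return (cleaned_text, emotion), or None if the tweet is unusable.

    A tweet is kept only when its emojis point to exactly one emotion.
    The label-bearing emojis are stripped so a model cannot read the
    label directly from its input.
    """
    found = {EMOJI_TO_EMOTION[ch] for ch in text if ch in EMOJI_TO_EMOTION}
    if len(found) != 1:
        return None
    cleaned = "".join(ch for ch in text if ch not in EMOJI_TO_EMOTION).strip()
    return cleaned, found.pop()
```

Labels produced this way are noisy, since an emoji can be used ironically, which is one reason baselines trained on such data are usually reported alongside the labeling procedure.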



Citations: 0

Knowledge-aware adaptive graph network for commonsense question answering

IF 3.4 | Zone 3, Computer Science | Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Journal of Intelligent Information Systems

Pub Date : 2024-03-19 DOI: 10.1007/s10844-024-00854-z

Long Kang, Xiaoge Li, Xiaochun An

Commonsense Question Answering (CQA) aims to select the correct answers to common knowledge questions. Most existing approaches focus on integrating external knowledge graph (KG) representations with question context representations to facilitate reasoning. However, the approaches cannot effectively select the correct answer due to (i) the incomplete reasoning chains when using knowledge graphs as external knowledge, and (ii) the insufficient understanding of semantic information of the question during the reasoning process. Here we propose a novel model, KA-AGN. First, we utilize a joint representation of dependency parse trees and language models to describe QA pairs. Next, we introduce question semantic information as nodes into a knowledge subgraph and compute the correlations between nodes using adaptive graph networks. Finally, bidirectional attention and graph pruning are employed to update the question representation and the knowledge subgraph representation. To evaluate the performance of our method, we conducted experiments on two widely used benchmark datasets: CommonsenseQA and OpenBookQA. The ablation experiment results demonstrate the effectiveness of the adaptive graph network in enhancing reasoning chains, while showing the ability of the joint representation of dependency parse trees and language models to correctly understand question semantics. Our code is publicly available at https://github.com/agfsghfdhg/KAAGN-main.



Citations: 0