
Information Processing & Management: Latest Articles

Adaptive overlap penalization and probabilistic modeling in hypergraph influence maximization
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-03; DOI: 10.1016/j.ipm.2025.104594
Lingyu Wu, Cong Li, Bo Qu, Xiang Li
Influence maximization (IM) algorithms aim to iteratively identify a seed set that maximizes the spreading range. In this paper, we concentrate on the hypergraph influence maximization (HyperIM) problem. The overlapping neighborhoods caused by higher-order interactions lead to an overestimation of the diffusion capability of candidate nodes. Moreover, selecting a candidate node with a high probability of being infected as a new seed yields a low payoff, because it adds little to the influence range. Thus, we develop adaptive metrics and propose two algorithms: the high adaptive contact efficiency (HACE) algorithm and the high contact with a low infected probability (HCLI) algorithm. First, to correct for overlapping influence, we penalize the contribution of the seed set's neighborhood when evaluating the influence gain a candidate adds to the seed set. Additionally, the HACE algorithm uses a candidate's capability of being contacted to indicate how likely it is to be infected, while the HCLI algorithm estimates the global infected probability of nodes. Experiments and analysis on eight real-world hypergraphs show that HACE and HCLI strike a better balance between selecting an influential seed set and computational efficiency than state-of-the-art (SOTA) algorithms. HACE and HCLI run at least ten times, and up to nearly 70 times, faster than the existing SOTA algorithms. On large-scale hypergraphs, HACE and HCLI remain computationally efficient and significantly outperform other algorithms of low time complexity.
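The overlap-penalization idea in this abstract can be illustrated with a small greedy sketch. This is not the HACE or HCLI algorithm: the scoring rule, the `penalty` parameter, and the function names `build_neighbors` and `select_seeds` are assumptions made for illustration, and no infection probabilities are modeled.

```python
from typing import Dict, List, Set


def build_neighbors(hyperedges: List[Set[int]]) -> Dict[int, Set[int]]:
    """Map each node to the nodes that share at least one hyperedge with it."""
    nbrs: Dict[int, Set[int]] = {}
    for edge in hyperedges:
        for v in edge:
            nbrs.setdefault(v, set()).update(edge - {v})
    return nbrs


def select_seeds(hyperedges: List[Set[int]], k: int, penalty: float = 1.0) -> List[int]:
    """Greedy seed selection with a hypothetical overlap penalty.

    A candidate's score counts neighbors not yet reached by the seed set in
    full, and neighbors already reached with weight (1 - penalty).
    penalty=0 reduces to picking nodes by plain neighborhood size.
    """
    nbrs = build_neighbors(hyperedges)
    seeds: List[int] = []
    covered: Set[int] = set()  # seeds plus all of their neighbors
    for _ in range(min(k, len(nbrs))):
        best, best_score = None, float("-inf")
        for v in sorted(nbrs):  # sorted for deterministic tie-breaking
            if v in seeds:
                continue
            fresh = len(nbrs[v] - covered)
            overlap = len(nbrs[v] & covered)
            score = fresh + (1.0 - penalty) * overlap
            if score > best_score:
                best, best_score = v, score
        seeds.append(best)
        covered |= nbrs[best] | {best}
    return seeds


# Toy hypergraph: one dense cluster {0, 1, 2, 3} and one separate hyperedge.
toy = [{0, 1, 2, 3}, {1, 2, 3}, {4, 5, 6}]
print(select_seeds(toy, 2, penalty=1.0))  # second seed comes from the other cluster
print(select_seeds(toy, 2, penalty=0.0))  # second seed stays in the dense cluster
```

With the full penalty the second seed lands in the untouched hyperedge, whereas without it the selection keeps choosing nodes whose neighborhoods are already covered, which is the overestimation the abstract describes.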
Information Processing & Management, vol. 63, no. 4, Article 104594.
Citations: 0
Developing Fairness, Accuracy, and Serendipity Objective Functions for Recommendation System and Establishing Trade-off through Multi-Objective Evolutionary Optimization
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-03; DOI: 10.1016/j.ipm.2025.104604
Shresth Khaitan, Rahul Shrivastava
Balancing accuracy against fairness and serendipity through trade-off optimization remains a challenging problem in commercial recommender systems. However, recent multi-objective recommendation methods have often overlooked the need to surface pleasantly surprising items, which would mitigate popularity bias and ensure the equitable inclusion of items in the recommendation list. Hence, this study develops objective functions for Fairness, Accuracy, and Serendipity and integrates them into a unified Multi-Objective Evolutionary Algorithm-Based Recommendation Framework (FAS-MOEA). The proposed accuracy objective ensures the balanced inclusion of long-tail and popular items through weighted evaluation. The fairness objective incorporates genre-aware fairness, aligning recommendation distributions with both global and user-specific genre profiles. The serendipity objective learns implicit, context-sensitive preferences for novel yet relevant items. Lastly, the framework establishes a balanced trade-off among these competing objectives to generate Pareto-optimal recommendation solutions. Validation on three benchmark datasets, MovieLens 100K, MovieLens 1M, and Amazon Electronics (5-core), shows substantial improvement over competing models, with gains of 27.21% in F1-score, 8.44% in fairness, and 16.66% in serendipity score. The generated Pareto front shows the model's ability to navigate trade-offs among these competing goals and produce accurate, fair, and pleasantly surprising recommendations.
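To make the notion of a Pareto front over the three objectives concrete, here is a minimal sketch of non-dominated filtering. It shows only the standard dominance test, not the evolutionary search of FAS-MOEA; the score tuples are made-up values and the names `dominates` and `pareto_front` are illustrative.

```python
from typing import List, Tuple

# (accuracy, fairness, serendipity); all three are assumed to be maximized.
Scores = Tuple[float, float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is no worse on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical recommendation lists scored on the three objectives.
candidates = [
    (0.9, 0.2, 0.1),  # very accurate, neither fair nor surprising
    (0.7, 0.6, 0.3),
    (0.6, 0.5, 0.2),  # dominated by the candidate above
    (0.5, 0.7, 0.6),
]
print(pareto_front(candidates))
```

Every solution left on the front represents a different trade-off; choosing one of them is a separate decision that the dominance test does not make.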
Information Processing & Management, vol. 63, no. 4, Article 104604.
Citations: 0
Language models for environmental, social, and governance analysis: A review
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-02; DOI: 10.1016/j.ipm.2025.104596
Kelvin Du, Rui Mao, Frank Xing, Gianmarco Mengaldo, Erik Cambria
Language models have revolutionized information processing, elevating it to new levels and generating opportunities to positively impact our society, e.g., in Environmental, Social, and Governance (ESG) domains. This article surveys the current use of language models for ESG analysis, focusing on their applicable scope, effectiveness, and transformative impact. It highlights how these models facilitate a deeper understanding of ESG practices and impacts by integrating unstructured data, while acknowledging existing limitations and challenges. Specifically, based on a review of over ninety ESG studies published since the introduction of Transformers in 2018, we find that the potential of language models is particularly notable in four primary themes: (1) ESG Frameworks and Standards, which involves the classification of ESG-related texts into binary categories, coarse-grained ESG factors, or fine-grained ESG topics. This theme also includes identifying ESG topic trends and assessing the alignment of corporate ESG disclosures with sustainable development goals; (2) ESG Reporting and Disclosure, which includes ESG narrative processing, ESG reporting assurance, and ESG report generation; (3) ESG Measurement and Evaluation, which involves calculating ESG ratings, extracting key performance indicators (KPIs), assessing ESG risks, detecting ESG controversy categories, analyzing ESG impact and duration, and assessing the effects of ESG on sustainable growth and corporate financial performance, among other functions; (4) ESG Integration and Application, which aims to incorporate ESG factors into broader financial applications and thereby innovate financial tasks, including ESG sentiment analysis, ESG chatbots and AI assistants, ESG-based financial risk and credit analysis, and ESG investing strategies. We conclude by emphasizing the significance of language models in advancing ESG studies and discussing future research directions.
Information Processing & Management, vol. 63, no. 4, Article 104596.
Citations: 0
Multi-view clustering based on the association of graph structure and feature distribution
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-02; DOI: 10.1016/j.ipm.2025.104586
Chenhui Shi, Yongjie Xin, Haifeng Yang, Jianghui Cai, Jie Wang, Lichan Zhou, Yanting He, Fuxing Cui, Xujun Zhao, Yaling Xun
Graph-based multi-view clustering methods have gained considerable attention in recent years. However, most existing techniques ignore the association between graph and feature distributions across different views. In addition, noise and redundant information in the data make it difficult to learn consistent distributions among multiple views accurately. To overcome these issues, this study proposes a framework termed “multi-view clustering based on the association of graph structure and feature distribution” (MLGF). Specifically, we provide collaborative training based on a similar-distribution comparison mechanism that unifies the graph structures and feature distributions of different views, in order to construct multiple high-quality similarity matrices. Noisy information is effectively eliminated from the raw data by embedding graph spectral decomposition and automatic weighting methods into the graph encoder, which learns clean, low-dimensional embedded representations of the data. Finally, the multiple similarity matrices are fused in a locally weighted manner to obtain a consistent similarity matrix. Experiments on five benchmark datasets demonstrate the superiority of our method, which achieves 100% and 97.28% on the COIL-20 and Handwritten datasets, respectively. This is attributed to the effective joint optimization of graph structure and feature distribution, which is validated by its strong performance across diverse datasets. The code will be available at https://github.com/shichenhui/MLGF.
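The final fusion step can be pictured with a small sketch. The abstract does not say how the local weights are obtained, so the per-sample, per-view weights are simply passed in here; the function name `fuse_locally_weighted` and the toy numbers are assumptions, not the MLGF implementation.

```python
from typing import List

Matrix = List[List[float]]


def fuse_locally_weighted(views: List[Matrix], weights: List[List[float]]) -> Matrix:
    """Fuse per-view similarity matrices row by row.

    views[v][i][j] is the similarity of samples i and j in view v.
    weights[v][i] is a hypothetical non-negative confidence of view v
    for sample i; each row's weights are normalized to sum to one.
    """
    n = len(views[0])
    fused = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(w[i] for w in weights)
        for j in range(n):
            fused[i][j] = sum(w[i] * s[i][j] for w, s in zip(weights, views)) / total
    return fused


view_a = [[1.0, 0.8], [0.8, 1.0]]
view_b = [[1.0, 0.2], [0.2, 1.0]]
# Sample 0 trusts view_a three times as much as view_b; sample 1 trusts both equally.
weights = [[3.0, 1.0], [1.0, 1.0]]
fused = fuse_locally_weighted([view_a, view_b], weights)
print(fused)
```

Because the weights differ per row, the fused matrix is generally not symmetric; a common follow-up, also an assumption here, is to average it with its transpose before clustering.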
Information Processing & Management, vol. 63, no. 4, Article 104586.
Citations: 0
A multi-criteria sorting method for preference maps based on Nash-Stackelberg game
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-02; DOI: 10.1016/j.ipm.2025.104587
Xinru Han, Yukun Bao, Jianming Zhan, Yufeng Shen
Existing multi-criteria sorting methods predominantly rely on preset classification thresholds or fixed numbers of alternatives for classification, which makes them strongly subjective and overlooks potential consensus correlations between classifications. In group decision-making (GDM), the consensus feedback mechanism drives the consensus reaching process (CRP) and gives rise to the problem of allocating adjustment amounts among decision-makers (DMs). However, existing studies over-rely on consensus thresholds and neglect differences in DMs’ adjustment capabilities and sequences, which significantly reduces the applicability and accuracy of the methods. To address these issues, this study proposes a novel group consensus method (NS-FPR-PM) that integrates the Nash-Stackelberg game and preference maps within the framework of fuzzy preference relations (FPRs). Specifically, class probability thresholds are derived objectively through an optimization model; the classification results are then converted into preference maps based on these thresholds to explore the inherent consensus relations, thereby eliminating reliance on consensus thresholds. The Nash-Stackelberg game model characterizes the differences in bargaining power among DMs, and an asynchronous adjustment mechanism is designed accordingly to allocate adjustment amounts fairly. Finally, we provide an example to illustrate the proposed method. The experimental results and analysis show that it has significant advantages over similar methods in consensus reaching efficiency and unit adjustment conversion rate.
Information Processing & Management, vol. 63, no. 4, Article 104587.
Citations: 0
Less is more: Towards green code large language models via unified structural pruning
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2026-01-01; DOI: 10.1016/j.ipm.2025.104580
Guang Yang, Yu Zhou, Xiangyu Zhang, Wei Cheng, Ke Liu, Xiang Chen, Terry Yue Zhuo, Taolue Chen
The extensive application of Large Language Models (LLMs) in generative coding tasks has raised concerns due to their high computational demands and energy consumption. Unlike previous structural pruning methods designed for classification models that deal with low-dimensional classification logits, generative Code LLMs produce high-dimensional token logit sequences, making traditional pruning objectives inherently limited. Moreover, existing single-component pruning approaches are even less effective when applied to generative Code LLMs. In response, we propose Flab-Pruner, an innovative unified structural pruning method that combines vocabulary, layer, and Feed-Forward Network (FFN) pruning. This approach effectively reduces model parameters while maintaining performance. Additionally, we introduce a customized code instruction data strategy for coding tasks to make performance recovery of the pruned model more efficient. Extensive evaluations on three state-of-the-art Code LLMs across multiple generative coding tasks show that Flab-Pruner retains 97% of the original code generation performance on average after pruning 22% of the parameters, and achieves the same or even better performance after post-training. The pruned models show significant improvements in storage, GPU usage, computational efficiency, and environmental impact, while maintaining good robustness. Our research provides a sustainable solution for green software engineering and promotes the efficient deployment of LLMs in real-world generative coding intelligence applications.
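Of the three components named above, vocabulary pruning is the easiest to picture. The sketch below assumes a simple frequency criterion (drop tokens that never appear in a reference code corpus) and remaps the surviving token ids; the criterion, the function name `prune_vocab`, and the toy vocabulary are illustrative assumptions and may differ from what Flab-Pruner actually does.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def prune_vocab(
    vocab: Dict[str, int],
    corpus_token_ids: Iterable[int],
    min_count: int = 1,
) -> Tuple[Dict[str, int], Dict[int, int]]:
    """Drop tokens seen fewer than min_count times and renumber the rest.

    Returns the pruned vocabulary and an old-id to new-id map. The same map
    tells which rows of an embedding matrix or output head would be kept.
    """
    counts = Counter(corpus_token_ids)
    kept = sorted(tid for tid in vocab.values() if counts[tid] >= min_count)
    old_to_new = {old: new for new, old in enumerate(kept)}
    new_vocab = {tok: old_to_new[tid] for tok, tid in vocab.items() if tid in old_to_new}
    return new_vocab, old_to_new


# Toy vocabulary; "zzz" never occurs in the reference corpus.
vocab = {"def": 0, "return": 1, "zzz": 2, "(": 3}
corpus = [0, 3, 1, 0, 3]
new_vocab, old_to_new = prune_vocab(vocab, corpus)
print(new_vocab)   # "zzz" is gone and "(" moves from id 3 to id 2
print(old_to_new)
```

Removing a token shrinks both the input embedding and the output projection by one row each, which is why vocabulary pruning can reduce parameters without touching the transformer layers themselves.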
Information Processing & Management, vol. 63, no. 4, Article 104580.
Citations: 0
SemiBCP-SAM2: Semi-supervised model via enhanced bidirectional copy-paste based on SAM2 for medical image segmentation
IF 6.9; Zone 1 (Management); Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS; Pub Date: 2025-12-31; DOI: 10.1016/j.ipm.2025.104576
Guangqi Yang, Xiaoxin Guo, Haoran Zhang, Zhenyuan Zheng, Hongliang Dong, Songbai Xu
Insufficient use of unlabeled data often leads to inaccurate medical image segmentation, and noise in pseudo-labels can further destabilize training. In this paper, we propose a semi-supervised model that combines SAM2 with a bidirectional copy-paste mean teacher model (SemiBCP-SAM2). Specifically, we use a student model to generate segmentation results, which are then used as input prompts for SAM2 to generate additional pseudo-labels, providing auxiliary supervision to guide student learning. We also introduce a Masked Prompt (MP) mechanism that reduces prompt confidence to better handle uncertainty and noise, improving performance in complex or incomplete information scenarios. Another major contribution is the model's transplantability: the baseline network in the student-teacher model can be replaced, so the approach can enhance other semi-supervised segmentation networks at low cost. We conduct comparative experiments and performance evaluations of SemiBCP-SAM2 on the ACDC (100 MRI scans) and PROMISE12 (50 MRI scans) datasets. On ACDC, with 5% and 10% labeled data, SemiBCP-SAM2 improves Dice by 0.29% and 1.16%, and Jaccard by 0.39% and 1.84%. On PROMISE12, with 5% and 20% labeled data, it improves Dice by 1.61% and 2.03%, and Jaccard by 1.99% and 2.79%. Source code is released at https://github.com/ydlam/SemiBCP-SAM2.
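The two metrics reported above have standard definitions: Dice is 2|P∩T| / (|P| + |T|) and Jaccard is |P∩T| / |P∪T| for a predicted mask P and a ground-truth mask T. A minimal sketch on flattened binary masks follows; the function name `dice_jaccard` and the convention of scoring two empty masks as 1.0 are choices made here, not taken from the paper.

```python
from typing import Sequence, Tuple


def dice_jaccard(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice and Jaccard overlap for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    p = {i for i, v in enumerate(pred) if v}
    t = {i for i, v in enumerate(target) if v}
    if not p and not t:
        return 1.0, 1.0  # convention chosen here: two empty masks agree perfectly
    inter = len(p & t)
    dice = 2 * inter / (len(p) + len(t))
    jaccard = inter / len(p | t)
    return dice, jaccard


pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
dice, jaccard = dice_jaccard(pred, target)
print(round(dice, 4), jaccard)  # two of three foreground pixels overlap in each mask
```

The two scores are tied by J = D / (2 - D), so Jaccard is never higher than Dice for the same prediction.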
Information Processing & Management, vol. 63, no. 4, Article 104576
Citations: 0
From tracking to thinking: Facilitating post-exercise reflection by a large language model-mediated journaling system
IF 6.9 Zone 1 Management Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-31 DOI: 10.1016/j.ipm.2025.104574
Xianglin Zhao , Yucheng Jin , Annie Yan Wang , Ming Zhang
Wearable devices provide rich quantitative data for self-reflection on physical activity. However, users often struggle to derive meaningful insights from these data, highlighting the need for enhanced support. To investigate whether Large Language Models (LLMs) can facilitate this process, we propose and evaluate a human-LLM collaborative reflective journaling paradigm. We developed PaceMind, an LLM-mediated journaling system that implements this paradigm based on a three-stage reflection framework. It generates data-driven drafts and personalized questions to guide users in integrating exercise data with personal insights. A two-week within-subjects study (N=21) compared the LLM-mediated system with a template-based journaling baseline. The LLM-mediated design significantly improved the perceived effectiveness of reflection support and increased users’ intention to use the system. However, perceived ease of use did not improve significantly. Users appreciated the LLM’s scaffolding for easing data sense-making, but also reported added cognitive work in verifying and personalizing the LLM-generated content. Although objective activity levels did not change significantly, the LLM-mediated condition showed a trend toward more adaptive exercise planning and sustained engagement. Our findings provide empirical evidence for a human-LLM collaborative reflection paradigm in a data-intensive exercise context. They highlight the potential to deepen user reflection and underscore the critical design challenge of balancing automation with meaningful cognitive engagement and user control.
Information Processing & Management, vol. 63, no. 4, Article 104574
Citations: 0
A text-based emotional pattern discrepancy aware model for enhanced generalization in depression detection
IF 6.9 Zone 1 Management Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-31 DOI: 10.1016/j.ipm.2025.104575
Haibo Zhang , Zhenyu Liu , Yang Wu , Jiaqian Yuan , Gang Li , Zhijie Ding , Bin Hu
Text-based automated depression detection is an active research topic. However, current research has not sufficiently explored the key verbal behaviors in depression detection scenarios, which limits the generalization performance of the models. To address this issue, we propose a depression detection method based on emotional pattern discrepancies, since such discrepancies are one of the fundamental features of depression as an affective disorder. Specifically, we propose an Emotional Pattern Discrepancy Aware Depression Detection Model (EPDAD). EPDAD is trained with specially designed modules and loss functions. This approach enables the model to dynamically and comprehensively perceive the different emotional patterns that depressed and healthy individuals show in response to various emotional stimuli. As a result, it enhances the model’s ability to learn the essential features of depression. We evaluate the generalization performance of our model from cross-dataset and cross-topic perspectives using the MODMA (52 samples) and MIDD (520 samples) datasets. In cross-topic generalization experiments, our method improves F1 score by 10.39% and 1.77% on MODMA and MIDD, respectively, in comparison to the state-of-the-art method. In cross-dataset generalization experiments, our method improves the F1 score by a maximum of 6.37%. We also compare our model with large language models, and the results indicate it is more effective for depression detection tasks. Our research contributes to the practical application of depression detection models. Our code is available at: https://github.com/hbZhzzz/EPDAD.
Information Processing & Management, vol. 63, no. 4, Article 104575
Citations: 0
Outlier detector fusing latent representation and fuzzy granule
IF 6.9 Zone 1 Management Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-31 DOI: 10.1016/j.ipm.2025.104571
Xinyu Su , Shihao Wang , Wei Huang , Zheng Li , Hongmei Chen , Zhong Yuan
Unsupervised outlier detection is a critical task in data mining. Two prominent paradigms, fuzzy information granulation and representation learning, have shown promise but face fundamental, opposing limitations. Fuzzy information granulation-based methods excel at modeling data uncertainty but struggle with the curse of dimensionality and noise in high-dimensional spaces. Conversely, representation learning-based methods effectively handle high-dimensional data but often neglect the uncertainty information inherent in data, such as fuzziness. To address these limitations, we propose Latent Representation-based Outlier Detection with fuzzy granule (LROD). In LROD, we utilize representation learning to address the challenges encountered by fuzzy information granulation-based methods in high-dimensional data by deriving a compact and effective representation from the original feature space. The reconstruction error of each sample serves as the first component of the outlier score. This error, derived from representation learning, effectively captures global structural abnormal information in the data. Subsequently, we introduce fuzzy information granulation on this new representation to address data uncertainty. The second component is formed by aggregating abnormal information from fuzzy information granules, which are induced by various attribute subsets. Finally, these two components are fused to produce the final outlier score. Experimental results demonstrate that LROD outperforms 20 competing methods across 15 datasets, achieving improvements of 4.5%, 10.5%, and 3.1% in AUC, AP, and G-mean metrics, respectively, compared to the second-best method, validating its superior effectiveness. This study demonstrates the significant benefits of a hybrid method, providing a new framework for fusing global structural information with local uncertainty measures to achieve state-of-the-art performance in outlier detection. 
The code is publicly available at https://github.com/Mxeron/LROD.
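The two-component scoring idea described in this abstract (a reconstruction error plus an uncertainty-based term computed on the learned representation, fused into one score) can be illustrated with a rough sketch. Everything concrete here is an assumption made for illustration: truncated PCA stands in for the representation learner, a simple distance-based fuzzy similarity over single latent attributes stands in for the fuzzy granules, and a weighted sum with a hypothetical weight `alpha` stands in for the fusion rule. It is not the LROD implementation; see the authors' repository for that.

```python
import numpy as np

def minmax(v):
    """Scale a vector to [0, 1]; a constant vector maps to all zeros."""
    span = v.max() - v.min()
    return (v - v.min()) / span if span > 0 else np.zeros_like(v)

def reconstruction_error(X, k):
    """Latent codes and per-sample squared error from a rank-k PCA (assumed stand-in)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T            # latent representation, shape (n, k)
    X_hat = Z @ Vt[:k]           # reconstruction in the centered space
    return Z, ((Xc - X_hat) ** 2).sum(axis=1)

def fuzzy_granule_score(Z):
    """Mean over latent attributes of 1 - (relative size of each sample's fuzzy granule)."""
    n, k = Z.shape
    scores = np.zeros(n)
    for a in range(k):
        col = minmax(Z[:, a])
        sim = 1.0 - np.abs(col[:, None] - col[None, :])  # fuzzy similarity in [0, 1]
        scores += 1.0 - sim.mean(axis=1)                 # small granule -> higher score
    return scores / k

def fused_outlier_score(X, k=2, alpha=0.5):
    """Weighted sum of the two normalized components (illustrative fusion rule)."""
    Z, err = reconstruction_error(X, k)
    return alpha * minmax(err) + (1.0 - alpha) * minmax(fuzzy_granule_score(Z))

# Toy data: 11 points on the line y = x plus one point off the line (index 11).
t = np.arange(-5.0, 6.0)
X = np.vstack([np.column_stack([t, t]), [[3.0, -3.0]]])
Z, err = reconstruction_error(X, k=1)
scores = fused_outlier_score(X, k=1)
```

In this toy case the off-line point has the largest reconstruction error, while its latent code sits in the middle of the others, which shows why the two components capture different kinds of abnormality and why the way they are fused matters.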
Information Processing & Management, vol. 63, no. 4, Article 104571
Citations: 0