Information Processing & Management: Latest Articles

Uncertainty-penalized reinforcement learning from human feedback with diversified reward LoRA ensembles
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-10 | DOI: 10.1016/j.ipm.2025.104548
Yuanzhao Zhai, Yu Lei, Han Zhang, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang
Reinforcement Learning from Human Feedback (RLHF) is a leading technique for aligning large language models (LLMs) with human preferences. However, RLHF often faces a prevalent issue known as overoptimization. This occurs when an optimized LLM generates responses that achieve high reward scores but are ultimately misaligned with human preferences. To address this, we propose Uncertainty-Penalized RLHF (UP-RLHF), a novel framework that incorporates two forms of regularization: uncertainty from reward models and Kullback-Leibler (KL) divergence from the initial policy model. A common method for quantifying uncertainty is to use an ensemble of models. Yet, directly applying ensemble methods to LLM-based reward models is parameter-inefficient and often suffers from a lack of diversity among its members. To overcome these limitations, we introduce a diversified ensemble of low-rank adaptations (LoRA) for reward modeling. This approach provides a parameter-efficient and effective way to quantify reward uncertainty. We conducted extensive experiments on two human preference datasets and one mathematical task. Our evaluation of the reward models demonstrates two key findings: encouraging diversity is crucial for LoRA ensembles, and our diversified LoRA ensembles effectively quantify uncertainty. This method improved the OOD AUROC metric by 44% for OPT-330M and 31% for Llama-2-7B, compared to standard LoRA ensembles under identical settings. By integrating this uncertainty regularization, UP-RLHF prevents the LLM policy from producing overestimated, low-quality content. Consequently, our framework mitigates overoptimization and enhances alignment performance. In evaluations, LLMs trained with UP-RLHF outperformed those trained with vanilla RLHF, achieving a 12% improvement on a summarization task and a 56% GPT-4-judged win rate on a helpful dialogue task.
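The two regularizers named in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything specific here is an assumption: the ensemble standard deviation stands in for reward uncertainty, the KL term is the usual per-sample log-probability difference, and the function name and the coefficients `lam` and `beta` are hypothetical.

```python
import numpy as np

def penalized_reward(ensemble_rewards, logp_policy, logp_init, lam=1.0, beta=0.1):
    """Hypothetical uncertainty- and KL-penalized reward for one response.

    ensemble_rewards: one scalar score per reward-ensemble member.
    logp_policy / logp_init: per-token log-probabilities of the sampled
    response under the current policy and the initial policy.
    """
    rewards = np.asarray(ensemble_rewards, dtype=float)
    mean_reward = rewards.mean()
    # Assumption: disagreement among ensemble members (population standard
    # deviation) is used as the uncertainty estimate.
    uncertainty = rewards.std()
    # Single-sample estimate of KL(policy || initial policy) for this response.
    kl_estimate = float(np.sum(np.asarray(logp_policy) - np.asarray(logp_init)))
    return mean_reward - lam * uncertainty - beta * kl_estimate

# Same mean score, but the second response splits the ensemble and is penalized.
agreed = penalized_reward([1.0, 1.0, 1.0], [-1.0, -1.0], [-1.0, -1.0])
disputed = penalized_reward([0.0, 2.0], [-1.0, -1.0], [-1.0, -1.0])
```

In this toy case both responses have a mean ensemble score of 1.0, but only the one the ensemble members agree on keeps it, which is the behavior the abstract attributes to uncertainty regularization.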
Information Processing & Management, Volume 63, Issue 3, Article 104548
Citations: 0
MetaPFS: Memory-efficient node classification on text-attributed graphs via meta-guided progressive feature selection
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-10 | DOI: 10.1016/j.ipm.2025.104542
Yuewei Zhou, Lina Ni, Zhijie Qu, Xuqiang Li, Jinquan Zhang, Yongquan Liang
Node classification in text-attributed graphs (TAGs) is a critical task in applications such as social networks and recommendation systems. While emerging large language models (LLMs) have improved text attribute modeling, scaling up these models does not directly boost node classification performance and leads to higher memory consumption. Meanwhile, the lack of PLM-style pretraining and the prevailing assumption of static node features in GNNs restrict the applicability of trial-and-error feature selection on TAGs. To this end, we propose the Meta-guided Progressive Feature Selection (MetaPFS) paradigm for TAG node classification. Specifically, we first propose a feature-agnostic warm-up training strategy, rapidly injecting semantic priors via global pooling. Secondly, to mitigate feature subspace misalignment, we propose a progressive feature sharpening strategy that smoothly transitions from global soft selection to approximate hard selection. Finally, to help GNNs adapt to dynamic perturbations of node features during “trial-and-error”, we propose a virtual-task-regularized meta learning strategy, virtualizing the node classification task into multiple tasks. Extensive experiments across six representative datasets demonstrate that MetaPFS exhibits significant competitiveness compared to both LLM-driven and GNN-driven baselines. Specifically, MetaPFS consumes only two-thirds the memory of RevGAT and outperforms LLM-driven state-of-the-art baselines like TAPE and LLaGA by over 2% on nearly all datasets.
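One common way to move smoothly from soft selection to approximate hard selection is a temperature-annealed softmax over learnable feature scores. The sketch below illustrates that general idea only; the abstract does not say how MetaPFS implements its sharpening, so the softmax gate, the geometric schedule, and all names and default values are assumptions.

```python
import numpy as np

def selection_weights(scores, temperature):
    """Softmax over feature scores; a lower temperature gives a sharper selection."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()  # subtract the maximum for numerical stability
    w = np.exp(z)
    return w / w.sum()

def temperature_schedule(step, total_steps, t_start=5.0, t_end=0.05):
    """Hypothetical geometric decay from t_start to t_end over training."""
    frac = step / max(total_steps - 1, 1)
    return t_start * (t_end / t_start) ** frac

scores = [1.0, 2.0, 3.0]  # illustrative scores for three candidate features
soft = selection_weights(scores, temperature_schedule(0, 10))   # early: spread out
hard = selection_weights(scores, temperature_schedule(9, 10))   # late: near one-hot
```

Early in the schedule every candidate feature still contributes, while by the final step almost all of the weight sits on the highest-scoring one.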
Information Processing & Management, Volume 63, Issue 3, Article 104542
Citations: 0
DeepMEL: A multi-agent collaboration framework for multimodal entity linking
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-09 | DOI: 10.1016/j.ipm.2025.104507
Fang Wang, Tianwei Yan, Zonghao Yang, Minghao Hu, Jun Zhang, Zhunchen Luo, Xiaoying Bai
Multimodal Entity Linking (MEL) aims to associate textual and visual mentions with entities in a multimodal knowledge graph. Despite its importance, current methods face challenges such as incomplete contextual information, coarse cross-modal fusion, and the difficulty of jointly using large language models (LLMs) and large visual models (LVMs). To address these issues, we propose DeepMEL, a novel framework based on multi-agent collaborative reasoning, which achieves efficient alignment and disambiguation of textual and visual modalities through a role-specialized division strategy. DeepMEL integrates four specialized agents, namely Modal-Fuser, Candidate-Adapter, Entity-Clozer and Role-Orchestrator, to complete end-to-end cross-modal linking through specialized roles and dynamic coordination. DeepMEL adopts a dual-modal alignment path and combines the fine-grained text semantics generated by the LLM with the structured image representation extracted by the LVM, significantly narrowing the modal gap. We design an adaptive iteration strategy that combines tool-based retrieval and semantic reasoning capabilities to dynamically optimize the candidate set and balance recall and precision. DeepMEL also unifies MEL tasks into a structured cloze prompt to reduce parsing complexity and enhance semantic comprehension. Extensive experiments on five public benchmark datasets demonstrate that DeepMEL achieves state-of-the-art performance, improving accuracy (ACC) by 1% to 57%. Ablation studies verify the effectiveness of all modules.
Information Processing & Management, Volume 63, Issue 3, Article 104507
Citations: 0
Market potential matters: a novel approach for early identification of breakthrough research
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-09 | DOI: 10.1016/j.ipm.2025.104547
Yiming Li, Xukan Xu
Early identification of breakthrough research with market potential is critical to the development of countries and enterprises. This study proposes a novel approach for early identification of breakthrough research that integrates trajectory changes in science, technology, and market. First, the consolidation-or-destabilization (CD) index is combined with topic modeling to capture potential breakthroughs. Second, the trajectory changes in fine-grained markets corresponding to breakthrough topics are detected from the ecological niche perspective. Finally, breakthrough topics with great prospects in future markets are identified. We select 3D printing as a case study, with datasets consisting of 59,474 papers, 12,720 patents, and 95,553 pieces of enterprise data. The empirical results demonstrate that our method exhibits significant advantages. Specifically, by combining market signals, our method achieved an improvement of 7% in the F1-Score. Moreover, compared to the suboptimal baseline, our method increased the F1-Score by 28%. This study provides practical guidance for identifying breakthroughs with market potential and contributes to reducing the application risk of breakthroughs in subsequent development stages.
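For readers unfamiliar with the CD index, the sketch below computes its standard single-window form: citing papers that reference only the focal paper count as +1, those that reference both the focal paper and its predecessors count as -1, and those that reference only the predecessors count as 0. How the study combines this score with topic modeling is not described in the abstract, so the function name and the toy identifiers are purely illustrative.

```python
def cd_index(focal_citers, predecessor_citers):
    """Standard CD index for one focal paper.

    focal_citers: IDs of later papers that cite the focal paper.
    predecessor_citers: IDs of later papers that cite at least one of the
    focal paper's own references (its predecessors).
    """
    focal_citers = set(focal_citers)
    predecessor_citers = set(predecessor_citers)
    only_focal = focal_citers - predecessor_citers   # destabilizing citations
    both = focal_citers & predecessor_citers         # consolidating citations
    all_citers = focal_citers | predecessor_citers
    if not all_citers:
        raise ValueError("CD index is undefined without forward citations")
    return (len(only_focal) - len(both)) / len(all_citers)

# Toy example: papers a and b cite only the focal paper, c cites both,
# and d cites only a predecessor.
score = cd_index({"a", "b", "c"}, {"c", "d"})  # (2 - 1) / 4 = 0.25
```

Values near +1 indicate work that displaces its predecessors in later citations, while values near -1 indicate work that reinforces them.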
Information Processing & Management, Volume 63, Issue 3, Article 104547
Citations: 0
Diagnosing the bias iceberg in large language models: A three-level framework of explicit, evaluative, and implicit gender bias
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-09 | DOI: 10.1016/j.ipm.2025.104554
Anling Xiang
Large language models (LLMs) have achieved remarkable performance gains, yet concerns about their embedded biases remain pressing. Existing research often targets a single dimension, lacking systematic comparisons across different types of bias. This study introduces a three-level diagnostic framework—explicit, evaluative, and implicit—to characterize the “bias iceberg” in LLMs. We construct a bilingual dataset (Chinese–English) spanning seven socio-psychological dimensions (appearance, competence, dominance, emotion, leadership, morality, and physicality), comprising approximately 420 minimal-pair sentences and 400 word-association sets, and conduct unified evaluations across eight mainstream models (GPT-4o, Claude-3.7, Gemini-2.5, Grok-3, Qwen-Plus, DeepSeek-v3, Doubao, and Kimi-2). Results reveal that explicit bias remains generally low (mean ≈ 0.0141), though residual disadvantages for women persist in appearance and emotion. Evaluative bias intensifies to a moderate level (mean ≈ 0.0259), with directional divergence in morality and dominance. Implicit bias emerges as the most pronounced (mean ≈ 0.1738, peaking at 0.45), manifesting stable male-anchoring effects in dominance and physicality, with a magnitude 12.3 times greater than explicit bias. Network analysis uncovers three structural archetypes—hyper-centralized, balanced clustering, and de-centralized. Cross-cultural comparisons further show that U.S. models more strongly reproduce “male–power/physicality” associations at evaluative and implicit levels, whereas Chinese models exhibit greater convergence. The proposed framework and dataset are reproducible and cross-culturally adaptable, offering new empirical evidence and structured insights for uncovering and mitigating deep-seated biases in LLMs.
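The "12.3 times" figure follows directly from the reported means, as the short check below shows. Only the three mean values come from the abstract; the helper name is illustrative.

```python
def fold_ratio(value, baseline):
    """Return how many times larger `value` is than `baseline`."""
    return value / baseline

# Mean bias scores as reported in the abstract.
explicit_mean = 0.0141
evaluative_mean = 0.0259
implicit_mean = 0.1738

implicit_vs_explicit = fold_ratio(implicit_mean, explicit_mean)
evaluative_vs_explicit = fold_ratio(evaluative_mean, explicit_mean)

print(f"implicit / explicit:   {implicit_vs_explicit:.1f}x")    # 12.3x
print(f"evaluative / explicit: {evaluative_vs_explicit:.1f}x")  # 1.8x
```

The same calculation puts evaluative bias at roughly 1.8 times the explicit level, so most of the gap between the tip and the base of the "iceberg" lies at the implicit level.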
Information Processing & Management, Volume 63, Issue 3, Article 104554
Citations: 0
Jobpocalypse or techno-utopia? geospatially decoding public concerns through the social media noise in AI’s disruption era
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-09 | DOI: 10.1016/j.ipm.2025.104550
Jiawei Chen, Hong Chen
In the context of the artificial intelligence (AI) revolution, public perceptions are complex and diverse regarding whether AI signifies a “jobpocalypse” or ushers in a “techno-utopia”. To decode public sentiment and perception regarding AI’s impact on employment, this study captures related public discussion texts (40,299 in total) from Weibo and Douyin. Word cloud visualization presents key public concerns, Word2Vec reveals semantic associations between keywords, and BERTopic analyzes the cognitive focus and thematic characteristics of public attention. Additionally, social media and geographic information are integrated to reveal regional heterogeneity. The research findings indicate: (1) public perceptions show obvious emotional polarity, yet the overall expression tends to be cautious and rational. (2) Public perceptions are multidimensional (10 topics), focusing on human-machine collaboration, technological unemployment, industry applications, and risk expectations. (3) The primary focuses of the two platforms overlap in some areas but also differ in others. (4) An “AI divide” exists across regions. The eastern region emphasizes technological rationality and international comparison, the central region prioritizes technological empowerment and social harmony, while the western region concentrates on unemployment risk and social impact.
Information Processing & Management, Volume 63, Issue 3, Article 104550
Citations: 0
DEEL: An imbalanced binary data classification method based on diffusion model data augmentation and multi-objective optimization ensemble
IF 6.9 | Zone 1 (Management) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-12-09 | DOI: 10.1016/j.ipm.2025.104537
Hongwei Ding, Songyu Wang, Xiaoming Yuan, Nana Huang, Xiaohui Cui
Class imbalance remains a major challenge in real-world classification tasks. To address this, we propose Diffusion-Enhanced Ensemble Learning (DEEL), a unified framework that synergistically integrates diffusion-based data augmentation and multi-objective ensemble optimization for binary classification tasks. Specifically, we design a Dynamic Attention Diffusion Model (DADM) to generate diverse and realistic minority class samples through a forward noise and reverse denoising process. By incorporating temporal embeddings, residual connections, and attention mechanisms, DADM enhances the fidelity and distributional alignment of the generated data. Complementing this, an ensemble learning strategy based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II) optimizes the fusion of multiple base classifiers across F1-score, G-mean, and AUC metrics. Extensive experiments on 26 real-world imbalanced datasets demonstrate that DEEL improves average F1-score and G-mean by 21.7% and 24.8%, respectively, over competitive baselines. Moreover, visualization and Jensen-Shannon distance analyses quantitatively verify the high diversity and distributional coherence of DADM-generated samples, underscoring their effectiveness for imbalanced learning.
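Two of the three optimization objectives mentioned above, F1-score and G-mean, can be computed directly from a binary confusion matrix. The sketch below uses their standard definitions and made-up counts to show why they are preferred over plain accuracy on imbalanced data; it is not taken from the paper, and the function name and example numbers are illustrative.

```python
import math

def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1-score, G-mean) with the minority class as the positive class."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    # G-mean is the geometric mean of recall and specificity.
    return f1, math.sqrt(recall * specificity)

# A classifier that always predicts the majority class on a 10:90 split
# reaches 90% accuracy, yet both imbalance-aware metrics are zero.
majority_only = f1_and_gmean(tp=0, fp=0, fn=10, tn=90)

# A classifier that recovers 9 of 10 minority samples at the cost of
# 9 false alarms has the same 90% accuracy but much better scores.
balanced = f1_and_gmean(tp=9, fp=9, fn=1, tn=81)
```

Both toy classifiers are 90% accurate, but only the second scores well on F1 and G-mean, which is why a method targeting minority-class performance would optimize these metrics instead.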
Information Processing & Management, vol. 63, no. 3, Article 104537.
Citations: 0
Analyzing cross-platform academic networking behavior: Methods and insights on institutional affiliations and user clustering
IF 6.9 Zone 1 (Management) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-09 DOI: 10.1016/j.ipm.2025.104546
Weiwei Yan , Yanyan Wang , Jiahui Song , Yin Zhang
This study investigates cross-platform behavior on academic social networking sites (ASNSs), focusing on differences among users from academic, government, and corporate institutions. Users often engage with multiple ASNSs due to differing platform features and contexts, leading to distinct behavioral patterns. Drawing on data from Academia.edu (ACA) and ResearchGate (RG), this study analyzes user profiles from 15 institutions to identify cross-platform users and compare behaviors. It proposes an approach for identifying such users and develops a cross-platform user behavior indicator system to support the analysis. A clustering analysis further explores behavior patterns and provides additional insights into cross-platform engagement. Findings show that cross-platform users tend to disclose more information, maintain broader networks, and engage more actively on RG than on ACA. Government-affiliated users are the most active, with high levels of disclosure, publication, and interaction. Corporate users exhibit varied strengths and weaknesses, while academic users demonstrate moderate activity. Most academic cross-platform users fall into a “civilian-type” category, sharing fewer publications and presenting inconsistent profile information. In contrast, many government and corporate users are “star-type,” showing greater consistency and visibility across platforms. This study advances understanding of cross-platform ASNS behavior and reveals sector-based differences that may inform platform design and user strategies.
Information Processing & Management, vol. 63, no. 3, Article 104546.
Citations: 0
Learning rules and aligning elements for document-level relation extraction
IF 6.9 Zone 1 (Management) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-08 DOI: 10.1016/j.ipm.2025.104511
Ganlin Xu , Jianzhou Feng , Qin Wang
Document-level relation extraction (DocRE) aims to infer semantic relations between entity pairs in a document. Generation-based methods for DocRE learn only superficial text patterns from plain text rather than logical rule patterns, and they generate uncontrolled outputs. To address this, this paper proposes a novel generative paradigm for DocRE, the rule learning and elements alignment (RLEA) method. We build a symmetrical structure using two T5 models (a text learner and a rule learner), where the text learner learns text patterns from symbolic triplets and the rule learner learns rule patterns from chain-like logic rules. To address the above challenges, we propose three key techniques: the bidirectional gate function, the rule regularizer, and the alignment mechanism. Experimental results indicate that our method achieves state-of-the-art results in relation extraction and logical consistency: RLEA obtains 72.37, 79.44, and 94.52 on DWIE in Ign F1, F1, and Logic, respectively; 61.94 and 63.96 on DocRED in Ign F1 and F1, respectively; and 76.81 and 77.06 on Re-DocRED in Ign F1 and F1, respectively. In addition, quantitative experiments and qualitative analysis show how logical rules work on black-box generation-based models for DocRE.
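The abstract reports both F1 and Ign F1 without defining them. As a rough illustration only, the sketch below computes set-based F1 over predicted relation triples and one common formulation of Ign F1, in which correct predictions that also appear as facts in the training set are discounted from precision. This is an assumption about the metric, not the benchmarks' official scorers, which may differ in detail; the function name `f1_and_ign_f1` and the toy triples are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def _harmonic(p: float, r: float) -> float:
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    return 2 * p * r / (p + r) if p + r else 0.0


def f1_and_ign_f1(predicted: Set[Triple], gold: Set[Triple],
                  train_facts: Set[Triple]) -> Tuple[float, float]:
    """Return (F1, Ign F1) under the simplified formulation described above."""
    correct = predicted & gold
    correct_in_train = correct & train_facts   # hits the model could have memorized

    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(gold) if gold else 0.0

    # Ign precision: drop memorizable hits from both numerator and denominator.
    ign_denominator = len(predicted) - len(correct_in_train)
    ign_precision = (
        (len(correct) - len(correct_in_train)) / ign_denominator
        if ign_denominator else 0.0
    )
    return _harmonic(precision, recall), _harmonic(ign_precision, recall)


# Hypothetical toy example.
predicted = {("a", "r", "b"), ("c", "r", "d"), ("e", "r", "f")}
gold = {("a", "r", "b"), ("c", "r", "d"), ("g", "r", "h")}
train_facts = {("a", "r", "b")}
f1, ign_f1 = f1_and_ign_f1(predicted, gold, train_facts)
```

Under this formulation Ign F1 is never higher than F1, which matches the ordering of the reported scores on all three datasets.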
Information Processing & Management, vol. 63, no. 3, Article 104511.
Citations: 0
Does fair ranking lead to fair recruitment outcomes? A study of interventions, interfaces, and interactions
IF 6.9 Zone 1 (Management) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-08 DOI: 10.1016/j.ipm.2025.104506
Alessandro Fabris , Clara Rus , Jorge Saldivar , Anna Gatzioura , Asia J. Biega , Carlos Castillo
Personnel recruitment is increasingly mediated by Applicant Tracking Systems (ATS), which rank candidates for job positions, making them a central decision-support tool in modern Human Resources (HR) processes. Often framed as an information retrieval (IR) problem, the ranking of candidates in ATS is typically driven by relevance to the job position, with algorithms sorting applicants according to a set of predefined criteria. In recent years, fairness-aware ranking methods have emerged to mitigate the risk of indirect discrimination, where the ordering of candidates may inadvertently favor one demographic group over another. These approaches are inspired by browsing models developed for web search and aim to balance candidate exposure based on protected characteristics. However, ATS in recruitment introduce unique challenges due to their high-stakes nature and the decision-making context in which they operate. In this paper, we present a series of user studies that explore the disconnect between fair exposure and fair outcomes in candidate shortlisting. We focus on how factors such as task design (e.g., how recruiters interact with candidate lists), individual representations of candidates (e.g., national origin cues), and ranking order influence both position bias and demographic balance. Our findings show that while demographic balance may be achieved in terms of ranking visibility, this does not necessarily translate to fair outcomes in terms of who gets shortlisted. Through a crowdsourced experiment and in-depth interviews with recruiters, we identify key task-level, individual, and ranking factors that mediate these effects. We conclude that fairness in ATS rankings is contingent not only on algorithmic design but also on the shortlisting tasks they support, as well as the interfaces, strategies, and assumptions that recruiters use when interacting with candidate lists. 
Based on these insights, we provide implications for the design of algorithms, interfaces, and recruitment processes that support fairer and more equitable recruitment outcomes.
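The distinction between fair exposure and fair outcomes can be made concrete with a small sketch. It assumes a logarithmic position discount, 1 / log2(1 + rank), as the browsing model; that choice, the group labels, and the function names `exposure_share` and `shortlist_share` are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


def exposure_share(ranking_groups: Sequence[str]) -> Dict[str, float]:
    """Share of total position-discounted exposure received by each group.

    ranking_groups[i] is the group label of the candidate at rank i + 1.
    """
    totals: Dict[str, float] = defaultdict(float)
    for rank, group in enumerate(ranking_groups, start=1):
        totals[group] += 1.0 / math.log2(1 + rank)   # assumed browsing-model discount
    total = sum(totals.values())
    return {group: value / total for group, value in totals.items()}


def shortlist_share(ranking_groups: Sequence[str],
                    shortlisted_positions: List[int]) -> Dict[str, float]:
    """Share of shortlisted candidates per group (positions are 0-based indices)."""
    counts: Dict[str, int] = defaultdict(int)
    for position in shortlisted_positions:
        counts[ranking_groups[position]] += 1
    n = len(shortlisted_positions)
    return {group: (counts[group] / n if n else 0.0) for group in set(ranking_groups)}


# Hypothetical ranking that interleaves two groups, and a hypothetical recruiter
# who shortlists only the candidates at ranks 1 and 4.
ranking = ["A", "B", "B", "A"]
exposure = exposure_share(ranking)
outcome = shortlist_share(ranking, [0, 3])
```

In this toy case group B receives a substantial share of exposure yet none of the shortlist slots, which is the kind of gap between ranking visibility and shortlisting outcomes that the abstract describes.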
Information Processing & Management, vol. 63, no. 3, Article 104506.
Citations: 0