Pub Date: 2026-06-01 | Epub Date: 2026-01-23 | DOI: 10.1016/j.ipm.2025.104593
Guoqi Geng, Qianyi Zhan, Heng-Yang Lu
Stance detection aims to identify users’ attitudes toward specific targets in social media and plays a crucial role in information processing and public opinion management. However, existing research on stance detection often overlooks the potential influence of user personality traits on stance expression. To address this gap, this paper studies the impact of personality traits on stance judgment and proposes PERStance, a personality-guided enhanced multimodal zero-shot stance detection method. Specifically, PERStance uses a Large Language Model (LLM) to infer users’ personality traits from a multi-dimensional perspective, thereby more accurately understanding users’ potential stances toward specific targets. To mitigate issues such as LLM hallucinations and reasoning confusion, we incorporate the Chain-of-Thought framework into the stance detection stage and optimize its reasoning path. Experimental results on multiple multimodal stance detection datasets show that PERStance achieves the best performance in stance detection, with an average increase of 23.88% in the Macro-F1 score. Ablation experiments verify the effectiveness of each module. The source code of our proposed framework is released at https://github.com/jncsnlp/PERStance.
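The two-step prompting idea in the abstract (first infer personality traits with an LLM, then condition a chain-of-thought stance prompt on them) can be illustrated with a minimal sketch. The function names and prompt wording below are hypothetical, not the authors' implementation; an actual LLM call would consume the returned strings.

```python
def build_personality_prompt(post: str) -> str:
    """Step 1 (illustrative): ask an LLM to infer multi-dimensional
    personality traits from a user's post."""
    return (
        "Infer the author's personality traits (e.g., Big Five dimensions) "
        "from the post below. Answer one line per trait.\n"
        f"Post: {post}"
    )

def build_stance_prompt(post: str, target: str, traits: str) -> str:
    """Step 2 (illustrative): a chain-of-thought stance prompt
    conditioned on the inferred traits."""
    return (
        f"Target: {target}\n"
        f"Inferred personality: {traits}\n"
        f"Post: {post}\n"
        "Let's reason step by step about how this author's personality "
        "shapes their attitude, then answer FAVOR, AGAINST, or NONE."
    )
```

In a pipeline, the output of the first prompt would be fed into `traits` of the second, so the stance reasoning is explicitly grounded in the personality profile.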
Title: PERStance: Personality-guided enhanced multimodal stance detection
Information Processing & Management, 63(4), Article 104593
Pub Date: 2026-06-01 | Epub Date: 2026-01-19 | DOI: 10.1016/j.ipm.2026.104638
Zhishuo Zhang , Hu Liu , Tao Shi , Qian Li , Huayong Niu
This study develops a systematic Measurement–Analysis–Prediction framework to evaluate global digital economy efficiency using data from 114 countries over 2006–2023. Efficiency is decomposed into two stages, infrastructure transformation and value creation and international competitiveness, measured via a super-efficiency sequential Slack-Based Measure (SBM) model. Regional disparities are examined with the Dagum Gini coefficient, and machine learning models are employed for prediction, with Random Forest (RF) identified as the optimal predictor. Results show that global digital economy efficiency follows a fluctuating upward trend, with Stage 1 (infrastructure transformation) consistently outperforming Stage 2 (value creation). Notably, 2021 marked a significant turning point for infrastructure transformation efficiency, with the efficiency value surging to 0.2883 due to pandemic-induced digital demand. Europe achieves the highest efficiency, while Asia and the Americas exhibit strong internal polarization; overall disparities are driven mainly by net inter-regional gaps. Machine learning predictions indicate efficiency will increase from 0.3210 in 2024 to 0.3566 in 2028, though regional imbalances are expected to persist. Overall, this study provides robust empirical evidence and a comprehensive framework for understanding the transmission mechanisms of digital economy efficiency, interpreting global disparity patterns, and guiding policy formulation.
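The Dagum decomposition used here builds on the overall Gini coefficient, which is the mean absolute difference between all pairs of efficiency scores divided by twice the mean. A minimal sketch of that base quantity (not the full Dagum within/between/transvariation decomposition) with illustrative data:

```python
import numpy as np

def gini(x):
    """Overall Gini coefficient: mean absolute pairwise difference
    over twice the mean. Input: 1-D array of efficiency scores."""
    x = np.asarray(x, dtype=float)
    n = x.size
    diff_sum = np.abs(x[:, None] - x[None, :]).sum()  # all ordered pairs
    return diff_sum / (2.0 * n * n * x.mean())

# Illustrative efficiency scores (not the paper's data):
scores = [0.32, 0.18, 0.55, 0.29]
```

Perfect equality yields 0; greater dispersion pushes the value toward 1. The Dagum approach then partitions this total into net inter-regional gaps (which the study finds dominate), intra-regional inequality, and transvariation.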
Title: Measuring and forecasting global digital economy efficiency: An integrated approach using the super-efficiency sequential SBM model and machine learning algorithms
Information Processing & Management, 63(4), Article 104638
Pub Date: 2026-06-01 | Epub Date: 2026-01-20 | DOI: 10.1016/j.ipm.2026.104635
Lihua Fu , Luo Jin , Yaokuang Li
Artificial intelligence (AI) is driving technological and industrial transformation, reshaping enterprise structure and innovation. Despite its importance, research lacks insight into AI’s impact on innovation ambidexterity and the underexplored role of risk-taking. Drawing on information processing theory, this study constructs a theoretical model to examine the impact of AI applications on enterprise exploration, exploitation, and innovation ambidexterity, incorporating the level of risk-taking as a moderating variable. Using Stata 17 to perform panel data regression and a series of robustness tests, we analyze 1,908 firm-year observations from the information transmission, software, and information technology service industries, as well as the scientific research and technology service industries, listed on the Shanghai and Shenzhen Stock Exchanges from 2012 to 2022. The empirical analyses show that AI applications significantly enhance innovation ambidexterity at the 1% level. This positive effect is most pronounced when risk-taking is moderate. This study extends information processing theory to the AI-enabled innovation context and further enriches its boundary conditions by introducing risk-taking as a nonlinear moderator. Managerially, the findings suggest that enterprises should calibrate risk-taking levels to complement AI deployment, enabling a balanced approach to exploration and exploitation.
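An inverted U-shaped moderation of this kind is typically specified by interacting the predictor with both the moderator and its square, so the marginal effect of AI on ambidexterity is a concave quadratic in risk-taking. A minimal numerical sketch with hypothetical coefficients (not estimates from the paper):

```python
# Hypothetical coefficients: the marginal effect of AI applications on
# innovation ambidexterity is modeled as b1 + b3*risk + b4*risk**2,
# where b4 < 0 produces the inverted U.
b1, b3, b4 = 0.10, 0.30, -0.05

def marginal_effect(risk):
    """Effect of a unit increase in AI applications at a given
    risk-taking level (illustrative functional form)."""
    return b1 + b3 * risk + b4 * risk ** 2

# The effect peaks at the vertex of the parabola:
turning_point = -b3 / (2 * b4)
```

With these toy values the effect peaks at a risk-taking level of 3.0 and declines on either side, mirroring the finding that a moderate level of risk-taking maximizes the AI–ambidexterity relationship.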
Title: Artificial intelligence applications and innovation ambidexterity: The “inverted U-shaped” regulating effect of risk-taking
Information Processing & Management, 63(4), Article 104635
Pub Date: 2026-06-01 | Epub Date: 2026-01-22 | DOI: 10.1016/j.ipm.2026.104627
Paerhati Tulajiang , Jinzhong Ning , Yuanyuan Sun , Liang Yang , Yuanyu Zhang , Kelaiti Xiao , Zhixing Lu , Yijia Zhang , Hongfei Lin
Multilingual named entity recognition (NER) is especially challenging in low-resource and typologically diverse languages, where translation drift, morphological variation, and noisy alignments degrade performance. Existing encoder-based methods often rely on dense attention or uniform alignment, which tends to propagate irrelevant signals across languages. We present SEGA, a lightweight and typology-aware framework that incorporates sparse guided attention to select auxiliary signals, alongside a weighted fusion layer that balances representations between cross-lingual and monolingual contexts. Unlike prior approaches, SEGA requires no parallel corpora and supports fully monolingual inference. We evaluate SEGA on six multilingual NER benchmarks spanning over 60 languages, including CoNLL, WikiANN, MasakhaNER 2.0, XTREME-40, WikiNEuRal, and MultiNERD. SEGA achieves new state-of-the-art results on five datasets, with absolute gains of up to +24.2 F1 over strong encoder baselines, and outperforms prompt-based large language models by up to +18.9 F1 in low-resource scenarios. Efficiency analyses show that SEGA adds only ∼30M parameters beyond a standard dual encoder, making it lightweight and deployable on a single GPU. Comprehensive ablation, visualization, and error analyses confirm that SEGA is robust to alignment noise, morphological complexity, and boundary ambiguity, offering a practical and scalable solution for real-world multilingual NER.
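The two mechanisms named in the abstract, sparsifying attention so only the most relevant auxiliary signals survive and blending cross-lingual with monolingual representations, can be sketched generically. The top-k masking and the fusion weight `alpha` below are common constructions, not SEGA's actual layers:

```python
import numpy as np

def topk_sparse_attention(q, k, v, topk=2):
    """Attention that keeps only the top-k scores per query, zeroing the
    rest (a generic form of sparse guidance; ties at the k-th score are
    all kept). q: (nq, d), k: (nk, d), v: (nk, dv)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    kth = np.sort(scores, axis=-1)[:, -topk][:, None]   # k-th largest per row
    masked = np.where(scores >= kth, scores, -np.inf)   # drop the rest
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def weighted_fusion(cross, mono, alpha=0.5):
    """Blend cross-lingual and monolingual representations."""
    return alpha * cross + (1 - alpha) * mono

q = np.array([[1.0, 0.0], [0.0, 1.0]])
k = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
v = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
fused = weighted_fusion(topk_sparse_attention(q, k, v), q, alpha=0.5)
```

Because attention weights below the k-th score are forced to zero, noisy auxiliary tokens contribute nothing to the output, which is the intuition behind robustness to alignment noise.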
Title: SEGA: Selective cross-lingual representation via sparse guided attention for low-resource multilingual named entity recognition
Information Processing & Management, 63(4), Article 104627
Pub Date: 2026-06-01 | Epub Date: 2026-01-02 | DOI: 10.1016/j.ipm.2025.104596
Kelvin Du , Rui Mao , Frank Xing , Gianmarco Mengaldo , Erik Cambria
Language models have revolutionized information processing, elevating it to new levels and generating opportunities to positively impact our society, e.g., in Environmental, Social, and Governance (ESG) domains. This article surveys the current use of language models for ESG analysis, focusing on their applicable scope, effectiveness, and transformative impact. It highlights how these models facilitate a deeper understanding of ESG practices and impacts by integrating unstructured data while acknowledging existing limitations and challenges. Specifically, based on a review of over ninety ESG studies published since the introduction of Transformers in 2018, we discovered that the potential of language models is particularly notable in four primary themes: (1) ESG Frameworks and Standards, which involve the classification of ESG-related texts into binary categories, coarse-grained ESG factors, or fine-grained ESG topics. This theme also includes identifying ESG topic trends and assessing the alignment of corporate ESG disclosures with sustainable development goals; (2) ESG Reporting and Disclosure, which include ESG narrative processing, ESG reporting assurance and ESG report generation; (3) ESG Measurement and Evaluation, which involves calculating ESG ratings, extracting key performance indicators (KPIs), assessing ESG risks, detecting ESG controversy categories, analyzing ESG impact and duration, and assessing the effects of ESG on sustainable growth and corporate financial performance, among other functions; (4) ESG Integration and Application, aiming to incorporate ESG factors into broader financial applications and thereby innovate financial tasks, including ESG sentiment analysis, ESG chatbots and AI assistants, ESG-based financial risk and credit analysis, and ESG investing strategies. We conclude by emphasizing the significance of language models in advancing ESG studies and discussing future research directions.
Title: Language models for environmental, social, and governance analysis: A review
Information Processing & Management, 63(4), Article 104596
Pub Date: 2026-06-01 | Epub Date: 2026-01-12 | DOI: 10.1016/j.ipm.2026.104617
Bin Liu , Jiaqi Han , Zhenyu Zhang , Shijun Li , Haixi Zhang , Yijie Chen , Keqin Li
The demand for data from large models has revitalized information extraction (IE) research, particularly for Chinese texts, where semantic isolation poses unique challenges. Existing methods often rely on Chinese word segmentation, but their capacity to capture full semantic meaning is constrained by polysemy, flexible word order, and other characteristics unique to the Chinese language. To address this limitation, we propose a three-level semantic division and design CEREM, a prompt- and pointer-based IE network, to extract highly aggregated semantics. In our design, prompts unify multiple IE tasks while preserving semantic interactions, a Segment Information Attention mechanism implicitly aggregates high-level semantics to enhance Chinese understanding, and an Independent Branches strategy decouples parameters to focus separately on the sub-tasks of start and end index prediction. We evaluate CEREM on four datasets (DiaKG, CMedCausal, Title2Event, and the self-constructed CAIT) covering named entity recognition (NER), relation extraction (RE), and event extraction tasks. CEREM achieves state-of-the-art performance: on CAIT, 88.59% F1 for NER and 71.82% for RE; on DiaKG, 81.77% for NER and 65.44% for RE; and 45.30% F1 for causal relation extraction on CMedCausal. These results demonstrate CEREM’s effectiveness across domains and task types, highlighting its potential as a unified framework for Chinese information extraction.
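Pointer networks with independent start and end branches produce two separate logit vectors over the input positions; decoding then picks the highest-scoring valid span. A minimal sketch of that decoding step, assuming the two heads' logits are already given (the scoring and length cap are generic, not CEREM's exact procedure):

```python
import numpy as np

def decode_span(start_logits, end_logits, max_len=10):
    """Pick the (start, end) index pair with the highest combined score,
    subject to end >= start and a maximum span length. The start and end
    logits come from independent prediction branches."""
    best, best_score = (0, 0), -np.inf
    n = len(start_logits)
    for s in range(n):
        for e in range(s, min(n, s + max_len)):
            score = start_logits[s] + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best
```

Decoupling the two heads lets each branch specialize (e.g., start positions are often cued by different context than end positions), at the cost of requiring this joint decoding step to enforce span validity.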
Title: CEREM: A segment-wise attention network for Chinese highly aggregated semantic extraction
Information Processing & Management, 63(4), Article 104617
Pub Date: 2026-06-01 | Epub Date: 2026-01-13 | DOI: 10.1016/j.ipm.2025.104591
Jing Wang , Duantengchuan Li , Xu Du , Hao Li , Zhuang Hu
Visual questions are an important means of evaluating students’ knowledge. Knowledge-based visual question classification can effectively mine the knowledge intent of a question and enable the organization and management of online question resources at the knowledge level. Existing methods simply treat it as a multimodal classification task, ignoring implicit knowledge information and the fine-grained interactions between multimodal and multi-granularity features. To mitigate this, we propose a quaternion hypergraph consistent network (QHCN). This approach not only extracts explicit semantic features and implicit knowledge features from text and images simultaneously, but also considers three key properties among explicit-implicit features: modality complementation, modality independence, and knowledge consistency. Specifically, a visual question is represented as a quaternion vector consisting of two modalities and four-dimensional features. To achieve multimodal complementation, the consistency of vision and language guides the construction of a quaternion hypergraph, and a quaternion convolution operator deeply fuses explicit-implicit features. To capture inter-dependencies between explicit-implicit features, an independence loss and a knowledge consistency loss are designed to optimize hypergraph network parameters and enhance the hypergraph structure. Extensive experiments on visual question sets verify that QHCN achieves an accuracy of 94.82% and an F1 score of 94.76%, outperforming the best baseline by +1.46% and +1.53%, respectively.
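Quaternion layers such as the convolution operator mentioned above are built on the Hamilton product, which mixes all four components of the (w, x, y, z) representation in a single multiplication and is what lets the four feature channels (here, explicit and implicit features of two modalities) interact. A standard-algebra sketch, independent of QHCN's specific architecture:

```python
import numpy as np

def hamilton_product(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples.
    Every output component mixes all four input components, which is the
    cross-channel coupling quaternion networks exploit."""
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])
```

In a quaternion convolution, each weight is itself a quaternion applied via this product, so a single parameter couples all four feature dimensions instead of treating them as independent real channels.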
Title: Knowledge-based visual question classification using quaternion hypergraph consistent network
Information Processing & Management, 63(4), Article 104591
Pub Date: 2026-06-01 | Epub Date: 2026-01-16 | DOI: 10.1016/j.ipm.2026.104613
Memoona Saleem , Zahoor Ur Rehman , Raja Hashim Ali , Ujala Akmal , Ali Zeeshan Ijaz , Raja Manzar Abbas
Recent modeling and conceptual advances in Artificial Intelligence (AI) and Natural Language Processing (NLP) are changing how knowledge-intensive tasks yield reliable and creative decisions. In particular, advances in NLP have transformed the way human language is processed, understood, and generated. These developments enable deeper analysis of the textual content that supports information systems and their management. In education, for example, these technologies offer advanced and more intelligent tools that enhance learning through improved learning experiences, optimized assessments, and other teaching and study mechanisms. In this paper, we present a unified framework for automatic question generation (AQG) and educational content analysis. To this end, we developed Questify-TheEduBot, which integrates transformer-based models (BERT, GPT), Latent Dirichlet Allocation (LDA), sentiment analysis, and keyword extraction into a single pipeline. Existing tools typically address isolated tasks, whereas our tool generates multiple question types: multiple-choice questions (MCQs), cloze questions, and descriptive questions. In addition to AQG, Questify-TheEduBot simultaneously validates semantic coherence, topic coverage, and sentiment appropriateness. We evaluated and compared our model on the SQuAD v2.0 and LearningQ datasets, which together contain over 300,000 question-answer pairs. Questify-TheEduBot demonstrated excellent performance on the test datasets, with cosine similarity above 0.85, keyword overlap of 87%, and topic modeling precision of 89%. Human evaluation further confirms the pedagogical relevance of the generated questions, with significant improvements over template-based and Seq2Seq baseline models.
The web-based platform offers instructors and learners a tested, interpretable, and resource-efficient tool for automated assessment, supports curriculum development, and enables personalized learning. By merging automated question generation and content analytics, Questify-TheEduBot advances the state of NLP in education, providing actionable insights for information management in digital learning environments.
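The evaluation metrics reported for the generated questions, cosine similarity between embeddings and keyword overlap against the source text, have standard definitions that can be sketched directly. The vectors and keyword sets below are illustrative stand-ins for the paper's embeddings and extracted keywords:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def keyword_overlap(generated_keywords, reference_keywords):
    """Fraction of reference keywords that appear in the generated
    question (a simple recall-style overlap)."""
    g, r = set(generated_keywords), set(reference_keywords)
    return len(g & r) / len(r) if r else 0.0
```

A generated question would pass the semantic-coherence check when, for example, the cosine similarity between its embedding and the source passage's embedding exceeds a threshold such as the 0.85 figure reported above.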
{"title":"An information processing framework for education: Supporting automatic question generation with NLP to minimize human intervention","authors":"Memoona Saleem , Zahoor Ur Rehman , Raja Hashim Ali , Ujala Akmal , Ali Zeeshan Ijaz , Raja Manzar Abbas","doi":"10.1016/j.ipm.2026.104613","DOIUrl":"10.1016/j.ipm.2026.104613","url":null,"abstract":"<div><div>The recent model and conceptual advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) are changing how knowledge-intensive tasks are giving more reliable and creative decisions. Particularly, the advancements in NLP has completely changed the way human language is processed, is understood, and is then used for generating human language. These updates are useful for the deeper analysis of textual content that support information systems and their management. For example, in the field of education, these technologies offer advanced and more intelligent tools, which enhance education through improved learning experiences, optimized assessments, and other teaching and study mechanisms. In this paper, we have worked on a unified framework for automatic questions generation (AQG) and educational content analysis. For this purpose, we have developed Questify-TheEduBot, which integrates transformer-based models (BERT, GPT), Latent Dirichlet Allocation (LDA), sentiment analysis, and keyword extraction into a single pipeline. Existing tools typically address isolated tasks but our tool generates multiple types of questions, i.e., multiple choice questions (MCQs), cloze, as well as descriptive type of questions. In addition to AQG, Questify-TheEduBot simultaneously validates semantic coherence, topic coverage, and sentiment appropriateness. We have evaluated and compared our model on SQuAD v2.0 and LearningQ datasets, which consists of over 300,000 Question Answer pairs. 
Questify-TheEduBot demonstrated excellent performance on the test datasets, with cosine similarity above 0.85, keyword overlap of 87%, and topic modeling precision of 89%. Human evaluation further confirms the pedagogical relevance of generated questions, where our study shows significant improvements over template-based and Seq2Seq competing baseline models. The web-based platform of our tool offers instructors and learners with a tested, interpretable, and resource-efficient tool for automated assessment, support in curriculum development, and enables personalized learning. By merging automated question generation and content analytics, Questify-TheEduBot advances the state of NLP in the field of education, where it provides actionable insights for information management in digital learning environments.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 4","pages":"Article 104613"},"PeriodicalIF":6.9,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145978436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Insufficient use of unlabeled data often leads to inaccurate medical image segmentation, and noise in pseudo-labels can further destabilize training. In this paper, we propose a semi-supervised model that combines SAM2 with a bidirectional copy-paste mean-teacher model (SemiBCP-SAM2). Specifically, a student model generates segmentation results, which are then used as input prompts for SAM2 to produce additional pseudo-labels, providing auxiliary supervision that guides student learning. We also introduce a Masked Prompt (MP) mechanism that reduces prompt confidence to better handle uncertainty and noise, improving performance in scenarios with complex or incomplete information. Another major contribution is the model's transplantability: by replacing the baseline network in the student-teacher model, it can enhance the performance of other semi-supervised segmentation networks at low cost. We conduct comparative experiments and performance evaluations of SemiBCP-SAM2 on the ACDC (100 MRI scans) and PROMISE12 (50 MRI scans) datasets. On ACDC, with 5% and 10% labeled data, SemiBCP-SAM2 improves Dice by 0.29% and 1.16%, and Jaccard by 0.39% and 1.84%. On PROMISE12, with 5% and 20% labeled data, it improves Dice by 1.61% and 2.03%, and Jaccard by 1.99% and 2.79%. Source code is released at https://github.com/ydlam/SemiBCP-SAM2.
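The bidirectional copy-paste idea the abstract builds on mixes labeled and unlabeled images in both directions, so each mixed input carries supervision both ways. A hedged sketch of that mixing step, using 1-D "images" (plain lists) and an illustrative region, not the paper's actual implementation:

```python
def copy_paste(src, dst, lo, hi):
    """Return a copy of dst with dst[lo:hi] replaced by src[lo:hi]."""
    out = list(dst)
    out[lo:hi] = src[lo:hi]
    return out

def bidirectional_copy_paste(labeled, unlabeled, lo, hi):
    """Mix in both directions: a labeled patch into the unlabeled image,
    and an unlabeled patch into the labeled image."""
    mixed_in = copy_paste(labeled, unlabeled, lo, hi)   # labeled -> unlabeled
    mixed_out = copy_paste(unlabeled, labeled, lo, hi)  # unlabeled -> labeled
    return mixed_in, mixed_out

labeled = [1, 1, 1, 1, 1, 1]    # stand-in for a labeled scan
unlabeled = [0, 0, 0, 0, 0, 0]  # stand-in for an unlabeled scan
mixed_in, mixed_out = bidirectional_copy_paste(labeled, unlabeled, 2, 4)
# mixed_in carries a labeled patch; mixed_out carries an unlabeled patch
```

In the real method the same mixing is applied to the corresponding ground-truth and pseudo-label maps, so the loss on each mixed image is supervised partly by real labels and partly by teacher/SAM2 pseudo-labels.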
{"title":"SemiBCP-SAM2 : Semi-supervised model via enhanced bidirectional copy-paste based on SAM2 for medical image segmentation","authors":"Guangqi Yang , Xiaoxin Guo , Haoran Zhang , Zhenyuan Zheng , Hongliang Dong , Songbai Xu","doi":"10.1016/j.ipm.2025.104576","DOIUrl":"10.1016/j.ipm.2025.104576","url":null,"abstract":"<div><div>Insufficient use of unlabeled data often leads to inaccurate medical image segmentation, and noise in pseudo-labels can further destabilize training. In this paper, we propose a semi-supervised model based on the SAM2 combined with a bidirectional copy-paste mean teacher model (SemiBCP-SAM2). Specifically, we use a student model to generate segmentation results, which are then used as input prompts for SAM2 to generate additional pseudo-labels, providing auxiliary supervision to guide student learning. We also introduce a Masked Prompt (MP) mechanism that reduces prompt confidence to better handle uncertainty and noise, improving its performance in complex or incomplete information scenarios. Another major contribution is the transplantability of this model that can be achieved by replacing the baseline network in the student-teacher model, and can enhance the performance of other semi-supervised segmentation networks at a lower cost. We conduct comparative experiments and performance evaluations of SemiBCP-SAM2 on the ACDC (100 MRI scans) and PROMISE12 (50 MRI scans) datasets. On ACDC, with 5% and 10% labeled data, SemiBCP-SAM2 improves Dice by 0.29% and 1.16%, and Jaccard by 0.39% and 1.84%. On PROMISE12, with 5% and 20% labeled data, it improves Dice by 1.61% and 2.03%, and Jaccard by 1.99% and 2.79%. 
Source code is released at <span><span>https://github.com/ydlam/SemiBCP-SAM2</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 4","pages":"Article 104576"},"PeriodicalIF":6.9,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2026-06-01 | Epub Date: 2025-12-31 | DOI: 10.1016/j.ipm.2025.104574
Xianglin Zhao , Yucheng Jin , Annie Yan Wang , Ming Zhang
Wearable devices provide rich quantitative data for self-reflection on physical activity. However, users often struggle to derive meaningful insights from these data, highlighting the need for enhanced support. To investigate whether Large Language Models (LLMs) can facilitate this process, we propose and evaluate a human-LLM collaborative reflective journaling paradigm. We developed PaceMind, an LLM-mediated journaling system that implements this paradigm based on a three-stage reflection framework. It can generate data-driven drafts and personalized questions to guide users in integrating exercise data with personal insights. A two-week within-subjects study (N = 21) compared the LLM-mediated system with a template-based journaling baseline. The LLM-mediated design significantly improved the perceived effectiveness of reflection support and increased users’ intention to use the system. However, perceived ease of use did not improve significantly. Users appreciated the LLM’s scaffolding for easing data sense-making, but also reported added cognitive work in verifying and personalizing the LLM-generated content. Although objective activity levels did not change significantly, the LLM-mediated condition showed a trend toward more adaptive exercise planning and sustained engagement. Our findings provide empirical evidence for a human-LLM collaborative reflection paradigm in a data-intensive exercise context. They highlight both the potential of LLMs to deepen user reflection and the critical design challenge of balancing automation with meaningful cognitive engagement and user control.
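The abstract describes generating data-driven drafts and questions from exercise data under a three-stage reflection framework. A hedged sketch of that data-to-prompt step; the field names, stage labels, and wording are illustrative placeholders, not PaceMind's actual design:

```python
def build_reflection_prompt(summary):
    """Turn a wearable-data summary into a staged reflection prompt.
    Stage names are illustrative, standing in for a generic
    notice -> interpret -> plan reflection framework."""
    facts = (f"You ran {summary['distance_km']:.1f} km in "
             f"{summary['duration_min']} min (avg HR {summary['avg_hr']} bpm).")
    stages = [
        "Noticing: " + facts + " What stands out to you in these numbers?",
        "Interpreting: How does this session compare with how you felt during it?",
        "Planning: What, if anything, will you adjust next time?",
    ]
    return "\n".join(stages)

prompt = build_reflection_prompt(
    {"distance_km": 5.2, "duration_min": 31, "avg_hr": 152})
```

In an LLM-mediated system such a prompt would seed the model's draft journal entry, which the user then verifies and personalizes, which is exactly the added cognitive work the study reports.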
{"title":"From tracking to thinking: Facilitating post-exercise reflection by a large language model-mediated journaling system","authors":"Xianglin Zhao , Yucheng Jin , Annie Yan Wang , Ming Zhang","doi":"10.1016/j.ipm.2025.104574","DOIUrl":"10.1016/j.ipm.2025.104574","url":null,"abstract":"<div><div>Wearable devices provide rich quantitative data for self-reflection on physical activity. However, users often struggle to derive meaningful insights from these data, highlighting the need for enhanced support. To investigate whether Large Language Models (LLMs) can facilitate this process, we propose and evaluate a human-LLM collaborative reflective journaling paradigm. We developed <em>PaceMind</em>, an LLM-mediated journaling system that implements this paradigm based on a three-stage reflection framework. It can generate data-driven drafts and personalized questions to guide users in integrating exercise data with personal insights. A two-week within-subjects study (<span><math><mrow><mi>N</mi><mo>=</mo><mn>21</mn></mrow></math></span>) compared the LLM-mediated system with a template-based journaling baseline. The LLM-mediated design significantly improved the perceived effectiveness of reflection support and increased users’ intention to use the system. However, perceived ease of use did not improve significantly. Users appreciated the LLM’s scaffolding for easing data sense-making, but also reported added cognitive work in verifying and personalizing the LLM-generated content. Although objective activity levels did not change significantly, the LLM-mediated condition showed a trend toward more adaptive exercise planning and sustained engagement. Our findings provide empirical evidence for a human-LLM collaborative reflection paradigm in a data-intensive exercise context. 
They highlight both the potential to deepen user reflection and underscore the critical design challenge of balancing automation with meaningful cognitive engagement and user control.</div></div>","PeriodicalId":50365,"journal":{"name":"Information Processing & Management","volume":"63 4","pages":"Article 104574"},"PeriodicalIF":6.9,"publicationDate":"2026-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"管理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}