Information Processing & Management: Latest Articles

Deep expertise and interest personalized transformer for expert finding
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-24, DOI: 10.1016/j.ipm.2024.103773
Yinghui Wang , Qiyao Peng , Hongtao Liu , Hongyan Xu , Minglai Shao , Wenjun Wang

Most existing expert finding methods in Community Question Answering (CQA) websites typically determine an expert’s suitability for answering one question based on their past answered questions. However, experts’ interests evolve over time, and their abilities to address questions vary. Consequently, effectively capturing the diverse interests and expertise of experts from their historical records poses a challenge due to dynamic preferences and varying abilities. In this paper, we propose an expert finding framework, which aims to capture experts’ diverse expertise and temporal-aware interests from their past answered questions. Specifically, we encode the timestamp and vote score information of each question answered by the expert for modeling their interests and expertise. Then, we design a personalized transformer encoder to effectively learn the inherent representation based on the expert’s historical answering behaviors. We further design an additive attention-based interaction encoder to dynamically capture the relevance between a target question and an expert’s historical answered questions. We conduct experiments on six real-world CQA datasets from StackExchange, the largest of which contains 8921912 questions and 687213 answerers. Experimental results show that on the metric P@1, compared with the best baseline methods, our method has achieved 3.3%–15.6% performance improvement.
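
The interaction encoder described above weighs each historical question by its relevance to the target question. The abstract gives no equations, so the following is only a minimal sketch of generic additive attention; the projection matrices `W_q` and `W_h`, the vector `v`, and the function name are assumptions for illustration, not the authors' implementation.

```python
import math

def additive_attention(query, history, W_q, W_h, v):
    """Pool history vectors with additive attention against a query.

    score_i = v . tanh(W_q @ query + W_h @ history_i)
    weights = softmax(scores); pooled = sum_i weights_i * history_i
    All parameter names here are hypothetical.
    """
    def matvec(M, x):
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    q_proj = matvec(W_q, query)
    scores = []
    for h in history:
        h_proj = matvec(W_h, h)
        hidden = [math.tanh(a + b) for a, b in zip(q_proj, h_proj)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Numerically stable softmax over the relevance scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    dim = len(history[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
    return weights, pooled
```

The pooled vector can then be compared with the target question representation to score an expert; how that final score is computed is not stated in the abstract.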

Citations: 0
Reviewing 25 years of continuous sign language recognition research: Advances, challenges, and prospects
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-24, DOI: 10.1016/j.ipm.2024.103774
Sarah Alyami , Hamzah Luqman , Mohammad Hammoudeh

Sign language is a form of visual communication employing hand gestures, body movements, and facial expressions. The growing prevalence of hearing impairment has driven the research community towards the domain of Continuous Sign Language Recognition (CSLR), which involves identification of successive signs in a video stream without prior knowledge of temporal boundaries. This survey article conducts a review of CSLR research, spanning the past 25 years, offering insights into the evolution of CSLR systems. A critical analysis of 126 studies is presented and organized into a taxonomy comprising seven critical dimensions: sign language, data acquisition, input modality, sign language cues, recognition techniques, utilized datasets, and overall performance. Additionally, the article investigated the classification of deep-learning CSLR models, categorizing them based on spatial, temporal, and alignment methods, while identifying their advantages and limitations. The article also explored various research aspects including CSLR challenges, the significance of non-manual features in CSLR systems, and identified gaps in existing literature. This literature taxonomy serves as a resource aiding researchers in the development and positioning of novel CSLR techniques. The study emphasizes the efficacy of multi-modal deep learning systems in capturing diverse sign language cues. However, the examination of existing research uncovers numerous limitations, calling for continued research and innovation within the CSLR domain. The findings not only contribute to the broader understanding of sign language recognition but also lay the foundations for future research initiatives aimed at addressing the persistent challenges within this emerging field.

Citations: 0
Interpretable software estimation with graph neural networks and orthogonal array tunning method
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-24, DOI: 10.1016/j.ipm.2024.103778
Nevena Rankovic , Dragica Rankovic , Mirjana Ivanovic , Jelena Kaljevic

Software estimation rates are still suboptimal regarding efficiency, runtime, and the accuracy of model predictions. Graph Neural Networks (GNNs) are complex, yet their precise forecasting reduces the gap between expected and actual software development efforts, thereby minimizing associated risks. However, defining optimal hyperparameter configurations remains a challenge. This paper compares state-of-the-art models such as Long-Short-Term-Memory (LSTM), Graph Gated Neural Networks (GGNN), and Graph Gated Sequence Neural Networks (GGSNN), and conducts experiments with various hyperparameter settings to optimize performance. We also aim to gain the most informative feedback from our models by exploring insights using a post-hoc model-agnostic method, Shapley Additive Explanations (SHAP). Our findings indicate that the Taguchi orthogonal array optimization method is the most computationally efficient, yielding notably improved performance metrics. This suggests a compromise between computational efficiency and prediction accuracy while still requiring the lowest number of runs, with an RMSE of 0.9211 and an MAE of 310.4. For the best-performing model, the GGSNN model, within the Constructive Cost Model (COCOMO), Function Point Analysis (FPA), and Use Case Points (UCP) frameworks, applying the SHAP method leads to a more accurate determination of relevance, as evidenced by the norm reduction in activation vectors. The SHAP method stands out by exhibiting the smallest area under the curve and faster convergence, indicating its efficiency in pinpointing concept relevance.
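
To illustrate why an orthogonal array needs few runs, here is a minimal sketch using the standard L4 array for three two-level factors: it covers every pair of levels in 4 runs instead of the 8 a full grid would need. The abstract does not say which array or which hyperparameters the authors used, so the grid below (`learning_rate`, `hidden_size`, `dropout`) is purely hypothetical.

```python
from itertools import combinations, product

# Standard Taguchi L4 array: 4 runs, 3 factors, 2 levels each (levels coded 0/1).
L4 = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def is_orthogonal(array, levels=2):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(array[0])
    expected = len(array) // (levels ** 2)
    for a, b in combinations(range(n_cols), 2):
        counts = {}
        for row in array:
            key = (row[a], row[b])
            counts[key] = counts.get(key, 0) + 1
        for pair in product(range(levels), repeat=2):
            if counts.get(pair, 0) != expected:
                return False
    return True

def runs_from_array(array, grid):
    """Map coded levels to concrete settings, one dict per experimental run."""
    names = list(grid)
    return [{name: grid[name][lvl] for name, lvl in zip(names, row)} for row in array]

# Hypothetical hyperparameter grid, for illustration only.
grid = {
    "learning_rate": [0.01, 0.001],
    "hidden_size": [64, 128],
    "dropout": [0.1, 0.3],
}
runs = runs_from_array(L4, grid)
```

Each of the four dictionaries in `runs` would be trained and evaluated once; main effects are then estimated by averaging the metric over the runs that share a level.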

Citations: 0
FUMMER: A fine-grained self-supervised momentum distillation framework for multimodal recommendation
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-22, DOI: 10.1016/j.ipm.2024.103776
Yibiao Wei , Yang Xu , Lei Zhu , Jingwei Ma , Jiangping Huang

The considerable semantic information contained in multimodal data is increasingly appreciated by industry and academia. To effectively leverage multimodal information, existing multimodal recommendation methods mainly build multimodal auxiliary graphs to improve the representation of users and items. However, the weak value density of multimodal data inevitably leads to serious noise issues, making it difficult to effectively exploit valuable information from the multimodal contents. To address this issue, we propose a novel Fine-grained Self-supervised Momentum Distillation Framework (FUMMER) for multimodal recommendations. Specifically, we propose a Transformer-based Fine-grained Feature Extractor (TFFE) and a Momentum Distillation (MoD) structure that incorporates intra- and inter-modal contrastive learning to fully pre-train TFFE for fine-grained feature extraction. Additionally, we design a structure-aware fine-grained contrastive learning module to fully exploit the self-supervised signals from fine-grained structural features. Extensive experiments on three real-world datasets show that our method outperforms state-of-the-art multimodal recommendation methods. Further experiments verify that the fine-grained feature extraction method we propose can serve as a pre-trained model, enhancing the performance of recommendation methods effectively by learning the fine-grained feature representations of items. The code is publicly available at https://github.com/BIAOBIAO12138/FUMMER.
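
The abstract does not spell out how the momentum branch is updated. In momentum-based self-supervised methods generally, a slowly moving "teacher" copy of the encoder is kept as an exponential moving average (EMA) of the trained "student" weights; the sketch below shows only that generic update, with the function name and the momentum value chosen for illustration rather than taken from FUMMER.

```python
def ema_update(teacher, student, momentum=0.995):
    """Return teacher weights moved slightly toward the student weights.

    teacher_new = momentum * teacher + (1 - momentum) * student
    Weights are plain lists of floats here; the momentum value is an assumption.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]
```

A momentum close to 1 makes the teacher change slowly, which is the usual reason such a copy gives more stable targets than the student itself.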

Citations: 0
Predicting users’ future interests on social networks: A reference framework
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-21, DOI: 10.1016/j.ipm.2024.103765
Fattane Zarrinkalam , Havva Alizadeh Noughabi , Zeinab Noorian , Hossein Fani , Ebrahim Bagheri

Predicting users’ interests on social networks is gaining attention due to its potential to cater customized information and services to the end users. Although previous works have extensively explored how users’ interests can be modeled on social networks, there has been limited investigation into the prediction of users’ future interests. The objective of our work in this paper is to empirically study the effectiveness of different sets of features based on users’ past social interactions, historical interests and their temporal dynamics to predict their interests over a collection of future-yet-unobserved topics. More specifically, we introduce and formalize the features for interest prediction in four categories: user-based, topical, explicit user-topic engagement, and friends’ influence. We further explore the influence of temporality by augmenting features with information pertaining to users’ historical interests and social connections. We model the task of future interest prediction as a learning-to-rank problem where different features and their related categories are ranked based on their relevance and performance in interest prediction, and investigate the efficiency of different features individually and comparatively for predicting the future interest of users with different activity levels in social networks over unobserved topics. After conducting experiments on a real-world dataset sourced from Twitter, we have identified several noteworthy findings: (1) the relevance feature in the category of past explicit user-topic engagement is the strongest indicator for predicting users’ future interest across all user groups, with an observed 8.57% decrease in NDCG and an 8.95% decrease in MAP when it is removed in the ablation study; (2) the observation of an 8.06% decrease in NDCG and a 7.3% decrease in MAP, when topical features such as popularity, freshness, and coherence are removed in the ablation study, highlights their significance as among the strongest indicators for users’ future interest, particularly for low-active users; (3) although temporal features show a clear positive impact across user groups with varying levels of activity (resulting in a 4.5% decrease in NDCG and a 7.3% decrease in MAP when removed in the ablation study), the temporal topical features do not demonstrate a significant positive effect; and (4) the removal of user-specific characteristics such as influence and personality traits in the ablation study reveals their significant impact in predicting future interest over cold topics, reflected by a 5.49% decrease in NDCG and a 5.72% decrease in MAP. Our findings make significant contributions to the field of future interest prediction, offering valuable insights and practical implications for various applications in social network analysis.
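
The ablation results above are reported in NDCG and MAP. As a reminder of what those ranking metrics measure, here is a small sketch of NDCG and of average precision (MAP is its mean over users). It assumes relevance labels are listed in ranked order and that every relevant topic appears somewhere in the ranking; the cutoffs and evaluation protocol used in the paper are not given in the abstract.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: rel_i / log2(i + 1) with ranks starting at 1."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering; 0.0 if no gain."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

def average_precision(relevances):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits, precisions = 0, []
    for rank, rel in enumerate(relevances, start=1):
        if rel > 0:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(rankings):
    """MAP: average of per-user average precision."""
    return sum(average_precision(r) for r in rankings) / len(rankings)
```

For example, a ranking whose only relevant topic sits in second place has an NDCG of 1 / log2(3), roughly 0.63, and an average precision of 0.5.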

Citations: 0
Algorithm design and performance evaluation of sparse induced suffix sorting
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-21, DOI: 10.1016/j.ipm.2024.103777
Wenbo Wu , Ge Nong

Sorting any m target suffixes of an input string X of n characters from a constant alphabet is a key task for building the sparse suffix array SSA(X) for index construction. A number of probabilistic and deterministic algorithms have been proposed for sorting sparse suffixes with varying time and space complexities, but only some experimental results are available for performance evaluation of these algorithms. We design a divide-and-conquer algorithm called sSAIS for computing SSA(X) in O(n log m log(n/m)) time and O(m) workspace by using the induced sorting principle, and conduct an experimental performance study on real and artificial datasets. This work reveals that designing an efficient deterministic algorithm for sorting sparse suffixes is a tough challenge and that the density of target suffixes might be considered a critical design parameter.
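
To make the problem statement concrete, the sketch below builds a sparse suffix array by plain comparison sorting. This is a naive baseline for illustration only, not the sSAIS algorithm: each comparison can take O(n) time and the slicing copies suffixes, so it meets neither the time nor the workspace bound quoted above. The function name is hypothetical.

```python
def naive_sparse_suffix_array(text, positions):
    """Return the given start positions ordered by their suffixes of `text`.

    Naive baseline: sorts with full suffix comparisons (ties cannot occur
    between distinct positions because the suffixes differ in length).
    """
    for p in positions:
        if not 0 <= p < len(text):
            raise ValueError(f"position {p} is outside the text")
    return sorted(positions, key=lambda p: text[p:])
```

For `text = "banana"` and target positions `[0, 2, 4]`, the suffixes are `banana`, `nana`, and `na`, so the result lists position 0 first, then 4, then 2.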

Citations: 0
Nice to meet images with Big Clusters and Features: A cluster-weighted multi-modal co-clustering method
IF 8.6, Zone 1 (Management), Q1 Engineering, Pub Date: 2024-05-19, DOI: 10.1016/j.ipm.2024.103735
Chaoyang Zhang , Hang Xue , Kai Nie , Xihui Wu , Zhengzheng Lou , Shouyi Yang , Qinglei Zhou , Shizhe Hu

Multi-modal image clustering focuses on exploring and exploiting related information across various modals of input images to obtain clear image cluster patterns. Recent multi-modal/view clustering methods have shown promising performance in solving the image clustering problem. However, most existing methods fail to properly deal with the multi-modal image data with massive number of clusters and high dimensionality of features in real-world applications, such as image retrieval, multi-modal autonomous driving perception and industrial automation. We call this problem “Big Clusters and Features” for short just as “Big Data” for large number of samples. To fix this challenging problem, we in this article design a general multi-modal image clustering framework which integrates cluster-wise weight learning, feature learning, and clustering structure learning. Under this framework, we further propose a new Cluster-weighted Multi-modal Information Bottleneck Co-clustering (CMIBC) method, and it could effectively measure image cluster importance information and discriminative features of each modal for obtaining satisfactory image clustering performance. Unlike existing cluster weight learning methods only considering intra-cluster similarity or cross-cluster dissimilarity, we design a novel cluster-wise weight learning strategy by jointly considering and enjoying the best of both worlds. Many carefully designed experiments on various multi-modal image datasets with big clusters and features reveal the competitive advantages of the CMIBC algorithm over lots of compared single/multi-modal clustering methods, particularly the notable improvement of 3.12% and 5.28% on the Leaves Plant Species dataset in terms of accuracy and normalized mutual information. Owing to the promising performance, the proposed CMIBC may be extended to many other practical applications, e.g., multi-modal medical analysis and video recognition.
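
The improvements above are stated in terms of accuracy and normalized mutual information (NMI). As a reference for the second metric, here is a small sketch of NMI between two labelings. Several normalizations exist; this one divides by the arithmetic mean of the two entropies, which is an assumption, since the abstract does not say which variant was used.

```python
import math
from collections import Counter

def normalized_mutual_information(labels_true, labels_pred):
    """NMI = I(U; V) / mean(H(U), H(V)), computed with natural logarithms."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    n = len(labels_true)
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))

    mutual_info = sum(
        (c / n) * math.log((c * n) / (count_true[t] * count_pred[p]))
        for (t, p), c in joint.items()
    )
    entropy_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    entropy_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())

    denom = (entropy_true + entropy_pred) / 2
    # Convention assumed here: two trivial one-cluster labelings agree perfectly.
    return mutual_info / denom if denom > 0 else 1.0
```

NMI ignores how clusters are named, so a prediction that swaps the two cluster ids of a perfect partition still scores 1, while a prediction that is statistically independent of the true labels scores 0.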

Citations: 0
Identifying influential nodes in complex networks via Transformer
IF 8.6 Zone 1 (Management) Q1 Engineering Pub Date: 2024-05-17 DOI: 10.1016/j.ipm.2024.103775
Leiyang Chen , Ying Xi , Liang Dong , Manjun Zhao , Chenliang Li , Xiao Liu , Xiaohui Cui

In the domain of complex networks, the identification of influential nodes plays a crucial role in ensuring network stability and facilitating efficient information dissemination. Although the study of influential nodes has been applied in many fields, such as suppressing rumor spreading, regulating group behavior, and predicting the evolution of mass events, current deep learning-based algorithms have limited input features and cannot aggregate the neighbor information of nodes, and thus fail to adapt to complex networks. We propose a Transformer-based method for identifying influential nodes in complex networks. In this method, the input sequence of a node includes information about the node itself and its neighbors, enabling the model to effectively aggregate node information to identify its influence. Experiments were conducted on 9 synthetic networks and 12 real networks, using the SIR model and a benchmark method to verify the effectiveness of our approach. The experimental results show that this method identifies influential nodes in complex networks more effectively. In particular, the method improves by 27 percent over the second-best method on the Netscience network and by 21 percent on the Faa network.
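
The abstract relies on the SIR (susceptible-infected-recovered) model to check how influential the identified nodes are. A minimal sketch of that kind of check follows, assuming a discrete-time SIR process on an undirected graph stored as an adjacency dict. The function names `sir_spread` and `estimate_influence`, the infection probability `beta`, the recovery probability `gamma`, and the toy graph are all illustrative assumptions; none of them come from the paper.

```python
import random


def sir_spread(adj, seed_node, beta=0.3, gamma=1.0, rng=None):
    """Run one discrete-time SIR outbreak from seed_node.

    Returns the number of nodes that were ever infected.
    beta and gamma are assumed parameters; gamma must be > 0 so the outbreak ends.
    """
    rng = rng or random.Random()
    infected = {seed_node}
    recovered = set()
    while infected:
        newly_infected = set()
        for u in infected:
            for v in adj[u]:
                # only susceptible neighbors can be infected
                if v not in infected and v not in recovered and rng.random() < beta:
                    newly_infected.add(v)
        # each currently infected node recovers with probability gamma
        now_recovered = {u for u in infected if rng.random() < gamma}
        recovered |= now_recovered
        infected = (infected - now_recovered) | newly_infected
    return len(recovered)


def estimate_influence(adj, node, runs=500, beta=0.3, gamma=1.0, seed=0):
    """Average outbreak size over repeated simulations started at node."""
    rng = random.Random(seed)
    return sum(sir_spread(adj, node, beta, gamma, rng) for _ in range(runs)) / runs


# Toy graph (hypothetical): a hub 0 with leaves 1 and 2, plus a tail 0-3-4
toy_graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3]}
hub_score = estimate_influence(toy_graph, 0)
leaf_score = estimate_influence(toy_graph, 1)
```

Ranking nodes by such simulated outbreak sizes gives a reference ordering against which a learned influence score can be compared.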

Citations: 0
How far is reality from vision: An online data-driven method for brand image assessment and maintenance
IF 8.6 Zone 1 (Management) Q1 Engineering Pub Date: 2024-05-16 DOI: 10.1016/j.ipm.2024.103769
Xiaoyan Jiang , Jie Lin , Chao Wang , Lixin Zhou

Brand image assessment and maintenance are essential for branding. This paper constructs an online data-driven quantitative brand image assessment method using classic brand image theory as a conceptual model. The method is organized as follows. First, with domain expert knowledge and deep learning, this paper constructs a task ontology to clearly describe the constituent content, constituent relationships, properties, and property values of brand image. Then, using the task ontology as a priori knowledge, we identify the content of brand associations from User-generated Content (UGC) and Firm-generated Content (FGC), respectively, and calculate the associations’ favorability, strength, and uniqueness. We then classify brand associations into three categories (functional, experiential, and symbolic) to achieve a dual-perspective (consumer perceptions & corporate claims) brand image assessment. Finally, this study compares the dual-perspective brand images in terms of components and benefits to construct a brand image communication and maintenance strategy. The methodology is developed and validated on the Chinese New Energy Vehicle (NEV) market. The proposed dual-perspective quantitative brand image assessment model is a new development in the theory and methods of brand image evaluation and maintenance in the digital era. It is also a practical tool for brand management in enterprises.

Citations: 0
Map retrieval intention recognition based on relevance feedback and geographic semantic guidance: For better understanding user retrieval demands
IF 8.6 Zone 1 (Management) Q1 Engineering Pub Date: 2024-05-11 DOI: 10.1016/j.ipm.2024.103767
Zhipeng Gui , Xinjie Liu , Anqi Zhao , Yuhan Jiang , Zhipeng Ling , Xiaohui Hu , Fa Li , Zelong Yang , Huayi Wu , Shuangming Zhao

Effective retrieval is essential for conveniently finding the resources in demand among the extensive data records of a data warehouse. Mainstream map retrieval methods suffer from the intention gap problem and are incapable of describing sophisticated user demands precisely, owing to the limits of low- and middle-level text or visual feature matching, which results in unsatisfactory retrieval results. Such limitations are more marked when map retrieval demands are characterized by joint constraints on geographic concepts. To address this issue, we propose a map retrieval intention recognition method that perceives user demands with relevance feedback samples and geographic semantic guidance. Specifically, we construct a hierarchical intention expression model to describe retrieval goals and their multi-dimensional attribute constraints; incorporate geographic ontologies to provide semantic guidance and facilitate recognition; use the frequent itemset mining (FIM) algorithm Apriori to generate intention candidates from relevance feedback samples; and search for the optimal intention set by adopting the minimum description length (MDL) principle. The experiments verify the effectiveness of the Apriori algorithm and the MDL principle for intention recognition. The proposed method outperforms the FIM algorithm Gene Ontology (RuleGO) and the Decision Tree algorithm with Hierarchical Features (DTHF), with higher recognition accuracy and noise tolerance. Furthermore, through our sample augmentation strategy, the method yields promising recognition accuracy even when the feedback sample size is as low as ten, substantially reducing the feedback burden in human-computer interactions. We envision that applying our method in spatial data infrastructures (SDIs), such as geoportals and catalogue services, could enhance the quality of service and user experience in geospatial data discovery.
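
The candidate-generation step named in the abstract, Apriori over relevance feedback samples, can be sketched as below. This is a generic textbook Apriori under stated assumptions, not the authors' implementation: each feedback sample is assumed to be a set of attribute tags, the tag names and the `min_support` threshold are hypothetical, and the MDL-based selection of the final intention set is not shown.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical feedback samples: maps the user marked as relevant
feedback = [
    {"theme:hydrology", "region:Asia", "scale:1:50000"},
    {"theme:hydrology", "region:Asia"},
    {"theme:hydrology", "region:Europe"},
    {"theme:geology", "region:Asia"},
]
intention_candidates = apriori(feedback, min_support=0.5)
```

With these toy samples the frequent itemsets are the two single tags `theme:hydrology` and `region:Asia` plus their pair, which is the kind of joint attribute constraint the abstract describes as an intention candidate.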

Citations: 0