Knowledge-Based Systems: Latest Articles

Knowledge-driven nodes selection with modified LASSO regularization for neural network modeling in economic data forecasting
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-05; DOI: 10.1016/j.knosys.2026.115273
Wisnowan Hendy Saputra, Dedy Dwi Prastyo, Kartika Fithriasari
Complex machine learning models are usually paired with regularization, which is expected to reduce that complexity. Neural network (NN) models are a special case because they must also cope with gradient-based estimation. L1-norm shrinkage regularization is popular because it can select model parameters, but it has been shown to be ineffective under gradient-based estimation. This study extends shrinkage regularization (LASSO) by modifying the gradient so that it remains effective in gradient-based estimation; the result is referred to as modified LASSO (mLASSO). mLASSO is a knowledge-driven regularization designed for node selection: it can partially or fully deselect a node based on the connections it makes. A simulation study demonstrates the consistency of mLASSO, which is unaffected by the choice of activation function, the type of data distribution, or the number of nodes in the hidden layer, an important property for NN model architecture. The simulations also show that mLASSO reduces the value of the model goodness criterion. In the empirical application, mLASSO mitigates overfitting in NN models, so that NN-mLASSO predictions are not affected by outliers in the training data, and it has a significant influence on the model's goodness-of-fit criteria. These results indicate that mLASSO regularization can counter overfitting in NN models and significantly lowers model complexity.
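The difficulty the abstract points to, that plain L1 shrinkage does not select parameters well under gradient-based estimation, can be illustrated on a single weight. The sketch below is not the paper's mLASSO, whose gradient modification is not described in this listing. It contrasts a plain subgradient step with proximal soft-thresholding, a standard textbook alternative. The toy loss, the function names, and all constants are assumptions made for illustration.

```python
def sign(w: float) -> int:
    """Return -1, 0, or 1 according to the sign of w."""
    return (w > 0) - (w < 0)


def subgradient_step(w: float, grad: float, lr: float, lam: float) -> float:
    """Plain gradient step on loss + lam * |w|, using the L1 subgradient."""
    return w - lr * (grad + lam * sign(w))


def proximal_step(w: float, grad: float, lr: float, lam: float) -> float:
    """Gradient step on the loss alone, followed by soft-thresholding."""
    z = w - lr * grad
    shrink = lr * lam
    if z > shrink:
        return z - shrink
    if z < -shrink:
        return z + shrink
    return 0.0  # small weights are set to exactly zero


def run(step, w0: float = 0.05, lr: float = 0.1, lam: float = 1.0, n: int = 50) -> float:
    """Iterate one update rule on a hypothetical toy loss 0.5 * (w - 0.01)**2."""
    w = w0
    for _ in range(n):
        grad = w - 0.01  # gradient of the toy loss; its L1-penalized optimum is w = 0
        w = step(w, grad, lr, lam)
    return w


w_sub = run(subgradient_step)   # keeps jumping across zero, never settles on it
w_prox = run(proximal_step)     # reaches exactly 0.0 and stays there
```

With these settings the subgradient update oscillates around zero without ever producing an exact zero, so the weight is never actually deselected, while the proximal update returns exactly 0.0.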
Knowledge-Based Systems, Volume 336, Article 115273
Citations: 0
Few-shot and chain-of-thought prompting for equipment maintenance knowledge graph construction via large language models
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-03; DOI: 10.1016/j.knosys.2026.115266
Xing Qi, Bo Yang, Shilong Wang, Zhengping Zhang, Yucheng Zhang, Kaze Du
Equipment operation and maintenance (O&M) is fundamental to keeping equipment running normally, and the accumulation and reuse of knowledge are critical to efficient O&M. Knowledge Graphs (KGs), as advanced tools for knowledge storage and management, facilitate intricate knowledge association queries and rapid issue localization, thereby substantially enhancing operational efficiency. However, traditional KG construction methods face limitations such as heavy reliance on extensive manual data annotation, inadequate contextual understanding, and low accuracy in extracting complex relations. To address these challenges, this paper proposes FSCoT-EMKG, a method that combines few-shot learning and Chain-of-Thought (CoT) prompting with large language models (LLMs) to construct KGs in the equipment O&M domain. The approach integrates CoT and few-shot examples into prompts and incorporates a relation ontology to improve the LLMs' ability to perform high-quality knowledge extraction, thus optimizing the KG construction process. The method assigns LLMs the role of knowledge extraction expert and clearly defines their responsibilities within the task. The pre-trained model MiniLMv2 then encodes O&M knowledge into embeddings, and a Gaussian Mixture Model assigns each embedding to its most probable Gaussian distribution. Each Gaussian distribution is summarized to identify the available relations, yielding a relation ontology specific to equipment O&M. A few-shot selection mechanism is designed to pick high-quality examples that help LLMs understand the knowledge extraction task. Finally, a hierarchical adaptive reasoning CoT is introduced, complemented by a scoring and error correction mechanism that verifies the rationality of triples, thereby constructing a high-quality KG. The proposed method is validated on a new energy vehicle O&M dataset from an automobile company in Chongqing. Experiments show that FSCoT-EMKG achieves an F1 score of 84.73%, outperforming current state-of-the-art KG construction methods and demonstrating its effectiveness in building equipment O&M KGs.
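For readers unfamiliar with how an F1 score is obtained for extracted triples, the sketch below shows one common convention: exact matching of (head, relation, tail) triples against a gold set. The listing does not state which matching criterion the paper uses, so exact matching is an assumption, and the function name and the toy triples are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_f1(predicted: Set[Triple], gold: Set[Triple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact triple matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the paper's dataset.
gold = {
    ("motor", "has_fault", "overheating"),
    ("overheating", "caused_by", "coolant leak"),
    ("coolant leak", "fixed_by", "replace hose"),
}
predicted = {
    ("motor", "has_fault", "overheating"),
    ("overheating", "caused_by", "coolant leak"),
    ("motor", "fixed_by", "replace hose"),        # wrong head entity
    ("battery", "has_fault", "low voltage"),      # not in the gold set
}

precision, recall, f1 = triple_f1(predicted, gold)
```

Here two of the four predicted triples are correct and two of the three gold triples are recovered, so precision is 1/2, recall is 2/3, and F1 is 4/7.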
Knowledge-Based Systems, Volume 335, Article 115266
Citations: 0
ESQAD: A curriculum-aligned dataset for question answer generation in Spanish
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-02; DOI: 10.1016/j.knosys.2025.115255
Carlos Badenes-Olmedo, Paul Eyzaguirre-Barreda, Noa Chu-Artzt, Joaquin Gayoso-Cabada
This paper presents ESQAD (Educational Spanish Question-Answer Dataset), an open-access resource for question-answer generation (QAG) in Spanish aligned with national curricula. ESQAD comprises 6980 expert-curated exam questions, 340 literary comprehension items, 47 legal FAQs, and 913 LLM-generated pairs validated in academic settings. The dataset covers diverse linguistic and cognitive features, including subject classification, question typologies, and Bloom-level complexity. We analyze lexical diversity, cognitive intent, and structure across subsets. The automatically generated portion was created with Bloom-guided prompts and large language models (LLMs), and evaluated through both computational metrics and classroom validation. Results show that cognitive-level control is reliable for factual recall but only partial for higher-order reasoning. All resources, including dataset, code, and validation tool, are publicly available. ESQAD provides a structured benchmark for curriculum-aware QAG in under-resourced languages, with applications in educational NLP, question generation, and adaptive learning.
Knowledge-Based Systems, Volume 335, Article 115255
Citations: 0
A knowledge-driven approach for automated fire safety compliance checking in operational buildings
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-02; DOI: 10.1016/j.knosys.2025.115156
Dayou Chen, Long Chen, Yi Yang, Qiuchen Lu, Craig Hancock, Russell Lock, Simon Sølvsten
Building fire safety compliance remains a critical but labor-intensive task, particularly during the operational phase when new hazards can arise from post-occupancy modifications, equipment degradation, or improper use of space. Existing automated compliance checking (ACC) methods have primarily focused on the design phase and rely on static BIM data, offering limited adaptability to dynamic, in-use conditions. This study presents a formalized, knowledge-based approach for automating fire safety compliance monitoring during building operation. The proposed approach integrates heterogeneous data, including building layouts, in-situ images, and regulatory clauses, within a unified reasoning architecture combining multimodal perception, semantic integration, and rule-based inference. Comprehensive experiments across three compliance coverage categories validated the perception-reasoning pipeline, achieving high accuracy in layout extraction (pixel accuracy = 0.90) and safety asset detection (mAP = 0.91). The domain-adapted Fire Compliance VQA further achieved notable improvements in compliance description accuracy compared with a generic vision-language baseline across BLEU and ROUGE metrics. The results confirm the feasibility of translating observational evidence into clause-grounded compliance decisions. This study extends ACC into the operational phase and establishes a foundation for automated, evidence-based compliance monitoring across dynamic building environments.
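The layout extraction result above is reported as pixel accuracy, that is, the fraction of pixels whose predicted class label matches the ground truth. A minimal sketch of that metric follows. The function name, the label encoding, and the toy grids are assumptions for illustration and are unrelated to the paper's data.

```python
from typing import List, Sequence


def pixel_accuracy(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels where the predicted label equals the ground-truth label."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("prediction and ground truth must have the same shape")
    total = sum(len(row) for row in truth)
    if total == 0:
        raise ValueError("empty label map")
    correct = sum(
        p == t
        for pred_row, truth_row in zip(pred, truth)
        for p, t in zip(pred_row, truth_row)
    )
    return correct / total


# Hypothetical 2 x 5 label maps (0 = background, 1 = wall, 2 = door).
truth: List[List[int]] = [[1, 1, 2, 1, 1],
                          [0, 0, 0, 0, 0]]
pred: List[List[int]] = [[1, 1, 1, 1, 1],   # the door pixel is mislabeled as wall
                         [0, 0, 0, 0, 0]]

accuracy = pixel_accuracy(pred, truth)  # 9 of 10 pixels match
```

Pixel accuracy is easy to compute but can look high when one class dominates the image, which is worth keeping in mind when comparing it with the mAP figure reported for asset detection.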
Knowledge-Based Systems, Volume 336, Article 115156
Citations: 0
EchoNet: A hierarchical collaborative network for point cloud-based 3D action recognition
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-02; DOI: 10.1016/j.knosys.2025.115257
Guojia Huang, Zhenjie Hou, Xing Li, Jiuzhen Liang, Xinwen Zhou
Dynamic point clouds provide inherent geometric fidelity for 3D action recognition, yet their unstructured nature makes it challenging to capture complex spatiotemporal patterns. Existing approaches either rely on local neighborhood aggregation, employ explicit spatiotemporal decoupling, or adopt parallel global modeling. However, they often suffer from limited spatiotemporal awareness, fragmented short-term motion continuity, and a lack of hierarchical progression. To address these issues, we propose the hierarchical collaboration hypothesis: effective representations of dynamic point clouds should follow a progressive abstraction from points to regions to the global level, while maintaining semantic consistency across layers. Building on this hypothesis, we introduce EchoNet, a hierarchical collaborative network composed of three complementary modules: the Point Feature Constructor (PFC) for capturing fine-grained geometric details, the Layered Abstraction Synthesizer (LAS) for hierarchical structural abstraction, and the Temporal Context Refiner (TCR) for enhancing cross-frame temporal dependencies. Furthermore, we design a Multi-Scale Regional Channel Attention (MSRCA) module, which adaptively emphasizes critical action regions by integrating positional encoding with multi-regional context. Experiments on NTU RGB+D 60/120, UTD-MHAD, and MSR Action3D demonstrate that EchoNet achieves state-of-the-art or highly competitive performance, exemplified by a top-tier accuracy of 97.07% on MSR Action3D for complex action recognition. The model also proves effective in large-scale scenarios, attaining 84.3% on the challenging NTU RGB+D 120 Cross-Subject benchmark. While performance on the large-scale NTU RGB+D 120 dataset shows the potential for further improvement, our analysis underscores the promise of hierarchical models for building scalable and efficient dynamic point cloud representations.
Knowledge-Based Systems, Volume 336, Article 115257
Citations: 0
IGCDet: Independence guided co-training for sparsely annotated object detection
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-02; DOI: 10.1016/j.knosys.2025.115217
Jian-Xun Mi, Jiahui Feng, Haiyang Wang, Yanjun Wu, Ranzhi Zhao, Chang Liu
Object detection models can achieve excellent detection performance with fully annotated instances. However, requiring complete annotations for every dataset is impractical due to high labor and time costs, as well as the inevitable occurrence of missing annotations. As a result, the absence of annotations can potentially provide misleading supervision and harm the training process. Recent methodologies have achieved remarkable effectiveness through the application of Co-Mining. However, the independence of each branch in Co-Mining cannot be guaranteed, overlooking valuable information during multi-perspective training. To address this issue, we introduce an Independence Guided Co-Training Model (IGCDet) that leverages Image Independence Decomposition to ensure the independence of each co-training branch. This model aims to capture diverse perspectives from images as extensively as possible, identifying missing annotations and incorporating them as positive supervision in the training process. Additionally, we propose the use of Joint-Confidence, derived from the combination of classification and regression, as pseudo-label scores, effectively mitigating issues associated with pseudo-label bias. Extensive experiments have verified the effectiveness of the proposed method.
Knowledge-Based Systems, Volume 336, Article 115217
Citations: 0
Emergent in-place, comparison-based sorting in deep Q-networks
IF 7.6; Zone 1, Computer Science; Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2026-01-01; DOI: 10.1016/j.knosys.2025.115254
Koki Shiga, Kanta Ozawa, Koichi Yamazaki
This study investigates how a Deep Q-Network (DQN) learns to sort in a very simple environment. Sorting algorithms can be classified in two ways: in-place or out-of-place, and comparison-based or non-comparison-based. In this study, we focus on in-place, comparison-based sorting algorithms. This group includes basic algorithms such as bubble sort, selection sort, and insertion sort. To reflect the key operations of these algorithms, the environment was designed to be minimal. It provided only two possible actions: “pair selection” (choosing two objects to compare) and “swap” (exchanging them when they are out of order). The agent did not decide when sorting was complete; instead, the environment ended the episode automatically. This simple design raises two key questions: can the agent learn in such a restricted setting, and if so, what kind of sorting algorithm does it learn? Our experiments show that the DQN learns to sort by repeatedly comparing and swapping adjacent pairs. In each training run, it discovers a specific order of comparisons, and following this order makes it possible to sort the input. This behavior is similar to bubble sort. The results show that reinforcement learning agents can develop sorting strategies under the simplest possible conditions.
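The environment described above can be pictured with a small sketch. This is an illustration under assumptions, not the authors' code: the class `SortEnv`, its `step` signature, and the hand-written `adjacent_pair_policy` are hypothetical, and the fixed policy merely stands in for the bubble-sort-like behavior the abstract says the trained DQN converges to. No learning happens here.

```python
from typing import Iterator, List, Sequence, Tuple


class SortEnv:
    """Toy environment: an action selects a pair (i, j) with i < j. The pair is
    compared and swapped only if it is out of order. The environment, not the
    agent, ends the episode once the list is sorted."""

    def __init__(self, values: Sequence[int]) -> None:
        self.values: List[int] = list(values)

    def step(self, action: Tuple[int, int]) -> Tuple[List[int], bool]:
        i, j = action
        if not 0 <= i < j < len(self.values):
            raise ValueError("action must be a pair (i, j) with 0 <= i < j < n")
        if self.values[i] > self.values[j]:  # comparison, then conditional swap
            self.values[i], self.values[j] = self.values[j], self.values[i]
        done = all(a <= b for a, b in zip(self.values, self.values[1:]))
        return list(self.values), done


def adjacent_pair_policy(n: int) -> Iterator[Tuple[int, int]]:
    """Cycle through adjacent pairs (0, 1), (1, 2), ..., like bubble sort passes."""
    while True:
        for i in range(n - 1):
            yield (i, i + 1)


def run_episode(values: Sequence[int], max_steps: int = 10_000) -> Tuple[List[int], int]:
    """Apply the fixed policy until the environment signals termination."""
    if len(values) < 2:
        return list(values), 0  # nothing to compare
    env = SortEnv(values)
    steps = 0
    for action in adjacent_pair_policy(len(values)):
        _, done = env.step(action)
        steps += 1
        if done or steps >= max_steps:
            break
    return env.values, steps


sorted_values, steps_taken = run_episode([3, 1, 2])
```

For the input `[3, 1, 2]`, the first comparison swaps 3 and 1, the second swaps 3 and 2, and the environment then ends the episode after two steps with `[1, 2, 3]`.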
Knowledge-Based Systems, Volume 335, Article 115254
Citations: 0
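The environment described in the DQN sorting abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `SortEnv`, the tuple encoding of actions, and the `max_steps` cutoff are all assumptions, and no reward signal or DQN training is shown because the abstract does not specify them. The fixed policy at the end only mimics the adjacent compare-and-swap behavior the abstract reports.

```python
import random


class SortEnv:
    """Toy in-place, comparison-based sorting environment (hypothetical)."""

    def __init__(self, n=5, max_steps=100, seed=None):
        self.n = n
        self.max_steps = max_steps  # assumed safety cutoff, not from the paper
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.items = list(range(self.n))
        self.rng.shuffle(self.items)
        self.pair = (0, 1)  # currently selected pair of positions
        self.steps = 0
        return tuple(self.items)

    def is_sorted(self):
        return all(a <= b for a, b in zip(self.items, self.items[1:]))

    def step(self, action):
        """action is ("select", i, j) with i < j, or ("swap",)."""
        self.steps += 1
        if action[0] == "select":
            _, i, j = action
            self.pair = (i, j)
        elif action[0] == "swap":
            i, j = self.pair
            # Exchange the selected objects only when they are out of order.
            if self.items[i] > self.items[j]:
                self.items[i], self.items[j] = self.items[j], self.items[i]
        # The environment, not the agent, decides when the episode ends.
        done = self.is_sorted() or self.steps >= self.max_steps
        return tuple(self.items), done


def adjacent_policy_rollout(env):
    """Fixed bubble-sort-like policy: sweep adjacent pairs and swap each."""
    env.reset()
    done = env.is_sorted()
    while not done:
        for i in range(env.n - 1):
            _, done = env.step(("select", i, i + 1))
            if done:
                break
            _, done = env.step(("swap",))
            if done:
                break
    return env.items
```

Calling `adjacent_policy_rollout(SortEnv(n=6, max_steps=200, seed=0))` runs one episode with the hand-written policy; a learned agent would replace that policy with actions chosen from its Q-values.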
Temporal Motif-Aware time series modeling via subsequence dynamics graph structures
IF 7.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-01 DOI: 10.1016/j.knosys.2025.115208
Yakun Wang , Yisheng Zou , Gang Wang
Subsequence-level structure in time series provides a compact and expressive basis for representation learning, yet most existing GNN-based approaches operate at the whole-sequence level, extracting local or static features or modeling only inter-series dependencies while failing to explicitly capture such recurring temporal patterns within individual sequences. To address this, we propose the Temporal Motif-aware Graph Attention Network (TMA-GAT), which reformulates sequence modeling as a temporal motif-aware graph representation learning problem. Specifically, raw series are first divided into subsequences and encoded to produce embeddings, upon which a deep clustering module identifies a set of recurring temporal motifs that serve as fundamental subsequence-level units. For each series, we then construct a personalized temporal motif graph whose nodes are temporal motif instances and whose edges describe their temporal transitions and structural dependencies. A graph attention mechanism is applied to model the structural dependencies and temporal transitions among temporal motifs. By explicitly capturing temporal motif-level transitions, TMA-GAT effectively integrates both short- and long-term temporal dependencies into the learned representations. Extensive experiments on two mainstream time series analysis tasks, including long-term forecasting and classification, demonstrate that TMA-GAT consistently outperforms state-of-the-art baselines. Further qualitative and quantitative analyses confirm the effectiveness of our proposed model.
Knowledge-Based Systems, vol. 335, Article 115208, published 2026-01-01.
Citations: 0
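The pipeline in the TMA-GAT abstract above (split a series into subsequences, group them into recurring motifs, then connect motifs by their temporal transitions) can be sketched roughly as follows. Several simplifications are assumed here: raw windows stand in for learned embeddings, plain k-means stands in for the deep clustering module, and the graph is a motif-to-motif transition count matrix rather than a graph over motif instances. All function names are hypothetical, and the graph attention step is not shown.

```python
import random


def split_windows(series, length, stride):
    """Cut a series into fixed-length subsequences."""
    return [list(series[s:s + length])
            for s in range(0, len(series) - length + 1, stride)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(windows, k, iters=20, seed=0):
    """Plain k-means, used here only as a stand-in for deep clustering."""
    rng = random.Random(seed)
    centers = [list(w) for w in rng.sample(windows, k)]
    labels = [0] * len(windows)
    for _ in range(iters):
        # Assign each window to its nearest center (its "motif").
        labels = [min(range(k), key=lambda c: sq_dist(w, centers[c]))
                  for w in windows]
        # Move each center to the mean of its members, if it has any.
        for c in range(k):
            members = [w for w, lab in zip(windows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def motif_transition_graph(labels, k):
    """adj[a][b] counts how often motif a is followed directly by motif b."""
    adj = [[0] * k for _ in range(k)]
    for a, b in zip(labels, labels[1:]):
        adj[a][b] += 1
    return adj
```

For a series cut into `m` windows, the entries of the returned matrix sum to `m - 1`, one per consecutive pair of subsequences. A per-series matrix of this kind is the sort of structure a graph attention layer could then consume.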
A Multi-Scale feature embedding framework using grouped and parametric convolutions for efficient time series imputation
IF 7.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-31 DOI: 10.1016/j.knosys.2025.115175
Ruochen Liu , Mingxin Teng , Junwei Ma , Kai Wu
Missing value imputation is a critical challenge in multivariate time series analysis, as incomplete data significantly degrades downstream task performance. Although recent methods employing Multi-Layer Perceptron (MLP) and Transformer-based models have gained attention for capturing non-linear relationships and long-range dependencies, they primarily focus on intrinsic temporal features, such as periodicity and trends, which limits their ability to handle complex, cross-variable interactions. Additionally, these models, often utilizing either simple mappings or attention mechanisms, face challenges in balancing effectiveness and computational efficiency, especially with randomly distributed missing values. To address these limitations, we propose a two-stage network architecture, the Parametric Grouped Convolutional Network (PGConvNet), specifically designed for time series imputation. By expanding multivariate time series from 1D to 2D and mapping variable information into higher-dimensional channels, PGConvNet effectively captures both temporal and inter-variable dependencies. The first stage employs the Multi-Scale Grouped Convolutional Block (MSGBlock) to extract multi-scale temporal and multivariate interaction features, while the second stage, the Parametric Grouped Convolutional Block (PGCBlock), dynamically adapts to the random positioning of missing values using parametric convolutions, capturing relevant variable and temporal information around missing data points in place of traditional attention mechanisms. Extensive experiments across multiple datasets demonstrate that PGConvNet achieves competitive performance compared to existing methods in accuracy and efficiency, while introducing a multi-dimensional convolutional approach for multivariate time series imputation that effectively addresses complex imputation scenarios. The source code of our proposed method is available at https://github.com/Tmx158/PGConvNet.
Knowledge-Based Systems, vol. 336, Article 115175, published 2025-12-31.
Citations: 0
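The PGConvNet abstract above relies on grouped convolutions. For readers unfamiliar with the term, the generic operation can be written out directly. This sketch shows a standard grouped 1D convolution (cross-correlation, stride 1, no padding) in plain Python. It is not the MSGBlock or PGCBlock from the paper, and the function name and list-based layout are assumptions chosen for readability.

```python
def grouped_conv1d(x, weights, groups):
    """Grouped 1D convolution over a multichannel sequence.

    x:       list of C_in channels, each a list of T values.
    weights: list of C_out filters; each filter holds C_in // groups
             kernels of length K (one kernel per input channel it sees).
    Returns C_out channels of length T - K + 1.
    """
    c_in, c_out = len(x), len(weights)
    assert c_in % groups == 0 and c_out % groups == 0
    in_per_group = c_in // groups
    out_per_group = c_out // groups
    k = len(weights[0][0])
    t_out = len(x[0]) - k + 1

    out = []
    for o in range(c_out):
        g = o // out_per_group  # group this output channel belongs to
        # Each filter only sees the input channels of its own group.
        chans = x[g * in_per_group:(g + 1) * in_per_group]
        row = []
        for t in range(t_out):
            acc = 0.0
            for ch, kern in zip(chans, weights[o]):
                acc += sum(ch[t + j] * kern[j] for j in range(k))
            row.append(acc)
        out.append(row)
    return out
```

With `groups=1` this reduces to an ordinary convolution in which every filter sees every channel. With `groups` equal to the number of channels, each variable is filtered independently, which cuts the parameter count by the same factor.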
RISE: Semantics-guided discriminative and robust learning for unsupervised cross-modal hashing
IF 7.6 Zone 1 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-31 DOI: 10.1016/j.knosys.2025.115242
Rui Wang , Haixiao Huang , Yanglin Feng , Dezhong Peng , Peng Hu , Yongxiang Li
Unsupervised Cross-Modal Hashing (UCMH) has addressed critical needs for efficient multimodal retrieval but faces fundamental challenges: the semantic gap between modalities and the absence of supervision signals. To overcome these limitations, we propose a Robust unsupervISEd cross-modal hashing framework (RISE) that exploits invariant semantics as pseudo-supervision for unsupervised cross-modal learning. Our approach features: (1) Modality-specific encoders with Soft-Sign Hashing (SSH) layers for generating unified binary representations; (2) A Semantic Clustering Discriminative Learning (SCDL) module that constructs pseudo-prototypes by aligning cross-modal semantics while mitigating intra-cluster drift and inter-cluster ambiguity; (3) An Adaptively Robust Prototype-supervised Learning (ARPL) module that dynamically balances discriminative learning and noise tolerance for unreliable pseudo-labels. Extensive experiments on four benchmarks (i.e., MIRFLICKR-25K, IAPR TC-12, NUS-WIDE, and MS-COCO) demonstrate that RISE achieves state-of-the-art performance, which outperforms existing UCMH methods by significant margins. Ablation studies validate the complementary roles of SCDL and ARPL in addressing semantic structure learning and pseudo-label noise.
Knowledge-Based Systems, vol. 336, Article 115242, published 2025-12-31.
Citations: 0
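The RISE abstract above concerns learning binary codes that are shared across modalities. Once such codes exist, retrieval itself is simple. This sketch shows only that generic final step: binarizing real-valued encoder outputs with a sign function and ranking database items by Hamming distance. It does not reproduce the Soft-Sign Hashing layer, SCDL, or ARPL, and the function names and example vectors are hypothetical.

```python
def binarize(vec):
    """Map a real-valued embedding to a {-1, +1} code by its sign."""
    return [1 if v >= 0 else -1 for v in vec]


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database_codes):
    """Return database indices ordered from nearest to farthest code."""
    return sorted(range(len(database_codes)),
                  key=lambda i: hamming(query_code, database_codes[i]))


# Example: an image-side embedding queried against text-side codes.
query = binarize([0.3, -0.2, 0.9, -0.5])
database = [
    [1, -1, 1, -1],
    [-1, 1, -1, 1],
    [1, -1, -1, -1],
]
ranking = rank_by_hamming(query, database)
```

Because both modalities are mapped into the same code space, a query from one modality can be compared directly against codes from the other, and Hamming distance keeps the comparison cheap.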