Cognitive Computation: Latest Articles

A Joint Network for Low-Light Image Enhancement Based on Retinex

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-16 DOI: 10.1007/s12559-024-10347-4

Yonglong Jiang, Jiahe Zhu, Liangliang Li, Hongbing Ma

Methods based on the physical Retinex model are effective in enhancing low-light images, handling the low signal-to-noise ratios and high noise of images captured under weak lighting conditions. However, traditional models based on manually designed Retinex priors do not adapt well to complex and varying degradation environments. DEANet (Jiang et al., Tsinghua Sci Technol. 2023;28(4):743–53) combines frequency information with the Retinex model to address the interference of high-frequency noise in low-light image restoration. Nonetheless, low-frequency noise still significantly impacts the restoration of low-light images. To overcome this issue, this paper integrates the physical Retinex model with deep learning to propose a joint network model, DEANet++, for enhancing low-light images. The model is divided into three modules: decomposition, enhancement, and adjustment. The decomposition module employs a data-driven approach based on Retinex theory to split the image; the enhancement module repairs degradation and adjusts brightness in the decomposed images; and the adjustment module restores details and adjusts complex features in the enhanced images. Trained on the publicly available LOL dataset, DEANet++ not only surpasses the control group in both visual and quantitative aspects but also achieves superior results compared to other Retinex-based enhancement methods. Ablation studies and additional experiments highlight the importance of each component in this method.
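For orientation, the Retinex model treats an image as the element-wise product of a reflectance map and an illumination map. The sketch below shows the classical hand-crafted version of that idea, not the learned DEANet++ modules: the max-channel illumination estimate, the gamma value, and the function names `decompose` and `enhance` are all assumptions chosen for illustration.

```python
def decompose(image, eps=1e-6):
    """Split an RGB image into reflectance and illumination.

    `image` is a list of rows, each row a list of (r, g, b) tuples in [0, 1].
    Illumination is estimated per pixel as the brightest channel (an assumed,
    hand-crafted prior), so that image ~= reflectance * illumination.
    """
    illumination = [[max(px) for px in row] for row in image]
    reflectance = [
        [tuple(c / (l + eps) for c in px) for px, l in zip(row, l_row)]
        for row, l_row in zip(image, illumination)
    ]
    return reflectance, illumination


def enhance(image, gamma=0.5):
    """Brighten an image by gamma-correcting only the illumination map."""
    reflectance, illumination = decompose(image)
    return [
        [
            tuple(min(1.0, c * (l ** gamma)) for c in px)
            for px, l in zip(r_row, l_row)
        ]
        for r_row, l_row in zip(reflectance, illumination)
    ]


# A single dark pixel: the colour ratios are kept while brightness is lifted.
dark = [[(0.04, 0.02, 0.01)]]
bright = enhance(dark)
```

Because only the illumination is changed, the ratios between colour channels are preserved. A fixed prior like this is the kind of manually designed component that the abstract says adapts poorly to varied degradation.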


Citations: 0

Incorporating Template-Based Contrastive Learning into Cognitively Inspired, Low-Resource Relation Extraction

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-10 DOI: 10.1007/s12559-024-10343-8

Yandan Zheng, Luu Anh Tuan

From an unstructured text, relation extraction (RE) predicts semantic relationships between pairs of entities. The process of labeling tokens and phrases can be very expensive and require a great deal of time and effort. This gives rise to the low-resource relation extraction (LRE) problem, which is challenging because only a limited number of annotated sentences are available. Recent research has focused on minimizing the cross-entropy loss between pseudo labels and ground truth or on using external knowledge to make annotations for unlabeled data. Existing methods, however, fail to take into account the semantics of relation types and the information hidden within different relation groups. By drawing inspiration from the process of human interpretation of unstructured documents, we introduce Template-based Contrastive Learning (TempCL). Through the use of templates, we limit the model’s attention to the semantic information that is contained in a relation. Then, we employ a contrastive learning strategy using both group-wise and instance-wise perspectives to leverage shared semantic information within the same relation type to achieve a more coherent semantic representation. Particularly, the proposed group-wise contrastive learning minimizes the discrepancy between the template and original sentences in the same label group and maximizes the difference between those from separate label groups under limited annotation settings. Our experimental results on two public datasets show that our model TempCL achieves state-of-the-art results for low-resource relation extraction in comparison to baselines. The relative error reductions range from 0.68 to 1.32%. Our model encourages the feature to be aligned with both the original and template sentences. Using two contrastive losses, we exploit shared semantic information underlying sentences (both original and template) that have the same relation type. We demonstrate that our method reduces the noise caused by tokens that are unrelated and constrains the model’s attention to the tokens that are related.
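The group-wise objective described above can be illustrated with a generic supervised contrastive loss, in which embeddings sharing a relation label are pulled together and all others are pushed apart. This is a minimal sketch under assumptions, not the TempCL loss itself: the cosine similarity, the temperature of 0.1, and the function name `group_contrastive_loss` are illustrative choices.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def group_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    For each anchor, the positives are the other embeddings with the same
    label (for example a template sentence and an original sentence of the
    same relation type); every other embedding acts as a negative.
    """
    n = len(embeddings)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        logits = {
            j: cosine(embeddings[i], embeddings[j]) / temperature
            for j in range(n)
            if j != i
        }
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Two relation groups; the loss is small when same-label vectors are close.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
well_grouped = group_contrastive_loss(vectors, [0, 0, 1, 1])
badly_grouped = group_contrastive_loss(vectors, [0, 1, 0, 1])
```

Comparing `well_grouped` with `badly_grouped` shows the intended behaviour: the same vectors incur a much larger loss when the labels disagree with their geometry.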


Citations: 0

A Novel Cognitive Rough Approach for Severity Analysis of Autistic Children Using Spherical Fuzzy Bipolar Soft Sets

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-06 DOI: 10.1007/s12559-024-10349-2

Ghous Ali, Nimra Lateef, Muhammad Usman Zia, Tehseen Abbas

Autism spectrum disorders (ASDs) pose complex challenges, characterized by atypical behaviors, sensory sensitivities, and difficulties in social interaction. Despite extensive research, their exact causes remain elusive, indicating a multifactorial interplay of genetic, environmental, and neurological factors. This complexity calls for innovative approaches to ASD understanding and management. Motivated by the need to address the nuanced and uncertain nature of ASD-related data, in this study, we introduce a novel hybrid model called rough spherical fuzzy bipolar soft sets (RSFBSSs) by integrating rough sets, spherical fuzzy sets, and bipolar soft sets, which accommodates imprecision inherent in clinical assessments. We build upon foundational concepts of RSFBSS theory, developing a comprehensive algorithm for uncertain multiple attribute decision-making (MADM). Leveraging this framework, we aim to assess ASD symptom severity in pediatric populations, considering diverse contributing factors to ASD pathogenesis. The RSFBSSs offer advantages over existing methodologies, providing a robust framework for handling complex ASD data. The algorithmic framework facilitates accurate and individualized assessments of ASD symptomatology. To validate our model’s efficacy, we conduct a comparative analysis with preexisting hybrid models, employing quantitative metrics and qualitative evaluations. Through this comprehensive evaluation, we demonstrate the superior performance and versatility of RSFBSSs, offering promising avenues for advancing ASD management.
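One ingredient of the hybrid model, the rough set, can be shown in isolation. The sketch below computes the classical Pawlak lower and upper approximations of a target set from a partition of indistinguishable objects. It does not implement RSFBSSs or the paper's decision-making algorithm, and the function name `approximations` and the example data are hypothetical.

```python
def approximations(equivalence_classes, target):
    """Return the (lower, upper) rough approximations of `target`.

    `equivalence_classes` is a partition of the universe into blocks of
    objects that cannot be told apart by the available attributes.
    Lower: objects that certainly belong to the target.
    Upper: objects that possibly belong to the target.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in equivalence_classes:
        block = set(block)
        if block <= target:   # the whole block lies inside the target
            lower |= block
        if block & target:    # the block overlaps the target
            upper |= block
    return lower, upper


# Hypothetical example: five cases grouped by identical assessment profiles.
profiles = [{1, 2}, {3}, {4, 5}]
flagged = {1, 2, 3, 4}
lower, upper = approximations(profiles, flagged)
```

The gap between the two sets (here case 5, which shares a profile with a flagged case) is the boundary region, which is where imprecision in the assessments shows up.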


Citations: 0

Cognitively Inspired Three-Way Decision Making and Bi-Level Evolutionary Optimization for Mobile Cybersecurity Threats Detection: A Case Study on Android Malware

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-06 DOI: 10.1007/s12559-024-10337-6

Manel Jerbi, Zaineb Chelly Dagdia, Slim Bechikh, Lamjed Ben Said

Malicious apps use a variety of methods to spread infections, take over computers and/or IoT devices, and steal sensitive data. Several detection techniques have been proposed to counter these attacks. Despite the promising results of recent malware detection strategies, particularly those addressing evolving threats, inefficiencies persist due to potential inconsistency in both the generated malware and the pre-specified detection rules, as well as their crisp decision-making process. In this paper, we propose to address these issues by (i) considering the detection rules generation process as a Bi-Level Optimization Problem, where a competition between two levels (an upper level and a lower one) produces a set of effective detection rules capable of detecting new variants of existing and even unseen malware patterns. This bi-level strategy is subtly inspired by natural evolutionary processes, where organisms adapt and evolve through continuous interaction and competition within their environments. Furthermore, (ii) we leverage the fundamentals of Rough Set Theory, which reflects cognitive decision-making processes, to assess the true nature of artificially generated malicious patterns. This involves retaining only the consistent malicious patterns and detection rules and categorizing these rules into a three-way decision framework comprising accept, abstain, and reject options. Our novel malware detection technique outperforms several state-of-the-art methods on various Android malware datasets, classifying new apps with 96.76% accuracy. Moreover, our approach is versatile and effective in detecting patterns applicable to a variety of cybersecurity threats.
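The accept, abstain, and reject options mentioned above follow the general pattern of three-way decisions, where two thresholds replace a single crisp cut-off. The sketch below shows only that generic pattern: the score being thresholded, the values of `alpha` and `beta`, and the function name `three_way_decision` are assumptions, and the abstract does not say how the paper scores its rules.

```python
def three_way_decision(score, alpha=0.8, beta=0.3):
    """Map a confidence score in [0, 1] to accept, abstain, or reject.

    Scores at or above `alpha` are accepted, scores at or below `beta`
    are rejected, and everything in between is deferred (abstain)
    instead of being forced into a crisp yes/no answer.
    """
    if not 0.0 <= beta < alpha <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
    if score >= alpha:
        return "accept"
    if score <= beta:
        return "reject"
    return "abstain"


# Hypothetical consistency scores for three candidate detection rules.
decisions = [three_way_decision(s) for s in (0.92, 0.55, 0.10)]
```

The middle region is the point of the design: a rule whose evidence is inconclusive is set aside for further evaluation rather than being kept or discarded outright.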


Citations: 0

Probing Fundamental Visual Comprehend Capabilities on Vision Language Models via Visual Phrases from Structural Data

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-05 DOI: 10.1007/s12559-024-10351-8

Peijin Xie, Bingquan Liu

Does the model demonstrate exceptional proficiency in “item counting,” “color recognition,” or other Fundamental Visual Comprehension Capability (FVCC)? There have been remarkable advancements in the multimodal field: pretrained general Vision Language Models (VLMs) exhibit strong performance across a range of intricate Visual Language (VL) tasks, and Multimodal Large Language Models (MLLMs) show emergent visual reasoning abilities from only a few examples. But models tend to encounter difficulties when confronted with texts supplemented with specific details by simple visual phrases. Moreover, there is a scarcity of datasets in sufficient quantity, variety, and composability to enable the evaluation of each FVCC using statistical metrics. Accordingly, we decomposed the complete VL task into 9M simple Visual Phrase Triplets (VPTs) across 16 categories representing 16 distinct FVCCs from the structural scene graph. Then, we reconstructed a Multilevel Scene Graph (MLSG) for each image and introduced our unbiased, balanced, and binary Visual Phrase Entailment benchmark with 20 times the data volume of SNLI-VE. The benchmark consisted of three exams and evaluated the performance of 8 widely used VLMs and 10 MLLMs, respectively. The results demonstrate the performance of each model across 16 classes in FVCC, as well as their lower and upper limits under conditions of increased text complexity or unnoised image input. Finally, we enhanced the efficiency of MLLMs and evoked their In-Context Learning characteristics by appending multiple VPT-generated QA pairs of identical types to the conversation history without tuning. The proposed structural VPTs and MLSG data hold promise for facilitating future explorations on FVCC.
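To make the idea of a balanced, binary entailment benchmark built from triplets concrete, here is a minimal sketch. It is not the paper's construction pipeline: the triplet format, the distractor vocabulary, and the function name `build_entailment_pairs` are all hypothetical, and the real benchmark is derived from scene graphs of actual images.

```python
def build_entailment_pairs(triplets, vocabulary):
    """Turn (subject, predicate, object) triplets into labelled phrases.

    Each triplet yields one true phrase and, when a distractor exists,
    one false phrase in which the object is swapped for a different value
    of the same predicate. Triplets with a distractor are therefore
    balanced between the two labels.
    """
    pairs = []
    for subject, predicate, obj in triplets:
        pairs.append((f"{subject} {predicate} {obj}", True))
        distractor = next(
            (value for value in vocabulary.get(predicate, []) if value != obj),
            None,
        )
        if distractor is not None:
            pairs.append((f"{subject} {predicate} {distractor}", False))
    return pairs


# Hypothetical triplets read off one image's scene graph.
triplets = [("car", "has color", "red"), ("table", "has count", "two")]
vocabulary = {"has color": ["red", "blue"], "has count": ["one", "two"]}
pairs = build_entailment_pairs(triplets, vocabulary)
```

Grouping the phrases by predicate type is what would allow per-capability statistics, for example a separate accuracy for colour and for counting.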


Citations: 0

A Comprehensive Survey on Generative AI for Metaverse: Enabling Immersive Experience

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-04 DOI: 10.1007/s12559-024-10342-9

Vinay Chamola, Siva Sai, Animesh Bhargava, Ashis Sahu, Wenchao Jiang, Zehui Xiong, Dusit Niyato, Amir Hussain

Generative Artificial Intelligence models are Artificial Intelligence models that generate new content based on a prompt or input. The output content can be in various forms, including text, images, and video. Metaverse refers to a virtual world where users can interact with each other, objects and events in an immersive, realistic, and dynamic manner. A critical and foremost step in realizing the Metaverse is content creation for its different realms. Given Metaverse’s need for enormous content, Generative AI is a perfect technology for content creation. This paper explores how Generative AI models can help fulfil the potential of the Metaverse by assisting in the design and production of various aspects of the Metaverse and attracting users not just by creating dynamic, interactive, and personalised content at scale but also by producing various revenue-generating opportunities for users and organisations in the Metaverse. The paper analyses the Generative AI models by grouping them according to the type of content they generate, namely text, image, video, 3D visual, audio, and gaming. Various use cases in the Metaverse are explored and listed according to each type of AI Generated Content (AIGC). This paper also presents several applications and scenarios where the mixture of different Generative AI (GAI) models benefits the Metaverse. Further, this paper also enumerates the limitations and challenges of Generative AI models and the areas of future work. Despite the obstacles, Generative AI can realise the potential of the Metaverse by making it much more functional and interactive owing to the vast use cases of different types of AIGC in the Metaverse, and the age of virtual reality may not be too distant.


Citations: 0

Enhancing Pre-trained Deep Learning Model with Self-Adaptive Reflection

IF 5.4 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-09-03 DOI: 10.1007/s12559-024-10348-3

Xinzhi Wang, Mengyue Li, Hang Yu, Chenyang Wang, Vijayan Sugumaran, Hui Zhang

In the text mining area, prevalent deep learning models primarily focus on mapping input features to predicted outputs and lack a self-dialectical thinking process. Inspired by self-reflective mechanisms in human cognition, we hypothesize that existing models can emulate decision-making processes and automatically rectify erroneous predictions. The Self-adaptive Reflection Enhanced pre-trained deep learning Model (S-REM) is introduced to validate our hypotheses and to determine the types of knowledge that warrant reproduction. Building on the pretrained model, S-REM introduces the local explanation for the pseudo-label and the global explanation for all labels as the explanation knowledge. The keyword knowledge from a TF-IDF model is also integrated to form the reflection knowledge. Based on the key explanation features, the pretrained model reflects on its initial decision through two reflection methods and optimizes the prediction of deep learning models. We conducted experiments with local and global reflection variants of S-REM on two text mining tasks across four datasets, three public and one private. The outcomes demonstrate the efficacy of our method in improving the accuracy of state-of-the-art deep learning models. Furthermore, the method can serve as a foundational step towards developing explainability through integration with various deep learning models.
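The keyword knowledge mentioned above comes from TF-IDF, which ranks a term highly when it is frequent in one document but rare across the collection. The sketch below uses one common variant (term frequency times log of N over document frequency) with whitespace tokenization; the exact weighting and preprocessing used by S-REM are not given in the abstract, so these choices and the function name `tfidf_keywords` are assumptions.

```python
import math
from collections import Counter


def tfidf_keywords(documents, top_k=3):
    """Return the `top_k` highest-scoring TF-IDF terms for each document.

    Score = (term count / document length) * log(N / document frequency).
    Ties are broken alphabetically so the ranking is deterministic.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


reviews = [
    "the battery drains fast",
    "the screen is bright",
    "the battery is great",
]
keywords = tfidf_keywords(reviews, top_k=1)
```

A term such as "the", which occurs in every document, receives a score of zero under this weighting, which is why it never surfaces as a keyword.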


Citations: 0

PDD: Pruning Neural Networks During Knowledge Distillation

IF 5.4, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-08-31 DOI: 10.1007/s12559-024-10350-9

Xi Dan, Wenjie Yang, Fuyan Zhang, Yihang Zhou, Zhuojun Yu, Zhen Qiu, Boyuan Zhao, Zeyu Dong, Libo Huang, Chuanguang Yang

Although deep neural networks have reached a high level of development, their large computational requirements limit deployment on end devices. To this end, a variety of model compression and acceleration techniques have been developed. Among these, knowledge distillation has emerged as a popular approach that involves training a small student model to mimic the performance of a larger teacher model. However, the student architectures used in existing knowledge distillation are not necessarily optimal and may contain redundancy, which calls into question the assumption that the student is already compact. This study investigates this assumption and empirically demonstrates that student models can contain redundancy, which can be removed through pruning without significant performance degradation. Therefore, we propose a novel pruning method to eliminate redundancy in student models. Instead of using traditional post-training pruning methods, we perform pruning during knowledge distillation (PDD) to prevent the loss of important information transferred from the teacher model to the student model. This is achieved by designing a differentiable mask for each convolutional layer, which can dynamically adjust the channels to be pruned based on the loss. Experimental results show that with ResNet20 as the student model and ResNet56 as the teacher model, a 39.53% reduction in FLOPs was achieved by removing 32.77% of parameters, while the top-1 accuracy on CIFAR10 increased by 0.17%. With VGG11 as the student model and VGG16 as the teacher model, a 74.96% reduction in FLOPs was achieved by removing 76.43% of parameters, with a loss of only 1.34% in top-1 accuracy on CIFAR10. Our code is available at https://github.com/YihangZhou0424/PDD-Pruning-during-distillation.
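The two ingredients named in the abstract, a distillation loss and a differentiable per-channel mask, can be sketched in plain Python. This is a minimal sketch under stated assumptions: the loss is the commonly used temperature-softened KL divergence, and the mask is assumed to be a sigmoid gate per channel. The abstract does not give the actual mask parametrization or loss used by PDD, and all function names and values below are hypothetical; the linked repository is the reference for the real method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

def channel_gates(mask_logits):
    """Assumed soft mask: one sigmoid gate in (0, 1) per channel."""
    return [1.0 / (1.0 + math.exp(-a)) for a in mask_logits]

def apply_mask(channel_outputs, mask_logits):
    """Scale each channel's activation by its gate (differentiable in the logits)."""
    return [g * x for g, x in zip(channel_gates(mask_logits), channel_outputs)]

def channels_to_keep(mask_logits, threshold=0.5):
    """After training, keep the channels whose gate exceeds the threshold."""
    return [i for i, g in enumerate(channel_gates(mask_logits)) if g > threshold]

mask_logits = [3.0, -2.5, 0.8, -4.0]  # hypothetical learned values
kept = channels_to_keep(mask_logits)
loss_same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
loss_diff = distillation_loss([0.0, 0.0, 0.0], [2.0, 0.5, -1.0])
```

Because the gates are smooth functions of the mask logits, a training loop could update them by gradient descent together with the student weights, which is what lets the set of pruned channels change in response to the loss. The hard threshold is applied only once training is finished.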


Citations: 0

A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts

IF 5.4, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-08-30 DOI: 10.1007/s12559-024-10336-7

Huankun Sheng, Hongwei Mo, Tengteng Zhang

Multimodal models are designed to process different types of data within a single generative framework. The prevalent strategy in previous methods involves learning joint representations that are shared across different modalities. These joint representations are typically obtained by concatenating the top layers of modality-specific networks. Recently, significant advancements have been made in generating images from text and vice versa. Despite these successes, current models often overlook the role of fuzzy concepts, which are crucial given that human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for enhancing the effectiveness of multimodal cognition models. In this paper, a novel framework, named the Fuzzy Concept Learning Model (FCLM), is proposed to process modalities based on fuzzy concepts. The high-level abstractions between different modalities in the FCLM are represented by the ‘fuzzy concept functions.’ After training, the FCLM is capable of generating images from attribute descriptions and inferring the attributes of input images. Additionally, it can formulate fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets. Both qualitative and quantitative results from these experiments demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates the fuzzy cognitive mechanism with the statistical characteristics of the environment. This innovative cognition-inspired framework offers a novel perspective for processing multimodal information.
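For readers unfamiliar with fuzzy concepts, the basic idea is that an attribute value belongs to a concept to a degree between 0 and 1 rather than entirely or not at all. The sketch below uses a textbook triangular membership function over a hypothetical normalized "scale" attribute. It is not the FCLM's 'fuzzy concept functions', whose form the abstract does not specify, and the concept names and breakpoints are invented for illustration.

```python
def triangular(x, left, peak, right):
    """Textbook triangular membership: 0 outside (left, right), 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy concepts over a normalized "scale" attribute in [0, 1].
CONCEPTS = {
    "small": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "large": (0.5, 1.0, 1.5),
}

def memberships(scale):
    """Degree to which a scale value belongs to each concept."""
    return {name: triangular(scale, *abc) for name, abc in CONCEPTS.items()}

degrees = memberships(0.75)
```

A scale of 0.75 is here equally "medium" and "large", which is the kind of graded, overlapping judgement that a crisp attribute label cannot express.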


Citations: 0

PrimeNet: A Framework for Commonsense Knowledge Representation and Reasoning Based on Conceptual Primitives

IF 5.4, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-08-30 DOI: 10.1007/s12559-024-10345-6

Qian Liu, Sooji Han, Erik Cambria, Yang Li, Kenneth Kwok

Commonsense knowledge acquisition and representation is a core topic in artificial intelligence (AI) and is crucial for building more sophisticated and human-like AI systems. However, existing commonsense knowledge bases organize facts in an isolated manner, like a bag of facts, lacking the cognitive-level connections that humans commonly possess. People have the ability to efficiently organize vast amounts of knowledge by linking or generalizing concepts using a limited set of conceptual primitives that serve as the fundamental building blocks of reasoning. These conceptual primitives are basic, foundational elements of thought that humans use to make sense of the world. By combining and recombining these primitives, people can construct complex ideas, solve problems, and understand new concepts. To emulate this cognitive mechanism, we design a new commonsense knowledge base, termed PrimeNet, organized in a three-layer structure: a small core of conceptual primitives (e.g., FOOD), a bigger set of concepts that connect to such primitives (e.g., fruit), and an even larger layer of entities connecting to the concepts (e.g., banana). First, we collect commonsense knowledge and employ a gradual expansion strategy for knowledge integration. After refinement, PrimeNet contains 6 million edges between 2 million nodes, with 34 different types of relations. Then, we design a new conceptualization method by leveraging a probabilistic taxonomy to build the concept layer of PrimeNet. Finally, we conduct primitive detection to build the primitive layer, where a lexical substitution task is used to identify related concepts, and large language models are employed to generate a rational primitive to label each concept cluster and to verify the primitive detection process.
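The three-layer organization described in the abstract (entities linked to concepts, concepts linked to primitives) can be pictured with a small toy data structure. This is a minimal sketch under assumptions: the `LayeredKB` class and its methods are hypothetical names for illustration, edges are untyped (PrimeNet itself has 34 relation types), and only the FOOD, fruit, and banana example from the abstract is taken from the source. The "apple" entry is added for illustration.

```python
from collections import defaultdict

class LayeredKB:
    """Toy three-layer store: entity -> concept -> primitive."""

    def __init__(self):
        self.parents = defaultdict(set)  # node -> nodes one layer up
        self.layer = {}                  # node -> layer name

    def add(self, child, child_layer, parent, parent_layer):
        """Record an upward link and the layer of both endpoints."""
        self.layer[child] = child_layer
        self.layer[parent] = parent_layer
        self.parents[child].add(parent)

    def primitives_of(self, node):
        """Follow upward links and collect every primitive-layer node reached."""
        found, stack, seen = set(), [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if self.layer.get(current) == "primitive":
                found.add(current)
            stack.extend(self.parents.get(current, ()))
        return found

kb = LayeredKB()
kb.add("banana", "entity", "fruit", "concept")
kb.add("fruit", "concept", "FOOD", "primitive")
kb.add("apple", "entity", "fruit", "concept")  # illustrative extra entity
```

Calling `kb.primitives_of("banana")` walks from the entity through the concept "fruit" up to the primitive FOOD, so two entities that share a concept are generalized to the same primitive.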


Citations: 0