基于基本模糊概念的新型多模态生成学习模型

IF 7.4 3区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Cognitive Computation Pub Date : 2024-08-30 DOI:10.1007/s12559-024-10336-7

Huankun Sheng, Hongwei Mo, Tengteng Zhang

{"title":"基于基本模糊概念的新型多模态生成学习模型","authors":"Huankun Sheng, Hongwei Mo, Tengteng Zhang","doi":"10.1007/s12559-024-10336-7","DOIUrl":null,"url":null,"abstract":"<p>Multimodal models are designed to process different types of data within a single generative framework. The prevalent strategy in previous methods involves learning joint representations that are shared across different modalities. These joint representations are typically obtained by concatenating the top of layers of modality-specific networks. Recently, significant advancements have been made in generating images from text and vice versa. Despite these successes, current models often overlook the role of fuzzy concepts, which are crucial given that human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for enhancing the effectiveness of multimodal cognition models. In this paper, a novel framework, named the Fuzzy Concept Learning Model (FCLM), is proposed to process modalities based on fuzzy concepts. The high-level abstractions between different modalities in the FCLM are represented by the ‘fuzzy concept functions.’ After training, the FCLM is capable of generating images from attribute descriptions and inferring the attributes of input images. Additionally, it can formulate fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets. Both qualitative and quantitative results from these experiments demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates the fuzzy cognitive mechanism with the statistical characteristics of the environment. This innovative cognition-inspired framework offers a novel perspective for processing multimodal information.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"31 1","pages":""},"PeriodicalIF":7.4000,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts\",\"authors\":\"Huankun Sheng, Hongwei Mo, Tengteng Zhang\",\"doi\":\"10.1007/s12559-024-10336-7\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Multimodal models are designed to process different types of data within a single generative framework. The prevalent strategy in previous methods involves learning joint representations that are shared across different modalities. These joint representations are typically obtained by concatenating the top of layers of modality-specific networks. Recently, significant advancements have been made in generating images from text and vice versa. Despite these successes, current models often overlook the role of fuzzy concepts, which are crucial given that human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for enhancing the effectiveness of multimodal cognition models. In this paper, a novel framework, named the Fuzzy Concept Learning Model (FCLM), is proposed to process modalities based on fuzzy concepts. The high-level abstractions between different modalities in the FCLM are represented by the ‘fuzzy concept functions.’ After training, the FCLM is capable of generating images from attribute descriptions and inferring the attributes of input images. Additionally, it can formulate fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets. Both qualitative and quantitative results from these experiments demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates the fuzzy cognitive mechanism with the statistical characteristics of the environment. This innovative cognition-inspired framework offers a novel perspective for processing multimodal information.</p>\",\"PeriodicalId\":51243,\"journal\":{\"name\":\"Cognitive Computation\",\"volume\":\"31 1\",\"pages\":\"\"},\"PeriodicalIF\":7.4000,\"publicationDate\":\"2024-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Cognitive Computation\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s12559-024-10336-7\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Cognitive Computation","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s12559-024-10336-7","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

多模态模型的设计目的是在单一生成框架内处理不同类型的数据。以往方法的普遍策略是学习不同模态之间共享的联合表征。这些联合表征通常是通过连接特定模态网络的顶层而获得的。最近，在从文本生成图像以及反向生成图像方面取得了重大进展。尽管取得了这些成就，但目前的模型往往忽略了模糊概念的作用，而模糊概念是至关重要的，因为人类的认知过程本身就存在高度的模糊性。因此，识别并纳入模糊概念对于提高多模态认知模型的有效性至关重要。本文提出了一个名为模糊概念学习模型（FCLM）的新框架，用于处理基于模糊概念的模态。FCLM 中不同模态之间的高级抽象由 "模糊概念函数 "表示。经过训练后，FCLM 能够根据属性描述生成图像，并推断输入图像的属性。此外，它还能提出不同抽象程度的模糊概念。我们在 dSprites 和 3D Chairs 数据集上进行了广泛的实验。这些实验的定性和定量结果都证明了拟议框架的有效性和效率。FCLM 将模糊认知机制与环境的统计特征相结合。这个创新的认知启发框架为处理多模态信息提供了一个新的视角。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

A Novel Multimodal Generative Learning Model based on Basic Fuzzy Concepts

Multimodal models are designed to process different types of data within a single generative framework. The prevalent strategy in previous methods involves learning joint representations that are shared across different modalities. These joint representations are typically obtained by concatenating the top of layers of modality-specific networks. Recently, significant advancements have been made in generating images from text and vice versa. Despite these successes, current models often overlook the role of fuzzy concepts, which are crucial given that human cognitive processes inherently involve a high degree of fuzziness. Recognizing and incorporating fuzzy concepts is therefore essential for enhancing the effectiveness of multimodal cognition models. In this paper, a novel framework, named the Fuzzy Concept Learning Model (FCLM), is proposed to process modalities based on fuzzy concepts. The high-level abstractions between different modalities in the FCLM are represented by the ‘fuzzy concept functions.’ After training, the FCLM is capable of generating images from attribute descriptions and inferring the attributes of input images. Additionally, it can formulate fuzzy concepts at various levels of abstraction. Extensive experiments were conducted on the dSprites and 3D Chairs datasets. Both qualitative and quantitative results from these experiments demonstrate the effectiveness and efficiency of the proposed framework. The FCLM integrates the fuzzy cognitive mechanism with the statistical characteristics of the environment. This innovative cognition-inspired framework offers a novel perspective for processing multimodal information.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Cognitive Computation COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-NEUROSCIENCES

CiteScore

9.30

自引率

3.70%

发文量

116

审稿时长

>12 weeks

期刊介绍： Cognitive Computation is an international, peer-reviewed, interdisciplinary journal that publishes cutting-edge articles describing original basic and applied work involving biologically-inspired computational accounts of all aspects of natural and artificial cognitive systems. It provides a new platform for the dissemination of research, current practices and future trends in the emerging discipline of cognitive computation that bridges the gap between life sciences, social sciences, engineering, physical and mathematical sciences, and humanities.