Cognitive Computation: Latest Articles (Page 9)

Synchronization of Hypercomplex Neural Networks with Mixed Time-Varying Delays

IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-03-11 DOI: 10.1007/s12559-024-10253-9

Abstract

This article discusses the fixed-time synchronization (FTS) of hypercomplex neural networks (HCNNs) with mixed time-varying delays. Unlike finite-time synchronization (FNTS), whose settling time depends on the initial conditions, the settling time of FTS can be adjusted to meet design requirements. The state vectors, weight matrices, activation functions, and input vectors of HCNNs are all hypercomplex-valued. The techniques used for complex-valued neural networks (CVNNs) and quaternion-valued neural networks (QVNNs) cannot be applied directly to HCNNs because they do not extend to eight or more dimensions. First, a decomposition method splits the HCNNs into $(n+1)$ real-valued neural networks (RVNNs), applying the distributive law to handle non-commutativity and non-associativity. A nonlinear controller is then constructed to synchronize the master-response systems of the HCNNs, and a Lyapunov-based method is used to prove the stability of the error system. The FTS of HCNNs with mixed time-varying delays is achieved using a suitable lemma, a Lipschitz condition, an appropriate Lyapunov functional, and suitably designed controllers. Two different algebraic criteria for the settling time are obtained by employing two distinct lemmas. It is demonstrated that the settling time derived from Lemma 1 is more precise than that obtained from Lemma 2. Three numerical examples for CVNNs, QVNNs, and octonion-valued neural networks (OVNNs) are provided to demonstrate the effectiveness of the proposed theoretical results.
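To make the idea of an initial-condition-independent settling time concrete, the sketch below uses a widely cited fixed-time stability lemma: if a Lyapunov function satisfies dV/dt <= -alpha*V^p - beta*V^q with 0 < p < 1 < q, then V reaches zero no later than 1/(alpha(1-p)) + 1/(beta(q-1)). This is an assumption for illustration only; the abstract does not state its Lemma 1 or Lemma 2, and the function names and parameter values here are hypothetical.

```python
def fixed_time_bound(alpha: float, beta: float, p: float, q: float) -> float:
    """Upper bound on the settling time for dV/dt <= -alpha*V**p - beta*V**q.

    Standard fixed-time lemma (assumed here, not taken from the paper):
    requires alpha, beta > 0 and 0 < p < 1 < q.
    """
    if not (alpha > 0 and beta > 0 and 0 < p < 1 < q):
        raise ValueError("need alpha, beta > 0 and 0 < p < 1 < q")
    return 1.0 / (alpha * (1.0 - p)) + 1.0 / (beta * (q - 1.0))


def simulate_settling_time(v0: float, alpha: float, beta: float,
                           p: float, q: float,
                           dt: float = 1e-4, tol: float = 1e-9) -> float:
    """Forward-Euler integration of dV/dt = -alpha*V**p - beta*V**q.

    Returns the first time at which V drops to tol or below.
    V is clamped at zero so the fractional power stays real.
    """
    v, t = v0, 0.0
    while v > tol:
        v = max(v - dt * (alpha * v ** p + beta * v ** q), 0.0)
        t += dt
    return t


if __name__ == "__main__":
    bound = fixed_time_bound(alpha=1.0, beta=1.0, p=0.5, q=2.0)
    for v0 in (1.0, 100.0):
        t = simulate_settling_time(v0, 1.0, 1.0, 0.5, 2.0)
        print(f"V(0)={v0}: settles at about t={t:.3f}, bound={bound:.3f}")
```

The bound depends only on the gains and exponents, which is why the settling time can be tuned through the controller rather than being dictated by the initial error.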



Citations: 0

ArQuAD: An Expert-Annotated Arabic Machine Reading Comprehension Dataset

Pub Date : 2024-03-11 DOI: 10.1007/s12559-024-10248-6

Rasha Obeidat, Marwa Al-Harbi, Mahmoud Al-Ayyoub, Luay Alawneh

Machine Reading Comprehension (MRC) is a task that enables machines to mirror key cognitive processes involving reading, comprehending a text passage, and answering questions about it. There has been significant progress on this task for English in recent years: recent systems have not only surpassed human-level performance but also demonstrated advances in emulating complex human cognitive processes. However, the development of Arabic MRC has not kept pace because of language challenges and the lack of large-scale, high-quality datasets. Existing datasets are small, low-quality, or released as part of large multilingual corpora. We present the Arabic Question Answering Dataset (ArQuAD), a large MRC dataset for the Arabic language. The dataset comprises 16,020 questions posed by language experts on passages extracted from Arabic Wikipedia articles, where the answer to each question is a text segment from the corresponding reading passage. Besides providing various dataset analyses, we fine-tuned several pre-trained language models to obtain benchmark results. Among the compared methods, AraBERTv0.2-large achieved the best performance, with an exact match of 68.95% and an F1-score of 87.15%. However, the considerably higher performance observed in human evaluations (exact match of 86% and F1-score of 95.5%) suggests substantial room for improvement in future research. We release the dataset publicly at https://github.com/RashaMObeidat/ArQuAD to encourage further development of language-aware MRC models for the Arabic language.
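The two reported metrics can be illustrated with a minimal sketch of exact match and token-level F1 as they are commonly computed for extractive question answering. It assumes plain whitespace tokenization; the dataset's own evaluation script may apply additional Arabic-specific normalization, and the function names are illustrative only.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the whitespace-tokenized answers are identical, else 0.0."""
    return float(prediction.split() == reference.split())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two answer spans."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("a b c", "a b c"), token_f1("a b c", "b c d"))
```

Because F1 gives partial credit for overlapping spans while exact match does not, a gap such as 68.95% versus 87.15% indicates that many predictions overlap the reference answer without matching its boundaries exactly.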



Citations: 0

Two-layer Ensemble of Deep Learning Models for Medical Image Segmentation

Pub Date : 2024-01-31 DOI: 10.1007/s12559-024-10257-5

Truong Dang, Tien Thanh Nguyen, John McCall, Eyad Elyan, Carlos Francisco Moreno-García

One of the most important areas in medical image analysis is segmentation, in which raw image data is partitioned into structured and meaningful regions to gain further insights. By using Deep Neural Networks (DNNs), AI-based automated segmentation algorithms can potentially assist physicians in making more effective imaging-based diagnoses. However, because it is difficult to acquire high-quality ground truths for medical images and DNN hyperparameters require significant manual tuning, the results of DNN-based medical models might be limited. A potential solution is to combine multiple DNN models using ensemble learning. We propose a two-layer ensemble of deep learning models in which the prediction for each training image pixel made by each model in the first layer is used as augmented data of the training image for the second layer of the ensemble. The predictions of the second layer are then combined using a weight-based scheme whose weights are found by solving linear regression problems. To the best of our knowledge, our paper is the first work to propose a two-layer ensemble of deep learning models with an augmented data technique in medical image segmentation. Experiments conducted on five different medical image datasets for diverse segmentation tasks show that the proposed method achieves better results on several performance metrics than some well-known benchmark algorithms. The research can be expanded in several directions, such as image classification.
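The weight-based combination step can be sketched as an ordinary least-squares fit: given each model's per-pixel probabilities and the ground-truth labels, find the weights that minimize the squared error of the weighted sum. This is a simplified illustration under stated assumptions (flattened pixels, one foreground class, no intercept, hypothetical function names); the abstract does not give the exact regression formulation the authors use.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - partial) / aug[i][i]
    return x


def fit_ensemble_weights(model_probs, targets):
    """Least-squares weights via the normal equations (P^T P) w = P^T y.

    model_probs: one list of per-pixel probabilities per model.
    targets: per-pixel ground-truth labels (0 or 1).
    """
    k = len(model_probs)
    gram = [[sum(a * b for a, b in zip(model_probs[i], model_probs[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(p * y for p, y in zip(model_probs[i], targets))
           for i in range(k)]
    return solve_linear_system(gram, rhs)


def combine(model_probs, weights):
    """Weighted sum of the models' per-pixel probabilities."""
    n_pixels = len(model_probs[0])
    return [sum(w * probs[i] for w, probs in zip(weights, model_probs))
            for i in range(n_pixels)]


if __name__ == "__main__":
    truth = [0, 1, 1, 0]
    accurate_model = [0.0, 1.0, 1.0, 0.0]   # toy data, matches the truth
    uninformative = [0.5, 0.5, 0.5, 0.5]    # toy data, carries no signal
    w = fit_ensemble_weights([accurate_model, uninformative], truth)
    print(w, combine([accurate_model, uninformative], w))
```

In this toy case the fit assigns essentially all weight to the accurate model, which is the behavior a learned combination is meant to provide over simple averaging.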



Citations: 0

Multi-Keys Attention Network for Image Captioning

Pub Date : 2024-01-24 DOI: 10.1007/s12559-023-10231-7

Ziqian Yang, Hui Li, Renrong Ouyang, Quan Zhang, Jimin Xiao

The image captioning task aims to generate descriptions of the main content of images. Recently, the Transformer with a self-attention mechanism has been widely used for image captioning, where the attention mechanism helps the encoder generate image region features and guides caption output in the decoder. However, the vanilla decoder uses a simple conventional self-attention mechanism, resulting in captions with poor semantic information and incomplete sentence logic. In this paper, we propose a novel attention block, the Multi-Keys attention block, that enhances the relevance between explicit and implicit semantic information. Technically, the Multi-Keys attention block first concatenates the key vector and the value vector and passes the result into both an explicit channel and an implicit channel. Then, a "related value" carrying more semantic information is generated by applying element-wise multiplication to the two channels. Moreover, to improve sentence logic, a reverse key vector with another information flow is residually connected to the final attention result. We also apply the Multi-Keys attention block to the sentence decoder of the Transformer; the resulting model is named the Multi-Keys Transformer (MKTrans). The experiments demonstrate that MKTrans achieves a 138.6% CIDEr score on the MS COCO "Karpathy" offline test split. The proposed Multi-Keys attention block and MKTrans model are shown to be more effective than state-of-the-art methods.
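For reference, the conventional attention mechanism that the abstract takes as its baseline can be sketched as standard scaled dot-product attention. This is deliberately not the Multi-Keys block, whose exact equations are not given in the abstract; the function names and the toy inputs are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Vanilla attention: softmax(Q K^T / sqrt(d_k)) V, on lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 0.0], [0.0, 1.0]]
    print(scaled_dot_product_attention([[0.0, 0.0], [4.0, 0.0]], keys, values))
```

In this baseline, keys only determine the weights and values only supply the content; the abstract describes the Multi-Keys block as mixing the two (concatenation followed by element-wise multiplication) before the result is used.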



Citations: 0

MFCTrans: Multi-scale Feature Connection Transformer for Deformable Medical Image Registration

Pub Date : 2024-01-24 DOI: 10.1007/s12559-023-10239-z

Longji Wang, Zhiyue Yan, Wenming Cao, Jianhua Ji

Deformable Medical Image Registration (DMIR) aims to establish precise anatomical alignment of multiple medical images. However, existing U-shaped networks have difficulty efficiently transferring multi-scale feature information from the encoder to the decoder. To address this issue, we propose a novel backbone network called MFCTrans, which constructs effective feature connections in DMIR. Drawing inspiration from the attention mechanism observed in the human cognitive system, our proposed method employs a Feature Fusion and Assignment Transformer (FFAT) module and a Spatial Cross Attention Fusion (SCAF) module. The former facilitates the fusion of multi-channel features, while the latter guides the integration of multi-scale information. A Multiple Residual (MR) branch is also deployed between the encoder and FFAT to improve the network's generalization. We conduct extensive qualitative and quantitative evaluations on the OASIS and LPBA40 datasets. The proposed method achieves Dice scores 1.3% and 2.0% higher than TransMorph on the respective datasets while maintaining a comparable voxel folding percentage. Ablation studies analyze the impact and efficiency of each component of the proposed method. In summary, our proposed network offers a promising framework for achieving high-quality medical image registration and holds significant potential for applications in computer vision and cognitive computation.
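The Dice score used to compare registration quality measures the overlap between two label masks, here a warped moving-image segmentation and the fixed-image segmentation. The sketch below is a minimal version for one binary structure on flattened masks; averaging over anatomical structures, as registration benchmarks typically do, is left out, and the function name is illustrative.

```python
def dice_score(mask_a, mask_b) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Convention chosen here: two empty masks count as perfect overlap.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    warped = [1, 1, 0, 0]   # toy warped segmentation
    fixed = [1, 0, 0, 0]    # toy fixed segmentation
    print(dice_score(warped, fixed))
```

A score of 1.0 means the structures coincide exactly after warping, so gains of one or two percentage points correspond to measurably tighter anatomical alignment.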



Citations: 0

Transforming Conversations with AI—A Comprehensive Study of ChatGPT

Pub Date : 2024-01-24 DOI: 10.1007/s12559-023-10236-2

Gaurang Bansal, Vinay Chamola, Amir Hussain, Mohsen Guizani, Dusit Niyato

In the field of cognitive computing, conversational AI has witnessed remarkable progress, largely driven by the development of the Generative Pre-trained Transformer (GPT) series, notably ChatGPT. These transformer-based models have revolutionized natural language understanding by effectively capturing context and long-range dependencies. In light of this, this paper conducts a comprehensive exploration of ChatGPT, encompassing its architectural design, training methodology, real-world applications, and future potential within the conversational AI landscape. The paper studies ChatGPT's capacity for advanced control and responsiveness, which gives it a superior ability to comprehend language and generate precise, informative responses. The survey shows that ChatGPT excels at sustaining context and engaging in multi-turn dialogues, thereby fostering more interactive and meaningful conversations. Furthermore, its scalability and adaptability for integration into various systems have broadened its applicability across diverse domains, including customer service, education, content generation, healthcare, gaming, research, and exploration. Additionally, the paper presents alternative conversational AI models, such as Amazon CodeWhisperer, Google Bard (LaMDA), Microsoft Bing AI, DeepMind Sparrow, and Character AI, providing a comparative analysis that underscores ChatGPT's advantages in terms of inference capabilities and future promise. Recognizing the evolution and profound impact of ChatGPT is of paramount significance for researchers and developers at the forefront of AI innovation. In a rapidly evolving conversational AI landscape, ChatGPT emerges as a pivotal player, capable of reshaping the way we interact with AI systems across a wide array of applications.



Citations: 0

A Fistful of Vectors: A Tool for Intrinsic Evaluation of Word Embeddings

Pub Date : 2024-01-22 DOI: 10.1007/s12559-023-10235-3

Roberto Ascari, Anna Giabelli, Lorenzo Malandri, Fabio Mercorio, Mario Mezzanzanica

The utilization of word embeddings—powerful models computed through Neural Network architectures that encode words as vectors—has witnessed rapid growth across various Natural Language Processing applications, encompassing semantic analysis, information retrieval, dependency parsing, question answering, and machine translation. The efficacy of these tasks is strictly linked to the quality of the embeddings, underscoring the critical importance of evaluating and selecting optimal embedding models. While established procedures and benchmarks exist for intrinsic evaluation, the authors note a conspicuous absence of comprehensive evaluations of intrinsic embedding quality across multiple tasks. This paper introduces vec2best, a unified tool encompassing state-of-the-art intrinsic evaluation tasks across diverse benchmarks. vec2best furnishes the user with an extensive evaluation of word embedding models. It represents a framework for evaluating word embeddings trained using various methods and hyper-parameters on a range of tasks from the literature. The tool yields a holistic evaluation metric for each model called the PCE (Principal Component Evaluation). We conducted evaluations on 135 word embedding models, trained using GloVe, fastText, and word2vec, across four tasks integrated into vec2best (similarity, analogy, categorization, and outlier detection), along with their respective benchmarks. Additionally, we leveraged vec2best to optimize embedding hyper-parameter configurations in a real-world scenario. vec2best is conveniently accessible as a pip-installable Python package.
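As an illustration of what an intrinsic similarity task measures, the sketch below scores an embedding model by the Spearman correlation between its cosine similarities and human similarity ratings for word pairs. This is a generic sketch and not the vec2best API: the function names, the toy vectors, and the toy ratings are all hypothetical, and the PCE aggregate metric is not reproduced.

```python
import math


def cosine(u, v) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys) -> float:
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def similarity_task_score(embeddings, rated_pairs) -> float:
    """Correlate model cosine similarities with human ratings."""
    model_scores = [cosine(embeddings[a], embeddings[b])
                    for a, b, _ in rated_pairs]
    human_scores = [rating for _, _, rating in rated_pairs]
    return spearman(model_scores, human_scores)


if __name__ == "__main__":
    toy_embeddings = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0]}
    toy_pairs = [("cat", "dog", 9.0), ("cat", "car", 1.0), ("dog", "car", 2.0)]
    print(similarity_task_score(toy_embeddings, toy_pairs))
```

Analogy, categorization, and outlier detection follow the same pattern of comparing geometric structure in the vector space against a human-labeled benchmark, which is what makes a single aggregate score across tasks possible.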



Citations: 0

A Cognitive Medical Decision Support System for IoT-Based Human-Computer Interface in Pervasive Computing Environment

IF 5.4, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-01-22 DOI: 10.1007/s12559-023-10242-4

Haosong Gou, Gaoyi Zhang, Elias Paulino Medeiros, Senthil Kumar Jagatheesaperumal, Victor Hugo C. de Albuquerque

In today’s advanced applications, such as memory interfaces, feature-based detection, and sensory games, human-computer interaction (HCI) plays a pivotal role. A medical decision support system (MDSS) emerges from the integration of a data system with resources for medical decision-making. Within MDSS, human-computer interaction and perceptual medical decision-making stand out as two highly valuable technologies. Systems enabled by the Internet of Things (IoT), which leverage decentralized, diverse communication and networking technology to cater to a wide range of end-users, are referred to as pervasive computing. A challenging aspect of pervasive computing is ensuring transparency in interaction, managing administration levels, and accommodating varying tolerance levels for widely dispersed users. This paper presents a uniquely flexible MDSS framework designed to enhance end-user confidence in the availability of MDSS through ubiquitous IoT devices within the context of HCI. This architecture utilizes recurring training to assess resource allocation based on demand and collaborative characteristics. Projected resource requirements enable pervasive computing to better serve end-users by reducing latency and increasing communication speeds for MDSS in HCI. The primary goal of this framework is to simplify the management of terminal transitions by facilitating the allocation and utilization of resources for data transfer from peripheral technology. Experimental analysis is employed to estimate the framework’s performance, utilizing various metrics to demonstrate its consistency. These metrics encompass responsiveness, transaction success rates, processed demands, application caseloads, capacity utilization, and memory usage. The uniquely flexible and distributed computing framework optimizes request handling, network accuracy, and memory utilization, resulting in reduced transaction failures and lower latency, ultimately leading to shorter response times. 
The proposed UFDSS maintains a transaction failure rate below 25% with increasing requests and achieves 100 MHz bandwidth utilization, surpassing other techniques capped at 80 MHz. UFDSS exhibits a lower average latency of around 30 ms for a range of energy data inputs. This uniquely flexible MDSS framework showcases its potential to enhance MDSS availability through IoT devices within HCI contexts. By optimizing resource allocation and utilization, it successfully reduces latency, improves communication speeds, and ultimately leads to shorter response times, contributing to more efficient and reliable medical decision support. Further, integrating generative AI into MDSS for IoT-based HCI could also enhance data-driven decision support.



Citations: 0

Feature Analysis Network: An Interpretable Idea in Deep Learning

IF 5.4, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-01-20 DOI: 10.1007/s12559-023-10238-0

Xinyu Li, Xiaoguang Gao, Qianglong Wang, Chenfeng Wang, Bo Li, Kaifang Wan

Deep Learning (DL) stands out as a leading model for processing high-dimensional data, where the nonlinear transformation of hidden layers effectively extracts features. However, these unexplainable features make DL a low interpretability model. Conversely, Bayesian network (BN) is transparent and highly interpretable, and it can be helpful for interpreting DL. To improve the interpretability of DL from the perspective of feature cognition, we propose the feature analysis network (FAN), a DL structure fused with BN. FAN retains the DL feature extraction capability and applies BN as the output layer to learn the relationships between the features and the outputs. These relationships can be probabilistically represented by the structure and parameters of the BN, intuitively. In a further study, a correlation clustering-based feature analysis network (cc-FAN) is proposed to detect the correlations among inputs and to preserve this information to explain the features’ physical meaning to a certain extent. To quantitatively evaluate the interpretability of the model, we design the network simplification and interpretability indicators separately. Experiments on eight datasets show that FAN has better interpretability than that of the other models with basically unchanged model accuracy and similar model complexities. On the radar effect mechanism dataset, from the feature structure-based relevance interpretability indicator, FAN is up to 4.8 times better than that of the other models, and cc-FAN is up to 21.5 times better than that of the other models. FAN and cc-FAN enhance the interpretability of the DL model structure from the aspects of features; moreover, based on the input correlations, cc-FAN can help us to better understand the physical meaning of features.
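The idea of detecting correlations among inputs before feature extraction can be sketched as follows. This is not the cc-FAN algorithm, whose details the abstract does not give; it is a simpler stand-in that links inputs whose absolute Pearson correlation reaches a threshold and returns the connected groups. The function names, the 0.8 threshold, and the toy columns are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlated_groups(columns, threshold=0.8):
    """Group input columns linked by |r| >= threshold (connected components).

    `columns` maps an input name to its list of observed values.
    The threshold is a hypothetical choice, not a value from the paper.
    """
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(columns[first], columns[second])) >= threshold:
                parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical toy inputs: x1 and x2 move together, x3 does not.
TOY_COLUMNS = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "x3": [5.0, 1.0, 4.0, 2.0, 3.0],
}

if __name__ == "__main__":
    print(correlated_groups(TOY_COLUMNS))
```

Grouping by connected components is transitive: two weakly related inputs can end up together if both correlate with a third, so the threshold has a large effect on the result. Each group could then feed its own feature extractor, which is one way the grouping information might be preserved downstream.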



Citations: 0

Pupil Size Variations Reveal Information About Hierarchical Decision-Making Processes

IF 5.4, Zone 3 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Cognitive Computation

Pub Date : 2024-01-19 DOI: 10.1007/s12559-024-10246-8

Leyla Yahyaie, Reza Ebrahimpour, Abbas Koochari

Introduction: Pupil size is a well-known indicator of low-level decision-making processes. However, it is unclear whether these involuntary eye data can represent information about the interwoven processes of hierarchical decision-making. In hierarchical decisions, high-level decision-making depends on the process of making low-level decisions, and the result of these interwoven processes is determined by feedback. Therefore, the exact cause of negative feedback is unclear, as it may be the result of low-level, high-level, or both low- and high-level incorrect decisions. In this study, we investigated the characteristics of eye data (pupil diameter) in the interwoven processes of hierarchical decision-making. Methods: We designed a hierarchical psychophysical experiment in which participants were asked to report their low- and high-level decisions and their confidence simultaneously on one of the colored bars. Participants received correct feedback in a trial when reporting both decisions correctly. During the experiment, the eye data of the participants were recorded by an eye-tracking device. Results: Our findings suggest that pupil size conveys information about high-level decisions as well. Furthermore, this study shows that three parameters (introduced in previous studies), negative feedback in successive trials, stimulus strength (uniformity with confidence), and decision urgency, are all represented in pupil size. Conclusion: The findings support the idea that involuntary eye data are influenced by decision-making-related brain activity in decision-making processes and not just visual stimulus features.



Citations: 0