Federated Learning of XAI Models in Healthcare: A Case Study on Parkinson’s Disease
Pub Date: 2024-08-28 | DOI: 10.1007/s12559-024-10332-x
Pietro Ducange, Francesco Marcelloni, Alessandro Renda, Fabrizio Ruffini
Artificial intelligence (AI) systems are increasingly used in healthcare applications, although several challenges must still be overcome to make them fully trustworthy and compliant with modern regulations and societal needs. First of all, the sensitive health data essential to train AI systems are typically stored and managed in several separate medical centers and cannot be shared due to privacy constraints, hindering the use of all available information when learning models. Further, transparency and explainability of such systems are becoming increasingly urgent, especially at a time when “opaque” or “black-box” models are commonly used. Recently, technological and algorithmic solutions to these challenges have been investigated: on the one hand, federated learning (FL) has been proposed as a paradigm for collaborative model training among multiple parties without any disclosure of private raw data; on the other hand, research on eXplainable AI (XAI) aims to enhance the explainability of AI systems, either through interpretable-by-design approaches or post-hoc explanation techniques. In this paper, we focus on a healthcare case study, namely predicting the progression of Parkinson’s disease, and assume that raw data originate from different medical centers and that data collection for centralized training is precluded due to privacy limitations. We investigate how FL of XAI models can achieve a good level of both accuracy and trustworthiness. Cognitive and biologically inspired approaches are adopted in our analysis: FL of an interpretable-by-design fuzzy rule-based system, and FL of a neural network explained using a federated version of the SHAP post-hoc explanation technique. We analyze the accuracy, interpretability, and explainability of the two approaches, varying the degree of data heterogeneity across several distribution scenarios. Although the neural network is generally more accurate, the results show that the fuzzy rule-based system achieves competitive performance in the federated setting and presents desirable properties in terms of interpretability and transparency.
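The abstract gives no implementation details; as a reference point, the sketch below shows the weight-averaging step at the heart of a federated learning round (plain FedAvg, not necessarily the aggregation the authors use), where each medical center shares only locally trained parameters, never raw data. All names and sizes are illustrative.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    coeffs = np.array(client_sizes) / sum(client_sizes)
    stacked = np.stack(client_weights)        # (n_clients, n_params)
    return (coeffs[:, None] * stacked).sum(axis=0)

# Three hypothetical medical centers with different amounts of local data;
# the random vectors stand in for locally trained model parameters.
weights = [np.random.randn(10) for _ in range(3)]
sizes = [120, 80, 200]
global_weights = fedavg(weights, sizes)       # broadcast back to all centers
```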
{"title":"Federated Learning of XAI Models in Healthcare: A Case Study on Parkinson’s Disease","authors":"Pietro Ducange, Francesco Marcelloni, Alessandro Renda, Fabrizio Ruffini","doi":"10.1007/s12559-024-10332-x","DOIUrl":"https://doi.org/10.1007/s12559-024-10332-x","url":null,"abstract":"<p>Artificial intelligence (AI) systems are increasingly used in healthcare applications, although some challenges have not been completely overcome to make them fully trustworthy and compliant with modern regulations and societal needs. First of all, sensitive health data, essential to train AI systems, are typically stored and managed in several separate medical centers and cannot be shared due to privacy constraints, thus hindering the use of all available information in learning models. Further, transparency and explainability of such systems are becoming increasingly urgent, especially at a time when “opaque” or “black-box” models are commonly used. Recently, technological and algorithmic solutions to these challenges have been investigated: on the one hand, federated learning (FL) has been proposed as a paradigm for collaborative model training among multiple parties without any disclosure of private raw data; on the other hand, research on eXplainable AI (XAI) aims to enhance the explainability of AI systems, either through interpretable by-design approaches or post-hoc explanation techniques. In this paper, we focus on a healthcare case study, namely predicting the progression of Parkinson’s disease, and assume that raw data originate from different medical centers and data collection for centralized training is precluded due to privacy limitations. We aim to investigate how FL of XAI models can allow achieving a good level of accuracy and trustworthiness. Cognitive and biologically inspired approaches are adopted in our analysis: FL of an interpretable by-design fuzzy rule-based system and FL of a neural network explained using a federated version of the SHAP post-hoc explanation technique. We analyze accuracy, interpretability, and explainability of the two approaches, also varying the degree of heterogeneity across several data distribution scenarios. Although the neural network is generally more accurate, the results show that the fuzzy rule-based system achieves competitive performance in the federated setting and presents desirable properties in terms of interpretability and transparency.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"11 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhanced Dynamic Key-Value Memory Networks for Personalized Student Modeling and Learning Ability Classification
Pub Date: 2024-08-27 | DOI: 10.1007/s12559-024-10341-w
Huanhuan Zhang, Lei Wang, Yuxian Qu, Wei Li, Qiaoyong Jiang
Knowledge tracing (KT) is a technique for predicting students’ current skill mastery levels and future academic performance from previous question-answering data. A good KT model can more accurately reflect a student’s cognitive processes and provide a more realistic assessment of skill mastery. Currently, most KT models treat all students as a homogeneous group and ignore their individual differences; a few KT models attempt to personalize the modeling of students from the perspective of their learning abilities, a typical example being Deep Knowledge Tracing with Dynamic Student Classification (DKT-DSC). However, these models take a relatively coarse-grained approach to modeling students’ learning abilities and cannot accurately capture the nonlinear relationship between students’ learning abilities and the questions they answer. To solve these problems, we propose a novel KT model named Enhanced Dynamic Key-Value Memory Networks for Dynamic Student Classification (EnDKVMN-DSC), specifically designed for personalized student modeling and learning ability classification. First, we propose a novel Enhanced Dynamic Key-Value Memory Network (EnDKVMN) and use it to model each student’s learning ability. Second, students are classified according to their learning abilities using the K-means algorithm. Finally, enriched input features are constructed and passed through Gated Recurrent Unit (GRU) networks to obtain predictions. All experiments are conducted on four real-world datasets, and the results show that EnDKVMN-DSC outperforms four state-of-the-art KT models based on DKT or DKVMN in predicting student performance.
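As a minimal sketch of the grouping step described above, the snippet below clusters per-student ability vectors with K-means; the 16-dimensional vectors are random stand-ins for EnDKVMN outputs, and the number of clusters is an assumption, not the paper’s setting.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
ability = rng.random((500, 16))     # 500 students, 16-dim ability summaries

# Group students by learning ability; each cluster gets its own label.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(ability)
groups = kmeans.labels_             # learning-ability class per student
```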
{"title":"Enhanced Dynamic Key-Value Memory Networks for Personalized Student Modeling and Learning Ability Classification","authors":"Huanhuan Zhang, Lei Wang, Yuxian Qu, Wei Li, Qiaoyong Jiang","doi":"10.1007/s12559-024-10341-w","DOIUrl":"https://doi.org/10.1007/s12559-024-10341-w","url":null,"abstract":"<p>Knowledge tracing (KT) is a technique that can be applied to predict students’ current skill mastery levels and future academic performance based on previous question-answering data. A good KT model can more accurately reflect a student’s cognitive processes and provide a more realistic assessment of skill mastery level. Currently, most KT models regard all students as a whole, while ignoring their personal differences; a few KT models attempt to personalize the modeling of students from the perspective of their learning abilities, among which a typical example is Deep Knowledge Tracing with Dynamic Student Classification (DKT-DSC). However, these models have a relatively coarse-grained approach to modeling students’ learning abilities and cannot accurately capture the nonlinear relationship between students’ learning abilities and the questions they answer. To solve these problems, we propose a novel KT model named the Enhanced Dynamic Key-Value Memory Networks for Dynamic Student Classification (EnDKVMN-DSC). This model is specifically designed for personalized student modeling and learning ability classification. Specifically, first, we propose a novel Enhanced Dynamic Key-Value Memory Network (EnDKVMN) and use it to model each student’s learning ability. Second, students are classified according to their learning abilities based on the <i>K</i>-means algorithm. Finally, the enriched input features are constructed and passed through Gated Recurrent Unit (GRU) networks to obtain prediction results. All experiments are conducted on four real-world datasets to evaluate our proposed model, and the results show that EnDKVMN-DSC outperforms the other four state-of-the-art KT models based on DKT or DKVMN in predicting student performance.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"124 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Retrieval Using Multilayer Feature Aggregation Histogram
Pub Date: 2024-08-27 | DOI: 10.1007/s12559-024-10334-9
Fen Lu, Guang-Hai Liu, Xiao-Zhi Gao
Aggregating diverse features into a compact representation is a central issue in image retrieval. However, aggregating the differential features of multiple layers into a discriminative representation remains challenging. Inspired by value-guided neural mechanisms, a novel representation method, the multilayer feature aggregation histogram, is proposed for image retrieval. It aggregates multilayer features, such as low-, mid-, and high-layer features, into a discriminative yet compact representation by simulating the neural mechanisms that mediate the ability to make value-guided decisions. The highlights of the proposed method are as follows: (1) A detail-attentive map is proposed to represent the aggregation of low- and mid-layer features; it can be used to evaluate distinguishable detail features. (2) A simple yet straightforward aggregation method is proposed to re-evaluate distinguishable high-layer features; it provides aggregated features covering detail, object, and semantics via a semantic-attentive map. (3) A novel whitening method, difference whitening, is introduced to reduce dimensionality; it requires no training dataset of semantic similarity and provides a compact yet discriminative representation. Experiments on popular benchmark datasets demonstrate that the proposed method clearly improves retrieval performance in terms of the mAP metric. Using a 128-dimensional representation, it achieves mAPs higher than those of the DSFH, DWDF, and OSAH methods by 0.083, 0.043, and 0.022 on the Oxford5k dataset and by 0.195, 0.036, and 0.071 on the Paris6k dataset. The difference whitening method can conveniently transfer a deep learning model to a new task. Our method provides competitive performance compared with existing aggregation methods and can retrieve scene images with similar colors, objects, and semantics.
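The abstract does not specify the difference whitening formula, so the sketch below uses standard PCA whitening down to 128 dimensions purely as a reference point for the dimensionality-reduction step; it is not the authors’ method, and the 512-dimensional input features are random stand-ins.

```python
import numpy as np

def pca_whiten(X, dim=128, eps=1e-8):
    """Project features to `dim` decorrelated, unit-variance components."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(Xc) - 1)
    vals, vecs = np.linalg.eigh(cov)              # ascending eigenvalues
    order = np.argsort(vals)[::-1][:dim]          # keep the top `dim`
    W = vecs[:, order] / np.sqrt(vals[order] + eps)
    return Xc @ W

X = np.random.randn(1000, 512)    # stand-in aggregated multilayer features
X128 = pca_whiten(X, dim=128)     # compact 128-dimensional representation
```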
{"title":"Image Retrieval Using Multilayer Feature Aggregation Histogram","authors":"Fen Lu, Guang-Hai Liu, Xiao-Zhi Gao","doi":"10.1007/s12559-024-10334-9","DOIUrl":"https://doi.org/10.1007/s12559-024-10334-9","url":null,"abstract":"<p>Aggregating the diverse features into a compact representation is a hot issue in image retrieval. However, aggregating the differential feature of multilayer into a discriminative representation remains challenging. Inspired by the value-guided neural mechanisms, a novel representation method, namely, the <i>multilayer feature aggregation histogram</i> was proposed to image retrieval. It can aggregate multilayer features, such as low-, mid-, and high-layer features, into a discriminative yet compact representation via simulating the neural mechanisms that mediate the ability to make value-guided decisions. The highlights of the proposed method have the following: (1) A <i>detail-attentive map</i> was proposed to represent the aggregation of low- and mid-layer features. It can be well used to evaluate the distinguishable detail feature. (2) A simple yet straightforward aggregation method is proposed to re-evaluate the distinguishable high-layer feature. It can provide aggregated features including detail, object, and semantic by using <i>semantic-attentive map</i>. (3) A novel whitening method, namely <i>difference whitening</i>, is introduced to reduce dimensionality. It did not need to seek a training dataset of semantical similarity and can provide a compact yet discriminative representation. Experiments on the popular benchmark datasets demonstrate the proposed method can obviously increase retrieval performance in terms of mAP metric. The proposed method using 128-dimensionality representation can provide significantly higher mAPs than the DSFH, DWDF, and OSAH methods by 0.083, 0.043, and 0.022 on the Oxford5k dataset and by 0.195, 0.036, and 0.071 on the Paris6k dataset. The difference whitening method can conveniently transfer the deep learning model to a new task. Our method provided competitive performance compared with the existing aggregation methods and can retrieve scene images with similar colors, objects, and semantics.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"2 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Barrier Function to Skin Elasticity in Talking Head
Pub Date: 2024-08-24 | DOI: 10.1007/s12559-024-10344-7
Iti Chaturvedi, Vlad Pandelea, Erik Cambria, Roy Welsch, Bithin Datta
In this paper, we target the problem of generating facial expressions from a piece of audio. This is challenging since audio and video have inherent characteristics that are distinct from each other. Some words may have identical lip movements, and speech impediments may prevent lip-reading in some individuals. Previous approaches to generating such a talking head suffered from stiff expressions because they focused only on lip movements, so the facial landmarks did not capture the information flow from the audio. Hence, in this work, we employ spatio-temporal independent component analysis to accurately sync the audio with the corresponding face video. Proper word formation also requires control over the facial muscles, which can be captured using a barrier function. We first validated the approach on the diffusion of salt water in coastal areas using a synthetic finite element simulation. Next, we applied it to 3D facial expressions in toddlers, for which training data are difficult to capture. Prior knowledge in the form of rules is specified using fuzzy logic, and multi-objective optimization is used to collectively learn a set of rules. We observed a significantly higher F-measure on three real-world problems.
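The paper’s barrier formulation for skin elasticity is not given in the abstract; the snippet below illustrates the general idea with a standard logarithmic barrier that penalizes a muscle-deformation parameter as it approaches physical bounds. The bounds and scale factor are assumptions for illustration only.

```python
import numpy as np

def log_barrier(x, lo, hi, t=10.0):
    """Penalty that grows without bound as x approaches either limit."""
    return -(np.log(x - lo) + np.log(hi - x)) / t

# Hypothetical normalized muscle-deformation values strictly inside (0, 1).
x = np.linspace(0.01, 0.99, 5)
print(log_barrier(x, lo=0.0, hi=1.0))   # large near the bounds, small mid-range
```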
{"title":"Barrier Function to Skin Elasticity in Talking Head","authors":"Iti Chaturvedi, Vlad Pandelea, Erik Cambria, Roy Welsch, Bithin Datta","doi":"10.1007/s12559-024-10344-7","DOIUrl":"https://doi.org/10.1007/s12559-024-10344-7","url":null,"abstract":"<p>In this paper, we target the problem of generating facial expressions from a piece of audio. This is challenging since both audio and video have inherent characteristics that are distinct from the other. Some words may have identical lip movements, and speech impediments may prevent lip-reading in some individuals. Previous approaches to generating such a talking head suffered from stiff expressions. This is because they focused only on lip movements and the facial landmarks did not contain the information flow from the audio. Hence, in this work, we employ spatio-temporal independent component analysis to accurately sync the audio with the corresponding face video. Proper word formation also requires control over the face muscles that can be captured using a barrier function. We first validated the approach on the diffusion of salt water in coastal areas using a synthetic finite element simulation. Next, we applied it to 3D facial expressions in toddlers for which training data is difficult to capture. Prior knowledge in the form of rules is specified using Fuzzy logic, and multi-objective optimization is used to collectively learn a set of rules. We observed significantly higher F-measure on three real-world problems.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"116 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181642","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scrutinizing Label: Contrastive Learning on Label Semantics and Enriched Representation for Relation Extraction
Pub Date: 2024-08-22 | DOI: 10.1007/s12559-024-10338-5
Zhenyu Zhou, Qinghua Zhang, Fan Zhao
Sentence-level relation extraction (RE) is a technique for extracting factual information about relationships between entities from a sentence. However, customary methods overlook the semantic information conveyed by the label itself, compromising performance on rare relation types. Furthermore, there is growing interest in exploiting textual information as a crucial resource to make RE models more effective. To address these two issues, CLERE (Contrastive Learning and Enriched Representation for Relation Extraction), based on contrastive learning and an enriched representation of context, is proposed. First, by incorporating the semantic information of labels through contrastive learning, CLERE can effectively convey and exploit the underlying semantics of the various sample categories; enhancing its semantic understanding and classification capabilities in this way alleviates misclassification due to data imbalance. Second, both the semantics of the context and the positional information of tagged entities are enhanced by employing weighted layer pooling on pre-trained language models, which improves the representation of context and entity mentions. Experiments are conducted on three public datasets to validate the effectiveness of CLERE. The results demonstrate that the proposed model significantly outperforms existing mainstream baseline methods.
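As an illustration of the second component, the sketch below implements weighted layer pooling: hidden states from all encoder layers are combined with softmax-normalized weights. The shapes and weight values are illustrative, not taken from CLERE.

```python
import numpy as np

def weighted_layer_pool(layer_states, layer_weights):
    """layer_states: (n_layers, seq_len, hidden) -> (seq_len, hidden)."""
    w = np.exp(layer_weights) / np.exp(layer_weights).sum()  # softmax weights
    return np.tensordot(w, layer_states, axes=1)

# e.g., a BERT-style encoder: embedding layer + 12 transformer layers.
states = np.random.randn(13, 32, 768)   # stand-in hidden states
weights = np.linspace(0.0, 1.0, 13)     # example: favor later layers
pooled = weighted_layer_pool(states, weights)   # enriched token representations
```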
{"title":"Scrutinizing Label: Contrastive Learning on Label Semantics and Enriched Representation for Relation Extraction","authors":"Zhenyu Zhou, Qinghua Zhang, Fan Zhao","doi":"10.1007/s12559-024-10338-5","DOIUrl":"https://doi.org/10.1007/s12559-024-10338-5","url":null,"abstract":"<p>Sentence-level relation extraction is a technique for extracting factual information about relationships between entities from a sentence. However, the customary method overlooks the semantic information conveyed by the label itself, thereby compromising the efficacy of rare types. Furthermore, there is a growing interest in exploring the use of textual information as a crucial resource to enhance RE models for more effectiveness. To address these two issues, CLERE (<i>C</i>ontrastive <i>L</i>earning and <i>E</i>nriched Representation for <i>R</i>elation <i>E</i>xtraction) based on contrastive learning and enriched representation of context is proposed. Firstly, by contrastive learning to incorporate semantic information of labels, CLERE is able to effectively convey and exploit the underlying semantics of various sample categories. Thereby enhancing its semantics understanding and classification capabilities, the issue of misclassification due to data imbalance is alleviated. Secondly, both semantics of context and positional information of tagged entities are enhanced by employing weighted layer pooling on pre-trained language models, which improves the representation of context and entity mentions. Experiments are conducted on three public dataset to authenticate the effectiveness of CLERE. The results demonstrate that the proposed model outperforms existing mainstream baseline methods significantly.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"50 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shift-Reduce Task-Oriented Semantic Parsing with Stack-Transformers
Pub Date: 2024-08-22 | DOI: 10.1007/s12559-024-10339-4
Daniel Fernández-González
Intelligent voice assistants, such as Apple Siri and Amazon Alexa, are widely used nowadays. These task-oriented dialogue systems require a semantic parsing module to process user utterances and understand the action to be performed. This semantic parsing component was initially implemented by rule-based or statistical slot-filling approaches for processing simple queries; however, the appearance of more complex utterances demanded the application of shift-reduce parsers or sequence-to-sequence models. Although shift-reduce approaches were initially considered the most promising option, the emergence of sequence-to-sequence neural systems propelled the latter to the forefront as the highest-performing method for this particular task. In this article, we advance the research on shift-reduce semantic parsing for task-oriented dialogue. We implement novel shift-reduce parsers that rely on Stack-Transformers. This framework allows transition systems to be adequately modeled on the transformer neural architecture, notably boosting shift-reduce parsing performance. Furthermore, our approach goes beyond the conventional top-down algorithm: we incorporate alternative bottom-up and in-order transition systems derived from constituency parsing into the realm of task-oriented parsing. We extensively test our approach on multiple domains from the Facebook TOP benchmark, improving over existing shift-reduce parsers and state-of-the-art sequence-to-sequence models in both high-resource and low-resource settings. We also empirically prove that the in-order algorithm substantially outperforms the commonly used top-down strategy. Through the creation of innovative transition systems and by harnessing the capabilities of a robust neural architecture, our study showcases the superiority of shift-reduce parsers over leading sequence-to-sequence methods on the main benchmark.
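For intuition, here is a toy shift-reduce loop over a stack and a buffer. Real parsers score each action with a neural model (Stack-Transformers in this work); the explicit action sequence, the binary REDUCE, and the intent label are simplifications made up for illustration.

```python
def shift_reduce(tokens, actions):
    """Apply a given action sequence; REDUCE here always takes two children."""
    stack, buffer = [], list(tokens)
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))       # move next token onto the stack
        elif act.startswith("REDUCE"):        # e.g., "REDUCE:IN:GET_WEATHER"
            label = act.split(":", 1)[1]
            children = [stack.pop(), stack.pop()][::-1]
            stack.append((label, children))   # build a labeled subtree
    return stack

print(shift_reduce(["weather", "in", "boston"],
                   ["SHIFT", "SHIFT", "REDUCE:IN:GET_WEATHER", "SHIFT"]))
```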
{"title":"Shift-Reduce Task-Oriented Semantic Parsing with Stack-Transformers","authors":"Daniel Fernández-González","doi":"10.1007/s12559-024-10339-4","DOIUrl":"https://doi.org/10.1007/s12559-024-10339-4","url":null,"abstract":"<p>Intelligent voice assistants, such as Apple Siri and Amazon Alexa, are widely used nowadays. These task-oriented dialogue systems require a semantic parsing module in order to process user utterances and understand the action to be performed. This semantic parsing component was initially implemented by rule-based or statistical slot-filling approaches for processing simple queries; however, the appearance of more complex utterances demanded the application of shift-reduce parsers or sequence-to-sequence models. Although shift-reduce approaches were initially considered the most promising option, the emergence of sequence-to-sequence neural systems has propelled them to the forefront as the highest-performing method for this particular task. In this article, we advance the research on shift-reduce semantic parsing for task-oriented dialogue. We implement novel shift-reduce parsers that rely on Stack-Transformers. This framework allows to adequately model transition systems on the transformer neural architecture, notably boosting shift-reduce parsing performance. Furthermore, our approach goes beyond the conventional top-down algorithm: we incorporate alternative bottom-up and in-order transition systems derived from constituency parsing into the realm of task-oriented parsing. We extensively test our approach on multiple domains from the Facebook TOP benchmark, improving over existing shift-reduce parsers and state-of-the-art sequence-to-sequence models in both high-resource and low-resource settings. We also empirically prove that the in-order algorithm substantially outperforms the commonly used top-down strategy. Through the creation of innovative transition systems and harnessing the capabilities of a robust neural architecture, our study showcases the superiority of shift-reduce parsers over leading sequence-to-sequence methods on the main benchmark.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"65 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181645","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Diagnostic Potential of Eye Movements in Alzheimer’s Disease via a Multiclass Machine Learning Model
Pub Date: 2024-08-21 | DOI: 10.1007/s12559-024-10346-5
Jiaqi Song, Haodong Huang, Jiarui Liu, Jiani Wu, Yingxi Chen, Lisong Wang, Fuxin Zhong, Xiaoqin Wang, Zihan Lin, Mengyu Yan, Wenbo Zhang, Xintong Liu, Xinyi Tang, Yang Lü, Weihua Yu
Early diagnosis plays a crucial role in controlling Alzheimer’s disease (AD) progression and delaying cognitive decline. Traditional diagnostic tools present great challenges to clinical practice due to their invasiveness, high cost, and time-consuming administration. This study was designed to construct a non-invasive and cost-effective classification model based on eye movement parameters to distinguish dementia due to AD (ADD), mild cognitive impairment (MCI), and normal cognition. Eye movement data were collected from 258 subjects, comprising 111 patients with ADD, 81 patients with MCI, and 66 individuals with normal cognition, who performed fixation, smooth pursuit, prosaccade, and anti-saccade tasks. Machine learning methods were used to screen eye movement parameters and build diagnostic models. Pearson’s correlation analysis was used to assess the correlations between the five most important eye movement indicators in the optimal model and neuropsychological scales. The gradient boosting classifier demonstrated the best classification performance, achieving 68.2% accuracy and a 66.32% F1-score in the multiclass classification of AD. Moreover, the correlation analysis indicated that the eye movement parameters were associated with various cognitive functions, including general cognitive status, attention, visuospatial ability, episodic memory, short-term memory, language, and instrumental activities of daily living. Eye movement parameters in conjunction with machine learning methods achieve satisfactory overall accuracy, making this an effective and less time-consuming method to assist the clinical diagnosis of AD.
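A minimal sketch of the modeling setup, assuming a scikit-learn gradient boosting classifier over tabular eye movement features; the features and three-class labels below are random stand-ins, not the study’s data, and the feature count is an assumption.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((258, 20))            # 258 subjects, 20 eye movement features
y = rng.integers(0, 3, size=258)     # 0 = ADD, 1 = MCI, 2 = normal cognition

clf = GradientBoostingClassifier(random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_macro")
print(scores.mean())                 # macro F1 across the three classes
```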
{"title":"Diagnostic Potential of Eye Movements in Alzheimer’s Disease via a Multiclass Machine Learning Model","authors":"Jiaqi Song, Haodong Huang, Jiarui Liu, Jiani Wu, Yingxi Chen, Lisong Wang, Fuxin Zhong, Xiaoqin Wang, Zihan Lin, Mengyu Yan, Wenbo Zhang, Xintong Liu, Xinyi Tang, Yang Lü, Weihua Yu","doi":"10.1007/s12559-024-10346-5","DOIUrl":"https://doi.org/10.1007/s12559-024-10346-5","url":null,"abstract":"<p>Early diagnosis plays a crucial role in controlling Alzheimer’s disease (AD) progression and delaying cognitive decline. Traditional diagnostic tools present great challenges to clinical practice due to their invasiveness, high cost, and time-consuming administration. This study was designed to construct a non-invasive and cost-effective classification model based on eye movement parameters to distinguish dementia due to AD (ADD), mild cognitive impairment (MCI), and normal cognition. Eye movement data were collected from 258 subjects, comprising 111 patients with ADD, 81 patients with MCI, and 66 individuals with normal cognition. The fixation, smooth pursuit, prosaccade, and anti-saccade tasks were performed. Machine learning methods were used to screen eye movement parameters and build diagnostic models. Pearson’s correlation analysis was used to assess the correlations between the five most important eye movement indicators in the optimal model and neuropsychological scales. The gradient boosting classifier model demonstrated the best classification performance, achieving 68.2% of accuracy and 66.32% of F1-score in multiclass classification of AD. Moreover, the correlation analysis indicated that the eye movement parameters were associated with various cognitive functions, including general cognitive status, attention, visuospatial ability, episodic memory, short-term memory, and language and instrumental activities of daily life. Eye movement parameters in conjunction with machine learning methods achieve satisfactory overall accuracy, making it an effective and less time-consuming method to assist clinical diagnosis of AD.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"22 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
From Pixels to Prepositions: Linking Visual Perception with Spatial Prepositions Far and Near
Pub Date: 2024-08-20 | DOI: 10.1007/s12559-024-10329-6
Krishna Raj S R, Srinivasa Chakravarthy V, Anindita Sahoo
Human language is influenced by sensory-motor experiences. Sensory experiences gathered in a spatiotemporal world are used as raw material to create more abstract concepts. In language, one way to encode spatial relationships is through spatial prepositions. Spatial prepositions that specify the proximity of objects in space, like far and near or their variants, are found in most languages. The ability to determine the proximity of another entity to oneself is a useful evolutionary trait: from the taxic behavior of unicellular organisms like bacteria to tropism in the plant kingdom, it can be found in almost all organisms. In humans, vision plays a critical role in spatial localization and navigation. This computational study analyzes the relationship between vision and spatial prepositions using an artificial neural network. A synthetic image dataset was created, with each image featuring a 2D projection of an object placed in 3D space; the objects vary in shape, size, and color. A convolutional neural network is trained to classify the object in each image as far or near based on a set threshold. The study explores two visual scenarios, objects confined to a plane (grounded) and objects not confined to a plane (ungrounded), while also analyzing the influence of camera placement. Classification performance is high in the grounded case, demonstrating that the far/near classification problem is well-defined for grounded objects and that depth can be determined with high accuracy from monocular cues alone, provided the camera is at a sufficient height. The difference in the network’s performance between grounded and ungrounded cases can be explained by the physical properties of the retinal imaging system. Determining the distance of an object from the individual images in the dataset is challenging because they lack background cues; still, the network’s performance shows the influence of the spatial constraints placed on the image generation process in determining depth. The results show that monocular cues contribute significantly to depth perception when all objects are confined to a single plane. A set of sensory inputs (images) and a specific task (far/near classification) allowed us to obtain these results. The visual task, along with reaching and motion, may enable humans to carve space into spatial prepositional categories like far and near. The network’s performance, and how it learns to classify between far and near, provides insights into certain visual illusions that involve size constancy.
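As a sketch of the labeling rule, the snippet below assigns far/near labels by thresholding each object’s 3D distance from an assumed camera position; the coordinates, camera placement, and threshold are all illustrative assumptions, not the study’s actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = rng.uniform(-5, 5, size=(1000, 3))        # object centers in 3D
camera = np.array([0.0, 0.0, 10.0])                   # assumed camera position

dist = np.linalg.norm(positions - camera, axis=1)     # Euclidean distance
labels = np.where(dist < dist.mean(), "near", "far")  # threshold: mean distance
```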
{"title":"From Pixels to Prepositions: Linking Visual Perception with Spatial Prepositions Far and Near","authors":"Krishna Raj S R, Srinivasa Chakravarthy V, Anindita Sahoo","doi":"10.1007/s12559-024-10329-6","DOIUrl":"https://doi.org/10.1007/s12559-024-10329-6","url":null,"abstract":"<p>Human language is influenced by sensory-motor experiences. Sensory experiences gathered in a spatiotemporal world are used as raw material to create more abstract concepts. In language, one way to encode spatial relationships is through spatial prepositions. Spatial prepositions that specify the proximity of objects in space, like <i>far</i> and <i>near</i> or their variants, are found in most languages. The mechanism for determining the proximity of another entity to itself is a useful evolutionary trait. From the taxic behavior in unicellular organisms like bacteria to the tropism in the plant kingdom, this behavior can be found in almost all organisms. In humans, vision plays a critical role in spatial localization and navigation. This computational study analyzes the relationship between vision and spatial prepositions using an artificial neural network. For this study, a synthetic image dataset was created, with each image featuring a 2D projection of an object placed in 3D space. The objects can be of various shapes, sizes, and colors. A convolutional neural network is trained to classify the object in the images as <i>far</i> or <i>near</i> based on a set threshold. The study mainly explores two visual scenarios: objects confined to a plane (<b>grounded</b>) and objects not confined to a plane (<b>ungrounded</b>), while also analyzing the influence of camera placement. The classification performance is high for the grounded case, demonstrating that the problem of <i>far/near</i> classification is well-defined for grounded objects, given that the camera is at a sufficient height. The network performance showed that depth can be determined in grounded cases only from monocular cues with high accuracy, given the camera is at an adequate height. The difference in the network’s performance between grounded and ungrounded cases can be explained using the physical properties of the retinal imaging system. The task of determining the distance of an object from individual images in the dataset is challenging as they lack any background cues. Still, the network performance shows the influence of spatial constraints placed on the image generation process in determining depth. The results show that monocular cues significantly contribute to depth perception when all the objects are confined to a single plane. A set of sensory inputs (images) and a specific task (<i>far/near</i> classification) allowed us to obtain the aforementioned results. The visual task, along with reaching and motion, may enable humans to carve the space into various spatial prepositional categories like <i>far</i> and <i>near</i>. 
The network’s performance and how it learns to classify between <i>far</i> and <i>near</i> provided insights into certain visual illusions that involve size constancy.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"42 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142181646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cognitive Analysis of Medical Decision-Making: An Extended MULTIMOORA-Based Multigranulation Probabilistic Model with Evidential Reasoning
Pub Date: 2024-08-20 | DOI: 10.1007/s12559-024-10340-x
Wenhui Bai, Chao Zhang, Yanhui Zhai, Arun Kumar Sangaiah, Baoli Wang, Wentao Li
Cognitive computation leverages the capabilities of computer algorithms, rendering it an exceptionally efficient approach for addressing multi-attribute group decision-making (MAGDM) problems. Given the stability of MULTIMOORA (Multi-Objective Optimization by Ratio Analysis plus the full MULTIplicative form) and the capability of evidential reasoning (ER) to combine information from multiple sources, the technique of multigranulation probabilistic rough sets (MG PRSs) holds great promise for solving MAGDM problems. Thus, a new and stable method for MAGDM is proposed. Initially, three forms of multigranulation Pythagorean fuzzy probabilistic rough sets (MG PF PRSs) are constructed using MULTIMOORA approaches. Next, hierarchical clustering is employed to group similar decision information and consolidate the decision-makers’ preferences; representatives are chosen from each cluster to simplify information fusion and reduce complexity by lowering the model’s dimensionality. Following that, the rankings obtained from the three methods are fused using ER. Ultimately, the validity of the method is demonstrated through a cognitive case analysis of chickenpox cases from the UCI data set. The proposed method offers significant advantages: the use of MULTIMOORA improves the stability of decision results, while the incorporation of ER reduces the overall uncertainty of the entire decision process.
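A minimal sketch of the preference-clustering step, assuming SciPy’s agglomerative (hierarchical) clustering over decision-makers’ evaluation vectors with one representative chosen per cluster; the vectors, linkage choice, and cluster count are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
prefs = rng.random((12, 6))          # 12 decision-makers, 6 attribute scores

Z = linkage(prefs, method="average")                  # hierarchical clustering
clusters = fcluster(Z, t=3, criterion="maxclust")     # cut into 3 groups
# One representative per cluster keeps the fusion step low-dimensional.
reps = [np.where(clusters == c)[0][0] for c in np.unique(clusters)]
```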
{"title":"Cognitive Analysis of Medical Decision-Making: An Extended MULTIMOORA-Based Multigranulation Probabilistic Model with Evidential Reasoning","authors":"Wenhui Bai, Chao Zhang, Yanhui Zhai, Arun Kumar Sangaiah, Baoli Wang, Wentao Li","doi":"10.1007/s12559-024-10340-x","DOIUrl":"https://doi.org/10.1007/s12559-024-10340-x","url":null,"abstract":"<p>Cognitive computation has leveraged the capabilities of computer algorithms, rendering it an exceptionally efficient approach for addressing multi-attribute group decision-making (MAGDM) problems. Due to the stability of MULTIMOORA (Multi-Objective Optimization by Ratio Analysis plus the full MULTIplicative form) and the capability of evidential reasoning (ER) to combine information from multiple sources, the technique of multigranulation probabilistic rough sets (MG PRSs) holds great promise for solving MAGDM problems. Thus, a new and stable method for MAGDM is proposed. Initially, three forms of multigranulation Pythagorean fuzzy probabilistic rough sets (MG PF PRSs) are constructed using MULTIMOORA approaches. Next, the hierarchical clustering method is employed to cluster similar decision information and consolidate the decision-makers’ preferences. Representatives are chosen from each category to simplify information fusion calculations and reduce complexity by reducing the model’s dimensionality. Following that, the rankings obtained from the three methods are fused using ER. Ultimately, the validity of our method is revealed via a case analysis on chickenpox cases from the UCI data set by employing cognitive analysis. The paper outlines a method for MAGDM that provides significant advantages. Specifically, the use of MULTIMOORA improves the stability of decision results, while the incorporation of ER reduces the overall uncertainty of entire decision processes.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"20 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}