
Latest publications in Cognitive Computation

Evaluating Explainable Machine Learning Models for Clinicians
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-31 | DOI: 10.1007/s12559-024-10297-x
Noemi Scarpato, Aria Nourbakhsh, Patrizia Ferroni, Silvia Riondino, Mario Roselli, Francesca Fallucchi, Piero Barbanti, Fiorella Guadagni, Fabio Massimo Zanzotto

Gaining clinicians’ trust will unleash the full potential of artificial intelligence (AI) in medicine, and explaining AI decisions is seen as the way to build trustworthy systems. However, explainable artificial intelligence (XAI) methods in medicine often lack a proper evaluation. In this paper, we present our evaluation methodology for XAI methods using forward simulatability. We define the Forward Simulatability Score (FSS) and analyze its limitations in the context of clinical predictors. We then apply the FSS to our XAI approach defined over ML-RO, a machine learning clinical predictor based on random optimization over a multiple kernel support vector machine (SVM) algorithm. To compare FSS values before and after the explanation phase, we test our evaluation methodology on three clinical datasets: breast cancer, VTE, and migraine. The ML-RO system is a good model on which to test our FSS-based XAI evaluation strategy. Indeed, ML-RO outperforms two other base models, a decision tree (DT) and a plain SVM, on the three datasets, and it allows us to define different XAI models: TOPK, MIGF, and F4G. The FSS suggests that the explanation method F4G for ML-RO is the most effective in two of the three tested datasets, and it reveals the limits of the learned model on one dataset. Our study aims to introduce a standard practice for evaluating XAI methods in medicine. By establishing a rigorous evaluation framework, we seek to provide healthcare professionals with reliable tools for assessing the performance of XAI methods and to enhance the adoption of AI systems in clinical practice.
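The abstract defines the FSS in terms of forward simulatability: how often a human evaluator can reproduce the model's prediction. A minimal sketch of that idea, assuming FSS is the fraction of matching predictions (the function name and exact formula are illustrative, not the authors' implementation):

```python
# Hedged sketch of a Forward Simulatability Score (FSS) computation.
# Assumption: FSS is the fraction of model predictions that a human
# evaluator reproduces ("simulates") correctly.

def forward_simulatability_score(model_predictions, human_guesses):
    """Fraction of instances where the human guess matches the model output."""
    if len(model_predictions) != len(human_guesses):
        raise ValueError("prediction and guess lists must be the same length")
    matches = sum(m == h for m, h in zip(model_predictions, human_guesses))
    return matches / len(model_predictions)

# Comparing FSS before and after the explanation phase:
fss_before = forward_simulatability_score([1, 0, 1, 1], [1, 1, 0, 1])  # 0.5
fss_after = forward_simulatability_score([1, 0, 1, 1], [1, 0, 1, 1])   # 1.0
```

An effective explanation method should raise this score, since the explanation helps the clinician anticipate the model's behavior.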

{"title":"Evaluating Explainable Machine Learning Models for Clinicians","authors":"Noemi Scarpato, Aria Nourbakhsh, Patrizia Ferroni, Silvia Riondino, Mario Roselli, Francesca Fallucchi, Piero Barbanti, Fiorella Guadagni, Fabio Massimo Zanzotto","doi":"10.1007/s12559-024-10297-x","DOIUrl":"https://doi.org/10.1007/s12559-024-10297-x","url":null,"abstract":"<p>Gaining clinicians’ trust will unleash the full potential of artificial intelligence (AI) in medicine, and explaining AI decisions is seen as the way to build trustworthy systems. However, explainable artificial intelligence (XAI) methods in medicine often lack a proper evaluation. In this paper, we present our evaluation methodology for XAI methods using forward simulatability. We define the Forward Simulatability Score (FSS) and analyze its limitations in the context of clinical predictors. Then, we applied FSS to our XAI approach defined over an ML-RO, a machine learning clinical predictor based on random optimization over a multiple kernel support vector machine (SVM) algorithm. To Compare FSS values before and after the explanation phase, we test our evaluation methodology for XAI methods on three clinical datasets, namely breast cancer, VTE, and migraine. The ML-RO system is a good model on which to test our XAI evaluation strategy based on the FSS. Indeed, ML-RO outperforms two other base models—a decision tree (DT) and a plain SVM—in the three datasets and gives the possibility of defining different XAI models: TOPK, MIGF, and F4G. The FSS evaluation score suggests that the explanation method F4G for the ML-RO is the most effective in two datasets out of the three tested, and it shows the limits of the learned model for one dataset. Our study aims to introduce a standard practice for evaluating XAI methods in medicine. 
By establishing a rigorous evaluation framework, we seek to provide healthcare professionals with reliable tools for assessing the performance of XAI methods to enhance the adoption of AI systems in clinical practice.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"34 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Counterfactual Explanations in the Big Picture: An Approach for Process Prediction-Driven Job-Shop Scheduling Optimization
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-30 | DOI: 10.1007/s12559-024-10294-0
Nijat Mehdiyev, Maxim Majlatow, Peter Fettke

In this study, we propose a pioneering framework for generating multi-objective counterfactual explanations in job-shop scheduling contexts, combining predictive process monitoring with advanced mathematical optimization techniques. Using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) for multi-objective optimization, our approach improves the generation of counterfactual explanations that illuminate potential enhancements at both the operational and systemic levels. Validated with real-world data, our methodology underscores the superiority of NSGA-II in crafting pertinent and actionable counterfactual explanations, surpassing traditional methods in both efficiency and practical relevance. This work advances the domains of explainable artificial intelligence (XAI), predictive process monitoring, and combinatorial optimization, providing an effective tool for improving automated scheduling systems' clarity and decision-making capabilities.
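At the core of NSGA-II is the Pareto-dominance test used to rank candidate solutions, here candidate counterfactuals scored on multiple objectives. A minimal sketch under assumed minimized objectives (e.g., proximity to the original schedule and predicted cost); the study's actual objectives and encoding differ:

```python
# Pareto-dominance and non-dominated filtering, the building blocks of
# NSGA-II's sorting step. Objective vectors are assumed to be minimized.

def dominates(a, b):
    """True if a is at least as good as b on all objectives and
    strictly better on at least one (both minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """Return the non-dominated subset of a list of objective vectors."""
    return [s for s in solutions
            if not any(dominates(o, s) for o in solutions if o is not s)]

# Four candidate counterfactuals, two objectives each:
candidates = [(0.2, 5.0), (0.4, 3.0), (0.5, 5.5), (0.1, 6.0)]
front = pareto_front(candidates)  # (0.5, 5.5) is dominated by (0.2, 5.0)
```

NSGA-II repeatedly applies this sorting (plus crowding-distance selection) over generations of candidates to approximate the full trade-off front.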

{"title":"Counterfactual Explanations in the Big Picture: An Approach for Process Prediction-Driven Job-Shop Scheduling Optimization","authors":"Nijat Mehdiyev, Maxim Majlatow, Peter Fettke","doi":"10.1007/s12559-024-10294-0","DOIUrl":"https://doi.org/10.1007/s12559-024-10294-0","url":null,"abstract":"<p>In this study, we propose a pioneering framework for generating multi-objective counterfactual explanations in job-shop scheduling contexts, combining predictive process monitoring with advanced mathematical optimization techniques. Using the Non-dominated Sorting Genetic Algorithm II (NSGA-II) for multi-objective optimization, our approach enhances the generation of counterfactual explanations that illuminate potential enhancements at both the operational and systemic levels. Validated with real-world data, our methodology underscores the superiority of NSGA-II in crafting pertinent and actionable counterfactual explanations, surpassing traditional methods in both efficiency and practical relevance. This work advances the domains of explainable artificial intelligence (XAI), predictive process monitoring, and combinatorial optimization, providing an effective tool for improving automated scheduling systems’ clarity, and decision-making capabilities.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"42 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141192110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Detection of Cardiovascular Diseases Using Data Mining Approaches: Application of an Ensemble-Based Model
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-30 | DOI: 10.1007/s12559-024-10306-z
Mojdeh Nazari, Hassan Emami, Reza Rabiei, Azamossadat Hosseini, Shahabedin Rahmatizadeh

Cardiovascular diseases are the leading cause of mortality worldwide. Accurate cardiovascular disease prediction is crucial, and the application of machine learning and data mining techniques can facilitate decision-making and improve predictive capabilities. This study aimed to present a model for accurate prediction of cardiovascular diseases and to identify the key contributing factors with the greatest impact. The Cleveland dataset, along with a locally collected dataset called the Noor dataset, was used in this study. Various data mining techniques, as well as four ensemble learning-based models, were applied to both datasets. Moreover, a novel model for combining individual classifiers in ensemble learning was developed, in which weights were assigned to each classifier using a genetic algorithm. The predictive strength of each feature was also investigated to ensure the generalizability of the outcomes. The final ensemble-based model achieved a precision of 88.05% and 90.12% on the Cleveland and Noor datasets, respectively, demonstrating its reliability and suitability for future research on predicting the likelihood of cardiovascular diseases. The proposed model not only introduces an innovative approach for identifying cardiovascular diseases by unraveling the intricate relationships between various biological variables but also facilitates their early detection.
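The combination step described above amounts to weighted voting over the individual classifiers, with the weight vector tuned by a genetic algorithm. A hedged sketch of the voting part only; the weights here are placeholders, not values from the study:

```python
# Weighted soft voting over per-classifier class probabilities.
# The weight vector would come from a genetic algorithm search;
# here it is a fixed illustrative example.
import numpy as np

def weighted_vote(probas, weights):
    """Combine per-classifier probability arrays (n_clf, n_samples, n_classes)
    with given weights, returning the predicted class per sample."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                      # normalize to sum to 1
    combined = np.tensordot(weights, np.asarray(probas), axes=1)
    return combined.argmax(axis=1)

# Three classifiers, two samples, two classes (disease / no disease):
probas = [
    [[0.6, 0.4], [0.3, 0.7]],   # classifier 1
    [[0.5, 0.5], [0.2, 0.8]],   # classifier 2
    [[0.9, 0.1], [0.6, 0.4]],   # classifier 3
]
labels = weighted_vote(probas, weights=[0.2, 0.3, 0.5])    # [0, 1]
```

A genetic algorithm would then evaluate candidate weight vectors by the ensemble's validation precision and evolve them toward the best-performing combination.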

{"title":"Detection of Cardiovascular Diseases Using Data Mining Approaches: Application of an Ensemble-Based Model","authors":"Mojdeh Nazari, Hassan Emami, Reza Rabiei, Azamossadat Hosseini, Shahabedin Rahmatizadeh","doi":"10.1007/s12559-024-10306-z","DOIUrl":"https://doi.org/10.1007/s12559-024-10306-z","url":null,"abstract":"<p>Cardiovascular diseases are the leading contributor of mortality worldwide. Accurate cardiovascular disease prediction is crucial, and the application of machine learning and data mining techniques could facilitate decision-making and improve predictive capabilities. This study aimed to present a model for accurate prediction of cardiovascular diseases and identifying key contributing factors with the greatest impact. The Cleveland dataset besides the locally collected dataset, called the Noor dataset, was used in this study. Accordingly, various data mining techniques besides four ensemble learning-based models were implemented on both datasets. Moreover, a novel model for combining individual classifiers in ensemble learning, wherein weights were assigned to each classifier (using a genetic algorithm), was developed. The predictive strength of each feature was also investigated to ensure the generalizability of the outcomes. The ultimate ensemble-based model achieved a precision rate of 88.05% and 90.12% on the Cleveland and Noor datasets, respectively, demonstrating its reliability and suitability for future research in predicting the likelihood of cardiovascular diseases. 
Not only the proposed model introduces an innovative approach for specifying cardiovascular diseases by unraveling the intricate relationships between various biological variables but also facilitates early detection of cardiovascular diseases.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"84 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-05-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141191950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Generative Adversarial Network-Assisted Framework for Power Management
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-27 | DOI: 10.1007/s12559-024-10284-2
Noman Khan, Samee Ullah Khan, Ahmed Farouk, Sung Wook Baik

The rise in power consumption (PC) is driven by several factors, such as the growing global population, urbanization, technological advances, economic development, and the growth of business and commercial sectors. Nowadays, intermittent renewable energy sources (RESs) are widely utilized in electric grids to meet the need for power. Data-driven techniques are essential for ensuring the steady operation of the electric grid and for accurate forecasting of power consumption and generation. However, the available datasets for time series electric power forecasting in the energy industry are not as large as those in other domains such as computer vision. Thus, a deep learning (DL) framework is introduced for predicting PC in residential and commercial buildings as well as power generation (PG) from RESs. The raw power data obtained from buildings and RES-based power plants are passed through a purging process in which missing values are filled in and noise and outliers are removed. Next, the proposed generative adversarial network (GAN) uses a portion of the cleaned data to generate synthetic parallel data, which is combined with the actual data to form a hybrid dataset. Subsequently, a stacked gated recurrent unit (GRU) model, optimized for power forecasting, is trained on the hybrid dataset. Six existing power datasets are used to train and test sixteen linear and nonlinear models for energy forecasting, and the best-performing network is selected as the proposed forecasting method. For the Korea Yeongam solar power (KYSP), individual household electric power consumption (IHEPC), and advanced institute of convergence technology (AICT) datasets, the proposed model obtains mean absolute error (MAE) values of 0.0716, 0.0819, and 0.0877, respectively. Similarly, its MAE values are 0.1215, 0.5093, and 0.5751 for the Australia Alice Springs solar power (AASSP), Korea south east wind power (KSEWP), and Korea south east solar power (KSESP) datasets, respectively.
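The purging step described above (filling absent values, removing outliers) can be sketched minimally as follows; the forward-fill strategy and z-score threshold are assumptions for illustration, not the authors' exact procedure:

```python
# Illustrative purging of a raw power series: forward-fill missing
# values, then clip values that lie far from the mean (outliers).
import numpy as np

def purge(series, z_thresh=3.0):
    s = np.asarray(series, dtype=float)
    # Forward-fill NaNs (absent values) with the last valid observation.
    mask = np.isnan(s)
    idx = np.where(~mask, np.arange(len(s)), 0)
    np.maximum.accumulate(idx, out=idx)
    s = s[idx]
    # Clip values beyond z_thresh standard deviations of the mean.
    mu, sigma = s.mean(), s.std()
    return np.clip(s, mu - z_thresh * sigma, mu + z_thresh * sigma)

raw = [1.0, np.nan, 1.2, 50.0, 1.1]      # one gap, one spike
clean = purge(raw, z_thresh=1.5)          # gap filled, spike clipped
```

Only after such cleaning would the GAN augmentation and GRU training stages see the data.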

{"title":"Generative Adversarial Network-Assisted Framework for Power Management","authors":"Noman Khan, Samee Ullah Khan, Ahmed Farouk, Sung Wook Baik","doi":"10.1007/s12559-024-10284-2","DOIUrl":"https://doi.org/10.1007/s12559-024-10284-2","url":null,"abstract":"<p>The rise in power consumption (PC) is caused by several factors such as the growing global population, urbanization, technological advances, economic development, and growth of businesses and commercial sectors. In these days, intermittent renewable energy sources (RESs) are widely utilized in electric grids to meet the need for power. Data-driven techniques are essential to assuring the steady operation of the electric grid and accurate power consumption and generation forecasting. Conversely, the available datasets for time series electric power forecasting in the energy industry are not as large as those for other domains such as in computer vision. Thus, a deep learning (DL) framework for predicting PC in residential and commercial buildings as well as the power generation (PG) from RESs is introduced. The raw power data obtained from buildings and RESs-based power plants are conceded by the purging process where absent values are filled in and noise and outliers are eliminated. Next, the proposed generative adversarial network (GAN) uses a portion of the cleaned data to generate synthetic parallel data, which is combined with the actual data to make a hybrid dataset. Subsequently, the stacked gated recurrent unit (GRU) model, which is optimized for power forecasting, is trained using the hybrid dataset. Six existent power data are used to train and test sixteen linear and nonlinear models for energy forecasting. The best-performing network is selected as the proposed method for forecasting tasks. 
For Korea Yeongam solar power (KYSP), individual household electric power consumption (IHEPC), and advanced institute of convergence technology (AICT) datasets, the proposed model obtains mean absolute error (MAE) values of 0.0716, 0.0819, and 0.0877, respectively. Similarly, its MAE values are 0.1215, 0.5093, and 0.5751, for Australia Alice Springs solar power (AASSP), Korea south east wind power (KSEWP), and, Korea south east solar power (KSESP) datasets, respectively.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"18 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141173345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Quasi-projective Synchronization Control of Delayed Stochastic Quaternion-Valued Fuzzy Cellular Neural Networks with Mismatched Parameters
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-27 | DOI: 10.1007/s12559-024-10299-9
Xiaofang Meng, Yu Fei, Zhouhong Li

This paper deals with the quasi-projective synchronization problem of delayed stochastic quaternion-valued fuzzy cellular neural networks (FCNNs) with mismatched parameters. Although the parameter mismatch between the drive and response systems increases the computational complexity, it is of practical significance to consider the existence of deviations between the two systems. Our approach is to design an appropriate controller and to construct a Lyapunov functional, drawing on stochastic analysis theory based on the Itô formula in the quaternion domain. We adopt a non-decomposition method for quaternion FCNNs, which preserves the original data and reduces computational effort. We obtain sufficient conditions for quasi-projective synchronization of the considered stochastic quaternion-valued FCNNs with mismatched parameters. Additionally, we estimate the error bounds of quasi-projective synchronization and present a numerical example to verify their validity. Our results are novel even when the considered neural networks degenerate into real-valued or complex-valued neural networks. This article provides a useful research direction for studying the quasi-projective synchronization problem of stochastic quaternion-valued FCNNs with time delay. The method can also be used to study the quasi-projective synchronization of Clifford-valued neural networks.
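For reference, the standard form of a quasi-projective synchronization condition can be sketched as follows; the notation is illustrative and may differ from the paper's exact definitions:

```latex
% Quasi-projective synchronization: the response state y(t) tracks a
% projective multiple of the drive state x(t) up to a bounded error.
\[
  e(t) = y(t) - \lambda\, x(t), \qquad
  \limsup_{t \to \infty} \, \mathbb{E}\lVert e(t) \rVert \le \varepsilon ,
\]
% where \lambda is the projective coefficient and \varepsilon > 0 is the
% (estimated) synchronization error bound.
```

The "quasi" qualifier reflects that, under stochastic perturbations and parameter mismatch, the error converges only into a bounded neighborhood of zero rather than to zero exactly, which is why the paper estimates error bounds rather than proving exact synchronization.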

{"title":"Quasi-projective Synchronization Control of Delayed Stochastic Quaternion-Valued Fuzzy Cellular Neural Networks with Mismatched Parameters","authors":"Xiaofang Meng, Yu Fei, Zhouhong Li","doi":"10.1007/s12559-024-10299-9","DOIUrl":"https://doi.org/10.1007/s12559-024-10299-9","url":null,"abstract":"<p>This paper deals with the quasi-projective synchronization problem of delayed stochastic quaternion fuzzy cellular neural networks with mismatch parameters. Although the parameter mismatch of the drive-response system increases the computational complexity of the article, it is of practical significance to consider the existence of deviations between the two systems. The method of this article is to design an appropriate controller and construct Lyapunov functional and stochastic analysis theory based on the Itô formula in the quaternion domain. We adopt the non-decomposable method of quaternion FCNN, which preserves the original data and reduces computational effort. We obtain sufficient conditions for quasi-projective synchronization of the considered random quaternion numerical FCNNs with mismatched parameters. Additionally, we estimate the error bounds of quasi-projective synchronization and then carry out a numerical example to verify their validity. Our results are novel even if the considered neural networks degenerate into real-valued or complex-valued neural networks. This article provides a good research idea for studying the quasi-projective synchronization problem of random quaternion numerical FCNN with time delay and has obtained good results. 
The method in this article can also be used to study the quasi-projective synchronization of a Clifford-valued neural network.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"22 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Vision-Enabled Large Language and Deep Learning Models for Image-Based Emotion Recognition
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-27 | DOI: 10.1007/s12559-024-10281-5
Mohammad Nadeem, Shahab Saquib Sohail, Laeeba Javed, Faisal Anwer, Abdul Khader Jilani Saudagar, Khan Muhammad

The significant advancements in the capabilities, reasoning, and efficiency of artificial intelligence (AI)-based tools and systems are evident. Noteworthy examples include generative AI-based large language models (LLMs) such as generative pretrained transformer 3.5 (GPT-3.5), generative pretrained transformer 4 (GPT-4), and Bard. LLMs are versatile and effective for various tasks such as composing poetry, writing code, generating essays, and solving puzzles. Until recently, LLMs could only process text-based input effectively. However, recent advancements have enabled them to handle multimodal inputs, such as text, images, and audio, making them highly general-purpose tools. Since LLMs have achieved decent performance in pattern recognition tasks (such as classification), it is natural to ask whether general-purpose LLMs can perform comparably to, or even better than, specialized deep learning models (DLMs) trained specifically for a given task. In this study, we compared the performance of fine-tuned DLMs with that of general-purpose LLMs for image-based emotion recognition. We trained DLMs, namely two convolutional neural network (CNN) models (CNN_1 and CNN_2), ResNet50, and VGG-16, using an image dataset for emotion recognition, and then tested their performance on another dataset. Subsequently, we subjected the same testing dataset to two vision-enabled LLMs (LLaVa and GPT-4). CNN_2 was found to be the superior model, with an accuracy of 62%, while VGG-16 produced the lowest accuracy, 31%. Among the LLMs, GPT-4 performed best, with an accuracy of 55.81%. The LLaVa LLM had a higher accuracy than the CNN_1 and VGG-16 models. The other performance metrics, such as precision, recall, and F1-score, followed similar trends. However, GPT-4 performed best with small datasets. The weaker results observed for LLMs can be attributed to their general-purpose nature: despite extensive pretraining, they may not capture the features required for specific tasks, such as emotion recognition in images, as effectively as models fine-tuned for those tasks. The LLMs did not surpass the specialized models but achieved comparable performance, making them a viable option for specific tasks without additional training. In addition, LLMs can be considered a good alternative when the available dataset is small.
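The comparison above rests on accuracy and per-class metrics computed from predicted versus true emotion labels. A minimal sketch of those computations (label values are illustrative, not the study's data):

```python
# Accuracy and macro-averaged F1 from true vs. predicted emotion labels.

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["happy", "sad", "happy", "angry"]
y_pred = ["happy", "happy", "happy", "angry"]
acc = accuracy(y_true, y_pred)   # 0.75
```

Applying the same metric functions to every model's predictions on a shared test set is what makes the CNN-vs-LLM comparison direct.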

{"title":"Vision-Enabled Large Language and Deep Learning Models for Image-Based Emotion Recognition","authors":"Mohammad Nadeem, Shahab Saquib Sohail, Laeeba Javed, Faisal Anwer, Abdul Khader Jilani Saudagar, Khan Muhammad","doi":"10.1007/s12559-024-10281-5","DOIUrl":"https://doi.org/10.1007/s12559-024-10281-5","url":null,"abstract":"<p>The significant advancements in the capabilities, reasoning, and efficiency of artificial intelligence (AI)-based tools and systems are evident. Some noteworthy examples of such tools include generative AI-based large language models (LLMs) such as generative pretrained transformer 3.5 (GPT 3.5), generative pretrained transformer 4 (GPT-4), and Bard. LLMs are versatile and effective for various tasks such as composing poetry, writing codes, generating essays, and solving puzzles. Thus far, LLMs can only effectively process text-based input. However, recent advancements have enabled them to handle multimodal inputs, such as text, images, and audio, making them highly general-purpose tools. LLMs have achieved decent performance in pattern recognition tasks (such as classification), therefore, there is a curiosity about whether general-purpose LLMs can perform comparable or even superior to specialized deep learning models (DLMs) trained specifically for a given task. In this study, we compared the performances of fine-tuned DLMs with those of general-purpose LLMs for image-based emotion recognition. We trained DLMs, namely, a convolutional neural network (CNN) (two CNN models were used: <span>(CNN_1)</span> and <span>(CNN_2)</span>), ResNet50, and VGG-16 models, using an image dataset for emotion recognition, and then tested their performance on another dataset. Subsequently, we subjected the same testing dataset to two vision-enabled LLMs (LLaVa and GPT-4). The <span>(CNN_2)</span> was found to be the superior model with an accuracy of 62% while VGG16 produced the lowest accuracy with 31%. 
In the category of LLMs, GPT-4 performed the best, with an accuracy of 55.81%. LLava LLM had a higher accuracy than <span>(CNN_1)</span> and VGG16 models. The other performance metrics such as precision, recall, and F1-score followed similar trends. However, GPT-4 performed the best with small datasets. The poor results observed in LLMs can be attributed to their general-purpose nature, which, despite extensive pretraining, may not fully capture the features required for specific tasks like emotion recognition in images as effectively as models fine-tuned for those tasks. The LLMs did not surpass specialized models but achieved comparable performance, making them a viable option for specific tasks without additional training. In addition, LLMs can be considered a good alternative when the available dataset is small.</p>","PeriodicalId":51243,"journal":{"name":"Cognitive Computation","volume":"97 1","pages":""},"PeriodicalIF":5.4,"publicationDate":"2024-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Investigating the Influence of Scene Video on EEG-Based Evaluation of Interior Sound in Passenger Cars
IF 5.4 | CAS Zone 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-25 | DOI: 10.1007/s12559-024-10303-2
Liping Xie, Zhien Liu, Yi Sun, Yawei Zhu

The evaluation of automobile sound quality is an important research topic in the interior sound design of passenger cars, and accurate and effective evaluation methods are required to determine the acoustic targets in automobile development. However, existing evaluation studies of automobile sound quality have several deficiencies. (1) Most subjective evaluations consider only auditory perception, which is easy to implement but does not fully reflect the impact of sound on participants; (2) similarly, most existing subjective evaluations consider only the inherent properties of sounds, such as physical and psychoacoustic parameters, which makes it difficult to capture the complex relationship between the sound and the subjective perception of the evaluators; (3) evaluation models constructed only from physical and psychoacoustic perspectives cannot comprehensively analyze the real subjective emotions of the participants. Therefore, to alleviate these flaws, auditory and visual perceptions are combined to explore the influence of scene video on the evaluation of sound quality, and the EEG signal is introduced as a physiological acoustic index; simultaneously, an Elman neural network model is constructed to predict the "powerful" sound quality attribute from the proposed indexes of physical acoustics, psychoacoustics, and physiological acoustics. The results show that sound quality evaluations combined with scene videos better reflect the subjective perceptions of participants. The proposed objective indexes of physical, psychoacoustic, and physiological acoustics help map the subjective results of the "powerful" sound quality, and the constructed Elman model outperforms the traditional back propagation (BP) and support vector machine (SVM) models.
The analysis method proposed in this paper can be readily applied in the field of automotive sound design, providing a clear guideline for the evaluation and optimization of automotive sound quality in the future.
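The prediction step the abstract describes—feeding physical, psychoacoustic, and physiological acoustic indexes into an Elman recurrent network—can be illustrated with a minimal forward-pass sketch. Everything here (feature count, layer sizes, random weights) is a hypothetical assumption, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature vector per time frame, concatenating the three index
# families named in the abstract: physical (e.g. sound pressure level),
# psychoacoustic (e.g. loudness, sharpness), physiological (e.g. EEG band power).
n_features, n_hidden = 6, 8

# Elman network parameters (randomly initialised for this sketch)
W_x = rng.normal(scale=0.1, size=(n_hidden, n_features))  # input -> hidden
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))    # context -> hidden
b_h = np.zeros(n_hidden)
W_o = rng.normal(scale=0.1, size=(1, n_hidden))           # hidden -> score
b_o = np.zeros(1)

def elman_forward(frames):
    """One Elman step per time frame; the hidden state is fed back as the
    context layer. Returns the final predicted sound-quality score."""
    h = np.zeros(n_hidden)
    for x in frames:
        h = np.tanh(W_x @ x + W_h @ h + b_h)
    return (W_o @ h + b_o).item()

frames = rng.normal(size=(20, n_features))  # 20 frames of evaluation indexes
score = elman_forward(frames)               # scalar quality score
```

The context feedback (`W_h @ h`) is what distinguishes the Elman network from the plain BP network it is compared against in the abstract.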

Investigating the Influence of Scene Video on EEG-Based Evaluation of Interior Sound in Passenger Cars
Citations: 0
Structured Encoding Based on Semantic Disambiguation for Video Captioning 基于语义消歧的视频字幕结构化编码
IF 5.4 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-05-09 DOI: 10.1007/s12559-024-10275-3
Bo Sun, Jinyu Tian, Yong Wu, Lunjun Yu, Yuanyan Tang

Video captioning, which aims to automatically generate captions for videos, has gained significant attention due to its wide range of applications in video surveillance and retrieval. However, most existing methods rely on frame-level convolution to extract features, ignoring the semantic relationships between objects and thus failing to encode video details. To address this problem, inspired by human cognitive processes towards the world, we propose a video captioning method based on semantic disambiguation through structured encoding. First, the conceptual semantic graph of a video is constructed by introducing a knowledge graph. Then, graph convolution networks perform relational learning on the conceptual semantic graph to mine the semantic relationships among objects and form the detail encoding of the video. To address the semantic ambiguity of multiple relationships between objects, we propose a method that dynamically learns the most relevant relationships using video scene semantics, constructing semantic graphs based on semantic disambiguation. Finally, we propose a cross-domain guided relationship learning strategy to avoid the negative impact of using only captions as the cross-entropy loss. Experiments on three datasets—MSR-VTT, ActivityNet Captions, and Student Classroom Behavior—show that our method outperforms other methods. The results show that introducing a knowledge graph for common-sense reasoning about objects in videos can deeply encode the semantic relationships between objects, capturing video details and improving captioning performance.
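The relational-learning step the abstract describes—running graph convolution over a conceptual semantic graph of objects—can be sketched with one standard GCN layer (symmetrically normalized adjacency with self-loops). The toy graph, feature sizes, and weights below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def gcn_layer(adj, feats, weight):
    """One graph-convolution pass: aggregate neighbour features through a
    symmetrically normalised adjacency (with self-loops), then project + ReLU."""
    a_hat = adj + np.eye(adj.shape[0])               # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(norm @ feats @ weight, 0.0)

# Toy conceptual semantic graph: 4 object/concept nodes with hypothetical edges
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
feats = rng.normal(size=(4, 5))        # initial node embeddings
weight = rng.normal(size=(5, 3))       # learnable projection
out = gcn_layer(adj, feats, weight)    # relation-aware encodings, shape (4, 3)
```

After such a pass, each node's encoding mixes in its neighbours' features, which is how the semantic relationships between objects enter the video's detail encoding.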

Citations: 0
Category-Aware Siamese Learning Network for Few-Shot Segmentation 分类感知连体学习网络用于少镜头分割
IF 5.4 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-05-08 DOI: 10.1007/s12559-024-10273-5
Hui Sun, Ziyan Zhang, Lili Huang, Bo Jiang, Bin Luo

Few-shot segmentation (FS), which aims to segment an unseen query image based on a few annotated support samples, is an active problem in the computer vision and multimedia fields. The core issue of FS is how to leverage the annotated information from the support images to guide query image segmentation. Existing methods mainly adopt a Siamese Convolutional Neural Network (SCNN), which first encodes both support and query images and then utilizes masked Global Average Pooling (GAP) to facilitate pixel-level representation and segmentation of the query image. However, this pipeline generally fails to fully exploit the category/class-coherent information between support and query images. For the FS task, one can observe that support and query images share the same category information. This inherent property provides an important cue that previous methods generally fail to exploit. To overcome this limitation, in this paper, we propose a novel Category-aware Siamese Learning Network (CaSLNet) to encode both support and query images. The proposed CaSLNet conducts Category Consistent Learning (CCL) for both support and query images and can thus achieve more thorough information communication between them. Comprehensive experimental results on several public datasets demonstrate the advantage of the proposed CaSLNet. Our code is publicly available at https://github.com/HuiSun123/CaSLN.
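The masked Global Average Pooling pipeline that the abstract builds on can be sketched concretely: average the support features over the annotated foreground to obtain a class prototype, then score each query pixel by cosine similarity to that prototype. The tensor shapes and random data are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(2)

def masked_gap(support_feat, support_mask):
    """Average support features over foreground pixels only -> class prototype."""
    w = support_mask[None, :, :]                  # broadcast mask over channels
    return (support_feat * w).sum(axis=(1, 2)) / w.sum()

def cosine_score_map(query_feat, prototype):
    """Per-pixel cosine similarity between query features and the prototype."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum('chw,c->hw', q, p)

C, H, W = 8, 6, 6                                        # toy feature-map shape
support_feat = rng.normal(size=(C, H, W))
support_mask = (rng.random((H, W)) > 0.5).astype(float)  # binary object mask
query_feat = rng.normal(size=(C, H, W))

proto = masked_gap(support_feat, support_mask)           # shape (C,)
scores = cosine_score_map(query_feat, proto)             # shape (H, W), in [-1, 1]
```

CaSLNet's contribution is what this baseline lacks: enforcing consistency of the shared category information across the support and query branches rather than pooling the support mask alone.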

Citations: 0
Federated Constrastive Learning and Visual Transformers for Personal Recommendation 用于个人推荐的联合构造学习和视觉转换器
IF 5.4 3区 计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2024-05-08 DOI: 10.1007/s12559-024-10286-0
Asma Belhadi, Youcef Djenouri, Fabio Augusto de Alcantara Andrade, Gautam Srivastava

This paper introduces a novel solution for personal recommendation in consumer electronics applications. On the one hand, it addresses data confidentiality during training by exploring federated learning and trusted-authority mechanisms; on the other hand, it deals with data quantity and quality by exploring both transformers and consumer clustering. The process starts by grouping consumers into similar clusters using contrastive learning and the k-means algorithm. Each consumer's local model is trained on its local data. The local models, together with the clustering information, are then sent to the server, where integrity verification is performed by a trusted authority. Unlike traditional federated learning solutions, two kinds of aggregation are performed. The first aggregates the models of all consumers to derive the global model. The second aggregates the models within each cluster to derive a local model for similar consumers. Both models are sent back to the consumers, and each consumer decides which model is appropriate for personal recommendation. Extensive experiments on MovieLens-1M and Amazon-Book demonstrate the applicability of the method. The results reveal the superiority of the proposed method over the baselines: it reaches an average accuracy of 0.27, whereas the other methods do not exceed 0.25.
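The two-level aggregation the abstract describes—one global FedAvg-style average plus one average per consumer cluster—can be sketched as follows. The cluster assignments are taken as given (the paper derives them via contrastive learning and k-means), and the model sizes and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

def average(models):
    """Plain parameter averaging (FedAvg-style) over a list of weight vectors."""
    return np.mean(models, axis=0)

# Hypothetical setup: 6 consumers, each holding a flat parameter vector and a
# cluster label already produced by contrastive-embedding k-means.
local_models = [rng.normal(size=4) for _ in range(6)]
cluster_of = [0, 0, 1, 1, 1, 0]

# First aggregation: all consumers -> one global model.
global_model = average(local_models)

# Second aggregation: per-cluster models for groups of similar consumers.
cluster_models = {
    c: average([m for m, lab in zip(local_models, cluster_of) if lab == c])
    for c in set(cluster_of)
}
# Each consumer then receives the global model and its cluster's model and
# keeps whichever serves its personal recommendations better.
```

In the paper's full pipeline a trusted authority would verify the uploaded models before either aggregation runs; that verification step is omitted here.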

Citations: 0