Vertical federated learning (VFL) can learn a common machine learning model over vertically partitioned datasets. However, VFL faces two thorny problems: (1) both training and prediction are highly vulnerable to stragglers; (2) most VFL methods support only a specific machine learning model. If VFL incorporated the features of centralised learning, these issues could be alleviated. With that in mind, this paper proposes a new VFL scheme, called FedBoost, in which private parties upload compressed partial order relations to an honest-but-curious server before training and prediction. The server can then build a machine learning model and predict samples on the union of the coded data. Theoretical analysis indicates that the absence of any private party will not affect training or prediction as long as one round of communication has been completed. Our scheme supports canonical tree-based models such as tree boosting methods and random forests. The experimental results also demonstrate the practicality of our scheme.
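A minimal sketch of the core idea as described, not the authors' implementation: each party replaces its raw feature columns with order-preserving codes (plain ranks here, a hypothetical stand-in for the paper's compressed partial order relations), and the server trains a tree-based model on the union of the coded columns, since tree splits depend only on feature ordering.

```python
# Sketch (illustrative data, assumed encoding): rank-encode each party's
# feature block, then let the server fit a tree ensemble on the union.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

def encode_party(X_party):
    """Order-preserving (rank) encoding of one party's feature block."""
    # argsort of argsort yields the rank of each value within its column
    return np.argsort(np.argsort(X_party, axis=0), axis=0).astype(float)

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
parties = [X[:, :3], X[:, 3:]]                         # vertical partition
coded = np.hstack([encode_party(p) for p in parties])  # "uploaded" to server

server_model = GradientBoostingClassifier(random_state=0).fit(coded, y)
print("train accuracy on coded data:", server_model.score(coded, y))
```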
One of the most widely used fuzzy linear programming models is the multi-objective interval type-2 fuzzy linear programming (MOIT2FLP) model, which is of particular importance because it integrates multiple criteria and objectives in a single problem, captures the fuzzy nature of such problems, and therefore resembles real-world problems more closely. Many studies have addressed the single-objective interval type-2 fuzzy linear programming (IT2FLP) problem with vagueness-type uncertainties; however, the MOIT2FLP problem with such uncertainties has received little attention. This study investigates the MOIT2FLP problem with vagueness-type uncertainties, which are represented by membership functions (MFs). Depending on where the vagueness is located, that is, in the objective function vector, in the technological coefficients, in the resources vector, or in any combination of these, various problems may arise. To solve the MOIT2FLP problems, the weighted sum method, an efficient and effective scalarisation technique, is first used to convert each of them into a single-objective problem. These types of problems are introduced, their MFs are stated, and different solution methods are suggested. For each proposed method, the authors provide an example and present the results in the corresponding tables.
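For reference, the weighted sum scalarisation the study relies on can be written in the following standard textbook form (not reproduced from the paper), where the K objective vectors c_k are combined with non-negative weights w_k:

```latex
% Weighted sum method: the K objectives of a multi-objective LP are
% combined into a single objective using non-negative weights w_k.
\begin{aligned}
\min_{x}\quad & \sum_{k=1}^{K} w_k\, c_k^{\top} x \\
\text{s.t.}\quad & A x \le b, \quad x \ge 0, \\
& w_k \ge 0, \quad \sum_{k=1}^{K} w_k = 1 .
\end{aligned}
```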
Hardware-based sensing frameworks such as cooperative fuel research engines are conventionally used to monitor research octane number (RON) in the petroleum refining industry. In this study, machine learning techniques are instead employed to predict the RON of integrated naphtha reforming and isomerisation processes. A dynamic Aspen HYSYS model was used to generate data by introducing artificial uncertainties in the range of ±5% in process conditions such as temperature and flow rates. The generated data were used to train support vector machines (SVM), Gaussian process regression (GPR), artificial neural networks (ANN), regression trees (RT), and ensemble trees (ET). Hyperparameter tuning was performed to enhance the prediction capabilities of the GPR, ANN, SVM, ET, and RT models. Performance analysis indicates that GPR, ANN, and SVM, with R² values of 0.99, 0.978, and 0.979 and RMSE values of 0.108, 0.262, and 0.258, respectively, outperformed the remaining models and were able to capture the dependence of RON on the predictor variables. ET and RT had R² values of 0.94 and 0.89, respectively. The GPR model was then used as a surrogate model for fitness function evaluations in two optimisation frameworks based on the genetic algorithm and particle swarm optimisation. The optimal parameter values found by this methodology increased the RON value by 3.52%. The proposed surrogate-based optimisation methodology provides a platform for plant-level implementation to realise the concept of Industry 4.0 in the refinery.
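A minimal sketch of the surrogate-based optimisation pattern the abstract describes, under stated assumptions: synthetic data stand in for the HYSYS-generated dataset, and scipy's differential evolution stands in for the paper's GA/PSO frameworks; variable names and bounds are illustrative.

```python
# Sketch: fit a GPR surrogate to process data, then maximise its prediction
# with an evolutionary optimiser (stand-in for the paper's GA/PSO).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
X = rng.uniform([450.0, 1.0], [520.0, 5.0], size=(60, 2))  # e.g. temp, flow
y = 90 + 0.02 * X[:, 0] - 0.5 * (X[:, 1] - 3) ** 2         # synthetic "RON"

gpr = GaussianProcessRegressor(kernel=ConstantKernel() * RBF([10.0, 1.0]),
                               normalize_y=True).fit(X, y)

# The surrogate replaces expensive simulations in the fitness evaluation.
res = differential_evolution(lambda x: -gpr.predict(x.reshape(1, -1))[0],
                             bounds=[(450, 520), (1, 5)], seed=0)
print("optimal conditions:", res.x, "predicted RON:", -res.fun)
```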
Continuous emotion recognition aims to predict emotional states from affective information, with an emphasis on the continuous variation of emotion. Fusion of electroencephalography (EEG) and facial expression videos has been used in this field, but current research suffers from limitations such as hand-engineered features and simplistic fusion strategies. Hence, a new continuous emotion recognition model based on the fusion of EEG and facial expression videos, named the residual multimodal Transformer (RMMT), is proposed. Firstly, ResNet50 and a temporal convolutional network (TCN) are used to extract spatiotemporal features from the videos, and a TCN is also applied to the computed EEG frequency power to acquire spatiotemporal EEG features. Then, a multimodal Transformer fuses the spatiotemporal features of the two modalities. Furthermore, a residual connection is introduced to fuse shallow features with deep features, which experiments verify to be effective for continuous emotion recognition. Inspired by knowledge distillation, the authors incorporate a feature-level loss into the loss function to further enhance network performance. Experimental results show that the RMMT outperforms other methods on the MAHNOB-HCI dataset. Ablation studies on the residual connection and the loss function in the RMMT demonstrate that both are effective.
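A minimal sketch of the fusion stage only, with assumed layer sizes and wiring rather than the authors' RMMT: a Transformer encoder fuses the two modalities' spatiotemporal feature sequences, and a residual path adds the shallow features back onto the deep ones before regression.

```python
# Sketch of Transformer fusion with a shallow-to-deep residual connection
# (dimensions, pooling, and head are assumptions for illustration).
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 1)   # continuous emotion value

    def forward(self, video_feats, eeg_feats):
        # video_feats, eeg_feats: (batch, time, d_model) spatiotemporal features
        shallow = torch.cat([video_feats, eeg_feats], dim=1)
        deep = self.fuse(shallow)
        fused = deep + shallow               # residual: shallow + deep features
        return self.head(fused.mean(dim=1))  # one prediction per sequence

model = FusionSketch()
out = model(torch.randn(2, 16, 128), torch.randn(2, 16, 128))
print(out.shape)  # torch.Size([2, 1])
```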
Image clustering has received significant attention due to the growing importance of image recognition. Researchers have explored Riemannian manifold clustering, which can capture the non-linear shapes found in real-world datasets. However, the complexity of image data poses substantial challenges for modelling and feature extraction. Traditional methods such as covariance matrices and linear subspaces have shown promise in image modelling, but they are still in their early stages and suffer from certain limitations. These include the uncertainty of representing data with only one Riemannian manifold, the limited feature extraction capacity of single kernel functions, and the resulting incomplete data representation and redundancy. To overcome these limitations, the authors propose a novel approach called joint multiple Riemannian manifold representation and multi-kernel non-redundancy for image clustering (MRMNR-MKC). It combines covariance matrices with linear subspaces to represent the data and applies multiple kernel functions to map the non-linear structural data into a reproducing kernel Hilbert space, enabling linear model analysis for image clustering. Additionally, the authors use matrix-induced regularisation to improve the kernel selection process by reducing redundancy and assigning lower weights to highly similar kernels. Finally, the authors conducted numerous experiments to evaluate the approach, confirming its superiority over state-of-the-art methods on three benchmark datasets.
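A simplified sketch of the two manifold representations and a weighted kernel combination; the specific kernels (log-Euclidean for covariance matrices, projection kernel for subspaces) and the fixed weights are assumptions standing in for the paper's learnt multi-kernel scheme.

```python
# Sketch: covariance (SPD) and linear-subspace representations of two
# images, combined through two kernels (illustrative stand-in for MRMNR-MKC).
import numpy as np
from scipy.linalg import logm

def cov_descriptor(img_feats):
    """Covariance matrix (SPD manifold point) of an image's feature rows."""
    return np.cov(img_feats, rowvar=False) + 1e-6 * np.eye(img_feats.shape[1])

def log_euclidean_kernel(C1, C2, sigma=1.0):
    """Log-Euclidean RBF kernel between two SPD matrices."""
    d = np.linalg.norm(logm(C1) - logm(C2), "fro")
    return np.exp(-d**2 / (2 * sigma**2))

def projection_kernel(U1, U2):
    """Projection (Grassmann) kernel between two orthonormal subspace bases."""
    return np.linalg.norm(U1.T @ U2, "fro") ** 2

rng = np.random.default_rng(0)
A, B = rng.normal(size=(50, 5)), rng.normal(size=(50, 5))
U1, _ = np.linalg.qr(A.T)   # linear-subspace model of image A
U2, _ = np.linalg.qr(B.T)   # linear-subspace model of image B

w = [0.6, 0.4]  # kernel weights (learnt in the paper; fixed here)
k = w[0] * log_euclidean_kernel(cov_descriptor(A), cov_descriptor(B)) \
    + w[1] * projection_kernel(U1[:, :3], U2[:, :3])
print("combined kernel value:", k)
```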
Accurately diagnosing autism spectrum disorder (ASD) is challenging in clinical practice, primarily due to high data heterogeneity and limited sample sizes. To tackle this issue, the authors constructed a deep graph convolutional network (GCN) based on variable multi-graph and multimodal data (VMM-DGCN) for ASD diagnosis. Firstly, the functional connectivity matrix was constructed to extract primary features. Then, the authors devised a variable multi-graph construction strategy to capture multi-scale feature representations of each subject by utilising convolutional filters with varying kernel sizes. Furthermore, the authors incorporated non-imaging information into the feature representation at each scale and constructed multiple population graphs from the multimodal data by fully considering the correlations between subjects. After extracting deeper features of the population graphs using a deep GCN (DeepGCN), the authors fused the node features of the multiple subgraphs to perform node classification for typical controls and ASD patients. The proposed algorithm was evaluated on the Autism Brain Imaging Data Exchange I (ABIDE I) dataset, achieving an accuracy of 91.62% and an area under the curve of 95.74%. These results demonstrate its outstanding performance compared with other ASD diagnostic algorithms.
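A minimal sketch of one population graph and a single GCN propagation step, under stated assumptions: subject counts, feature sizes, and the rule for combining imaging similarity with non-imaging agreement are all illustrative, not the authors' VMM-DGCN.

```python
# Sketch: build a population graph (feature similarity gated by a
# non-imaging attribute) and run one GCN layer over it.
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 64))    # 20 subjects, 64-d imaging features
site = rng.integers(0, 3, size=20)   # non-imaging info, e.g. scan site

# Edge weight = feature similarity, kept only where phenotypes agree.
sim = np.abs(np.corrcoef(feats))
A = sim * (site[:, None] == site[None, :])
np.fill_diagonal(A, 1.0)

# One GCN layer: H' = ReLU(D^{-1/2} A D^{-1/2} H W)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
W = rng.normal(scale=0.1, size=(64, 16))
H = np.maximum(D_inv_sqrt @ A @ D_inv_sqrt @ feats @ W, 0.0)
print(H.shape)  # (20, 16) node embeddings for downstream classification
```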
Exploiting the hierarchical dependence behind user behaviour is critical for click-through rate (CTR) prediction in recommender systems. Existing methods apply attention mechanisms to obtain the weights of items; however, the authors argue that deterministic attention mechanisms cannot capture the hierarchical dependence between user behaviours, because they treat each user behaviour as an independent individual and cannot accurately express users' flexible and changeable interests. To tackle this issue, the authors introduce Bayesian attention to the CTR prediction model, treating attention weights as data-dependent local random variables and learning their distribution by approximating their posterior. Specifically, prior knowledge is built into the attention weight distribution, and posterior inference is then used to capture implicit and flexible user intentions. Extensive experiments on public datasets demonstrate that the proposed algorithm outperforms state-of-the-art algorithms. Empirical evidence shows that random attention weights can predict user intentions better than deterministic ones.
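A minimal sketch of the stochastic-attention idea, with a loudly labelled assumption: the Gaussian reparameterisation of log-weights and the standard-normal prior below are illustrative choices, not necessarily the paper's distributions; the point is that weights are sampled rather than computed deterministically, with a KL term tying them to a prior.

```python
# Sketch of stochastic attention weights (the Gaussian/softmax construction
# is an assumption, not necessarily the paper's choice of distribution).
import torch
import torch.nn.functional as F

def bayesian_attention(scores, prior_mean=0.0, prior_std=1.0):
    """Sample log-weights from a Gaussian centred on the deterministic
    attention scores, then normalise with softmax."""
    mu, log_sigma = scores, torch.zeros_like(scores)   # learnt in practice
    eps = torch.randn_like(mu)
    z = mu + log_sigma.exp() * eps                     # reparameterised sample
    weights = F.softmax(z, dim=-1)
    # KL between N(mu, sigma^2) and the N(prior_mean, prior_std^2) prior
    kl = (log_sigma.exp()**2 + (mu - prior_mean)**2) / (2 * prior_std**2) \
         - log_sigma + torch.log(torch.tensor(prior_std)) - 0.5
    return weights, kl.sum()

scores = torch.randn(2, 5)   # e.g. relevance of 5 behaviours to a target ad
w, kl = bayesian_attention(scores)
print(w.sum(dim=-1), kl)     # rows sum to 1; KL is added to the CTR loss
```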
Recovering 3D human meshes from monocular images is an inherently ill-posed and challenging task due to depth ambiguity, joint occlusion, and truncation. However, most existing approaches do not model such uncertainties and typically yield a single reconstruction for each input. In contrast, the authors embrace the ambiguity of the reconstruction and treat the task as an inverse problem for which multiple feasible solutions exist. To address these issues, they propose a multi-hypothesis approach, multi-hypothesis human mesh recovery (MH-HMR), to efficiently model the multi-hypothesis representation and build strong relationships among the hypothetical features. Specifically, the task is decomposed into three stages: (1) generating a reasonable set of initial recovery results (i.e., multiple hypotheses) from a single colour image; (2) modelling intra-hypothesis refinement to enhance each single-hypothesis feature; and (3) establishing inter-hypothesis communication and regressing the final human meshes. Meanwhile, the authors further exploit the multiple hypotheses and the recovery process to achieve human mesh recovery from multiple uncalibrated views. Compared with state-of-the-art methods, the MH-HMR approach achieves superior performance and recovers more accurate human meshes on challenging benchmarks such as Human3.6M and 3DPW, while demonstrating effectiveness across a variety of settings. The code will be publicly available at https://cic.tju.edu.cn/faculty/likun/projects/MH-HMR.
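A minimal sketch of stage (3), inter-hypothesis communication, under stated assumptions: self-attention across the hypothesis axis lets the candidate recoveries exchange information before a final regression; the hypothesis count, feature width, and the 85-parameter output (a common SMPL pose+shape+camera size, assumed here) are illustrative.

```python
# Sketch: self-attention over the hypothesis dimension, then regression
# of the final mesh parameters (shapes and layers are illustrative).
import torch
import torch.nn as nn

class InterHypothesis(nn.Module):
    def __init__(self, d=256, n_params=85):  # 85 ~ assumed SMPL pose+shape+cam
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.regress = nn.Linear(d, n_params)

    def forward(self, hyp_feats):
        # hyp_feats: (batch, n_hyp, d) -- one feature vector per hypothesis
        mixed, _ = self.attn(hyp_feats, hyp_feats, hyp_feats)
        fused = (hyp_feats + mixed).mean(dim=1)  # aggregate the hypotheses
        return self.regress(fused)               # final mesh parameters

module = InterHypothesis()
print(module(torch.randn(2, 4, 256)).shape)  # torch.Size([2, 85])
```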