Lijia Deng, Qinghua Zhou, Shuihua Wang, Juan Manuel Górriz, Yudong Zhang
Counting high-density objects quickly and accurately is a popular area of research. Crowd counting has significant social and economic value and is a major focus in artificial intelligence. Despite many advancements in this field, much of this progress is not widely known, especially in terms of research data. The authors proposed a three-tier standardised dataset taxonomy (TSDT), which divides datasets into small-scale, large-scale and hyper-scale according to different application scenarios. This taxonomy can help researchers make more efficient use of datasets and improve the performance of AI algorithms in specific fields. Additionally, the authors proposed a new evaluation index for the clarity of a dataset: the average pixels occupied by each object (APO). This index is better suited than image resolution for evaluating dataset clarity in object counting tasks. Moreover, the authors classified crowd counting methods from a data-driven perspective into multi-scale networks, single-column networks, multi-column networks, multi-task networks, attention networks and weakly supervised networks, and introduced the classic crowd counting methods of each class. The authors classified the existing 36 datasets according to the TSDT and discussed and evaluated them, and assessed the performance of more than 100 methods from the past five years on popular datasets at each level. Recently, progress on small-scale datasets has slowed: few new datasets or algorithms target them, while studies focused on large- or hyper-scale datasets appear to be reaching a saturation point. The combined use of multiple approaches has become a major research direction. The authors discussed the theoretical and practical challenges of crowd counting from the perspectives of data, algorithms and computing resources.
The field of crowd counting is moving towards combining multiple methods and requires fresh, targeted datasets. Despite advancements, the field still faces challenges such as handling real-world scenarios and processing large crowds in real-time. Researchers are exploring transfer learning to overcome the limitations of small datasets. The development of effective algorithms for crowd counting remains a challenging and important task in computer vision and AI, with many opportunities for future research.
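The APO index can be read as a simple dataset statistic. A minimal sketch, assuming APO averages each image's pixel area over its annotated object count; this is our reading of the abstract, not the paper's exact formula:

```python
def average_pixels_per_object(samples):
    # samples: one (width, height, object_count) triple per image.
    # APO here = mean over images of (pixel area / annotated objects);
    # fewer pixels per object means a harder, lower-clarity dataset.
    ratios = [(w * h) / n for (w, h, n) in samples if n > 0]
    return sum(ratios) / len(ratios)
```

Unlike raw image resolution, such a statistic falls as crowds grow denser, which is why it tracks counting difficulty more directly.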
Deep learning in crowd counting: A survey. CAAI Transactions on Intelligence Technology, 9(5): 1043–1077, 2023. DOI: 10.1049/cit2.12241.
Three-way concept analysis is an important tool for information processing, and rule acquisition is one of the research hotspots of three-way concept analysis. Compared with three-way concept lattices, three-way semi-concept lattices have three-way operators with weaker constraints, which can generate more concepts. In this article, the problem of rule acquisition for three-way semi-concept lattices is discussed. The authors construct the finer relation of three-way semi-concept lattices and propose a method of rule acquisition for them. The authors also discuss the set of decision rules and the relationships among decision rules for object-induced three-way semi-concept lattices, object-induced three-way concept lattices, classical concept lattices and semi-concept lattices. Finally, examples are provided to illustrate the validity of the authors' conclusions.
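For readers unfamiliar with concept lattices, the classical derivation operator underlying (semi-)concepts can be sketched in a few lines; the three-way operators pair this with a second, negative operator over the complement relation, which this sketch omits:

```python
def intent(objects, context):
    # Classical derivation operator from formal concept analysis: the set
    # of attributes shared by every object in `objects`. `context` maps
    # each object to its attribute set. A (classical) concept is a pair
    # (X, Y) with intent(X) == Y and the dual operator mapping Y back to X.
    attrs = None
    for obj in objects:
        row = set(context[obj])
        attrs = row if attrs is None else attrs & row
    return attrs if attrs is not None else set()
```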
Rule acquisition of three-way semi-concept lattices in formal decision context. CAAI Transactions on Intelligence Technology, 9(2): 333–347, 2023. DOI: 10.1049/cit2.12248.
Wenhao Xue, Yang Yang, Lei Li, Zhongling Huang, Xinggang Wang, Junwei Han, Dingwen Zhang
Segmenting the semantic regions of point clouds is a crucial step for intelligent agents to understand 3D scenes. Weakly supervised point cloud segmentation is highly desirable because entirely labelling point clouds is extremely time-consuming and costly. For low-cost labelling of 3D point clouds, the scene-level label is one of the most effortless labelling strategies. However, due to the limited discriminative capability of the classifier and the orderless and structureless nature of point cloud data, existing scene-level methods struggle to transfer semantic information, which usually leads to under-activation or over-activation issues. To this end, a local semantic embedding network is introduced to learn local structural patterns and semantic propagation. Specifically, the proposed network contains graph convolution-based dilation and erosion embedding modules to implement ‘inside-out’ and ‘outside-in’ semantic information dissemination pathways. Therefore, the proposed weakly supervised learning framework achieves mutual propagation of semantic information between the foreground and background. Comprehensive experiments on the widely used ScanNet benchmark demonstrate the superior capacity of the proposed approach compared to current alternatives and baseline models.
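The dilation and erosion embedding modules are learnt graph convolutions, but their morphological intuition can be sketched with plain max/min aggregation over a point's neighbours. This is a simplification of the paper's modules, with `neighbors[i]` listing the graph neighbours of point `i`:

```python
def graph_dilate(scores, neighbors):
    # 'Inside-out' pathway: dilation replaces each point's activation with
    # the max over itself and its graph neighbours, spreading foreground
    # evidence outwards (counters under-activation).
    return [max([scores[i]] + [scores[j] for j in neighbors[i]])
            for i in range(len(scores))]

def graph_erode(scores, neighbors):
    # 'Outside-in' pathway: erosion takes the min over the same
    # neighbourhood, shrinking over-activated regions back towards
    # confident cores.
    return [min([scores[i]] + [scores[j] for j in neighbors[i]])
            for i in range(len(scores))]
```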
Weakly supervised point cloud segmentation via deep morphological semantic information embedding. CAAI Transactions on Intelligence Technology, 9(3): 695–708, 2023. DOI: 10.1049/cit2.12239.
Muhammad Nouman Noor, Muhammad Nazir, Imran Ashraf, N. Almujally, Muhammad Aslam, Syeda Fizzah Jilani
GastroNet: A robust attention-based deep learning and cosine similarity feature selection framework for gastrointestinal disease classification from endoscopic images. CAAI Transactions on Intelligence Technology, 2023. DOI: 10.1049/cit2.12231.
Siwei Ma, Maoguo Gong, Guojun Qi, Yun Tie, Ivan Lee, Bo Li, Cong Jin
<p>The metaverse is a new type of Internet application and social form that integrates a variety of new technologies, including artificial intelligence, digital twins, blockchain, cloud computing, virtual reality, robots, brain-computer interfaces, and 5G. Media convergence technology is a systematic and comprehensive discipline that applies the theories and methods of modern science and technology to media innovation, covering multimedia creation, production, communication, service, consumption, reproduction and so on. The emergence of new technologies such as deep learning, distributed computing, and extended reality has promoted the development of media integration in the metaverse, and these technologies are the key factors driving the current transformation of the Internet towards the metaverse.</p><p>This Special Issue collects research on the application of media convergence and intelligent technology in the metaverse, focussing on the theory and technology of intelligent generation of multimedia content based on deep learning; intelligent recommendation algorithms for media content with privacy protection at their core; prediction models for multimedia communication based on big data analysis; immersive experience technology (VR/AR) in the metaverse and multimedia communication; resource allocation algorithms for ultra-high-definition video transmission and storage on 5G/6G mobile Internet; and neural network-based media content encryption algorithms. Original research and review articles are welcome.</p><p>The first article defines a comprehensive information loss that considers both the suppression of records and the relationship between sensitive attributes [<span>1</span>]. A heuristic method is leveraged to discover the optimal anonymity scheme with the lowest comprehensive information loss. The experimental results verify the practicality of the proposed data publishing method with multiple sensitive attributes. The proposed method can guarantee information utility when compared with previous ones.</p><p>The second article addresses the problem that existing models segment poorly on imbalanced data sets with small-scale samples: a bilateral U-Net network model with a spatial attention mechanism is designed [<span>2</span>]. The model uses the lightweight MobileNetV2 as the backbone network for hierarchical feature extraction and proposes an Attentive Pyramid Spatial Attention (APSA) module that, compared to the Attenuated Spatial Pyramid module, increases the receptive field and enhances the information; it finally adds a context fusion prediction branch that fuses high-semantic and low-semantic prediction results, effectively improving segmentation accuracy on small data sets. The experimental results on the CamVid data set show that, compared with some existing semantic segmentation networks, the algorithm achieves a better segmentation effect and accuracy.</p>
Guest Editorial: Special issue on media convergence and intelligent technology in the metaverse. CAAI Transactions on Intelligence Technology, 8(2): 285–287, 2023. DOI: 10.1049/cit2.12250.
Hania Tarik, Shahzad Hassan, R. A. Naqvi, Saddaf Rubab, Usman Tariq, Monia Hamdi, H. Elmannai, Ye Jin Kim, Jaehyuk Cha
Empowering and conquering infirmity of visually impaired using AI-technology equipped with object detection and real-time voice feedback system in healthcare application. CAAI Transactions on Intelligence Technology, 2023. DOI: 10.1049/cit2.12243.
Since fully convolutional networks achieved great success in semantic segmentation, many works have been proposed to extract discriminative pixel representations. However, the authors observe that existing methods still suffer from two typical challenges: (i) the intra-class feature variation between different scenes may be large, making it difficult to maintain consistency between same-class pixels from different scenes; (ii) the inter-class feature distinction within the same scene can be small, limiting the ability to distinguish different classes in each scene. The authors first rethink semantic segmentation from the perspective of similarity between pixels and class centers. Each weight vector of the segmentation head represents its corresponding semantic class over the whole dataset and can be regarded as the embedding of that class center. Thus, pixel-wise classification amounts to computing similarity in the final feature space between pixels and the class centers. Under this novel view, the authors propose a Class Center Similarity (CCS) layer to address the above challenges by generating adaptive class centers conditioned on each scene and supervising the similarities between class centers. The CCS layer utilises an Adaptive Class Center Module to generate class centers conditioned on each scene, which adapts to the large intra-class variation between different scenes. A specially designed Class Distance Loss (CD Loss) is introduced to control both inter-class and intra-class distances based on the predicted center-to-center and pixel-to-center similarities. Finally, the CCS layer outputs the processed pixel-to-center similarity as the segmentation prediction. Extensive experiments demonstrate that the model performs favourably against state-of-the-art methods.
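The pixel-to-center view can be illustrated with plain dot-product similarity; the CCS layer additionally adapts the centres per scene and supervises centre-to-centre similarity, which this sketch omits:

```python
def classify_pixels(pixel_feats, class_centers):
    # Pixel-wise classification as similarity search: each pixel embedding
    # is assigned the class whose centre vector it is most similar to.
    # With plain dot products this is exactly what a linear segmentation
    # head computes, reinterpreting its weight rows as class centres.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [max(range(len(class_centers)),
                key=lambda c: dot(f, class_centers[c]))
            for f in pixel_feats]
```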
Dongyue Wu, Zilin Guo, Aoyan Li, Changqian Yu, Nong Sang, Changxin Gao. Semantic segmentation via pixel-to-center similarity calculation. CAAI Transactions on Intelligence Technology, 9(1): 87–100, 2023. DOI: 10.1049/cit2.12245.
Hemanth Gudaparthi, Nan Niu, Yilong Yang, Matthew Van Doren, Reese Johnson
Combined sewer overflows represent significant risks to human health as untreated water is discharged to the environment. Municipalities, such as the Metropolitan Sewer District of Greater Cincinnati (MSDGC), recently began collecting large amounts of water-related data and considering the adoption of deep learning (DL) solutions, such as recurrent neural networks (RNNs), for predicting overflow events. Assessing DL's fitness for purpose requires a systematic understanding of the problem context. In this study, we propose a requirements engineering framework that uses problem frames to identify and structure stakeholder concerns, analyses the physical situations in which the high-quality data assumptions may not hold, and derives software testing criteria in the form of metamorphic relations that incorporate both input transformations and output comparisons. Applying our framework to MSDGC's overflow prediction problem enables a principled way to evaluate different RNN solutions against the requirements.
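A metamorphic relation of the kind such a framework derives might look as follows; the rainfall-monotonicity property and the `model` interface are hypothetical illustrations, not taken from the paper:

```python
def check_monotonic_rainfall_relation(model, rainfall_series, scale=1.2):
    # Metamorphic relation (hypothetical example): amplifying the rainfall
    # input should never lower the predicted overflow. No ground-truth
    # oracle is needed; only the two predictions are compared, which is
    # the point of metamorphic testing. `model` is any callable mapping
    # a rainfall series to a predicted overflow value.
    base = model(rainfall_series)
    followup = model([r * scale for r in rainfall_series])
    return followup >= base
```

Running such checks against several candidate RNNs gives a requirements-driven comparison even when labelled overflow data is scarce.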
Deep learning's fitness for purpose: A transformation problem frame's perspective. CAAI Transactions on Intelligence Technology, 8(2): 343–354, 2023. DOI: 10.1049/cit2.12237.
Sandra Carrasco Limeros, Sylwia Majchrowska, Joakim Johnander, Christoffer Petersson, Miguel Ángel Sotelo, David Fernández Llorca
Predicting the motion of other road agents enables autonomous vehicles to perform safe and efficient path planning. This task is very complex, as the behaviour of road agents depends on many factors and the number of possible future trajectories can be considerable (multi-modal). Most prior approaches to multi-modal motion prediction are based on complex machine learning systems with limited interpretability. Moreover, the metrics used in current benchmarks do not evaluate all aspects of the problem, such as the diversity and admissibility of the output. The authors aim to advance towards the design of trustworthy motion prediction systems, guided by some of the requirements for Trustworthy Artificial Intelligence. The focus is on evaluation criteria, robustness, and interpretability of outputs. First, the evaluation metrics are comprehensively analysed, the main gaps of current benchmarks are identified, and a new holistic evaluation framework is proposed. Then, a method for the assessment of spatial and temporal robustness is introduced by simulating noise in the perception system. To enhance the interpretability of the outputs and generate more balanced results in the proposed evaluation framework, an intent prediction layer that can be attached to multi-modal motion prediction models is proposed. The effectiveness of this approach is assessed through a survey that explores different elements in the visualisation of the multi-modal trajectories and intentions. The proposed approach and findings make a significant contribution to the development of trustworthy motion prediction systems for autonomous vehicles, advancing the field towards greater safety and reliability.
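As an example of the metrics under analysis, minimum Average Displacement Error (minADE), a standard multi-modal benchmark metric, scores only the best of the K predicted modes, which is one reason diversity and admissibility can go unmeasured. A minimal sketch:

```python
def min_ade(predicted_modes, ground_truth):
    # Average displacement error of each candidate trajectory against the
    # ground truth, then the minimum over all K predicted modes: one mode
    # close to the truth scores well even if the remaining modes are poor,
    # so the metric says nothing about their diversity or admissibility.
    def ade(traj):
        dists = [((x - gx) ** 2 + (y - gy) ** 2) ** 0.5
                 for (x, y), (gx, gy) in zip(traj, ground_truth)]
        return sum(dists) / len(dists)
    return min(ade(mode) for mode in predicted_modes)
```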
Towards trustworthy multi-modal motion prediction: Holistic evaluation and interpretability of outputs. CAAI Transactions on Intelligence Technology, 9(3): 557–572, 2023. DOI: 10.1049/cit2.12244.
Eye health has become a global health concern and attracted broad attention. Over the years, researchers have proposed many state-of-the-art convolutional neural networks (CNNs) to assist ophthalmologists in diagnosing ocular diseases efficiently and precisely. However, most existing methods were dedicated to constructing sophisticated CNNs, inevitably ignoring the trade-off between performance and model complexity. To address this trade-off, this paper proposes a lightweight yet efficient network architecture, mixed-decomposed convolutional network (MDNet), to recognise ocular diseases. In MDNet, we introduce a novel mixed-decomposed depthwise convolution method, which takes advantage of depthwise convolution and depthwise dilated convolution operations to capture low-resolution and high-resolution patterns using fewer computations and fewer parameters. We conduct extensive experiments on the clinical anterior segment optical coherence tomography (AS-OCT), LAG, University of California San Diego, and CIFAR-100 datasets. The results show our MDNet achieves a better trade-off between performance and model complexity than efficient CNNs including MobileNets and MixNets. Specifically, our MDNet outperforms MobileNets by 2.5% in accuracy while using 22% fewer parameters and 30% fewer computations on the AS-OCT dataset.
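The parameter savings behind decomposed convolutions can be illustrated with the standard depthwise-separable accounting popularised by MobileNets (the abstract does not give MDNet's exact formula, so the counts below are a generic sketch, not the paper's architecture). A dilated depthwise kernel has the same parameter count as a plain depthwise kernel but a larger receptive field, so splitting channels between the two branches captures both fine and coarse patterns at no extra parameter cost:

```python
def conv_params(c_in, c_out, k):
    # Standard k x k convolution: every output channel sees every input channel.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depthwise k x k (one filter per input channel) followed by a
    # 1 x 1 pointwise convolution that mixes channels.
    return c_in * k * k + c_in * c_out

standard = conv_params(128, 128, 3)
decomposed = depthwise_separable_params(128, 128, 3)
# A dilated depthwise branch adds nothing to the count: dilation
# spreads the same 3 x 3 weights over a wider spatial extent.
print(standard, decomposed, round(decomposed / standard, 3))
# -> 147456 17536 0.119
```

For a 128-channel 3x3 layer the decomposed form needs roughly 12% of the parameters of a standard convolution, which is the kind of headroom that lets MDNet trade a small amount of capacity for large reductions in parameters and computation.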
{"title":"Mixed-decomposed convolutional network: A lightweight yet efficient convolutional neural network for ocular disease recognition","authors":"Xiaoqing Zhang, Xiao Wu, Zunjie Xiao, Lingxi Hu, Zhongxi Qiu, Qingyang Sun, Risa Higashita, Jiang Liu","doi":"10.1049/cit2.12246","DOIUrl":"10.1049/cit2.12246","url":null,"abstract":"<p>Eye health has become a global health concern and attracted broad attention. Over the years, researchers have proposed many state-of-the-art convolutional neural networks (CNNs) to assist ophthalmologists in diagnosing ocular diseases efficiently and precisely. However, most existing methods were dedicated to constructing sophisticated CNNs, inevitably ignoring the trade-off between performance and model complexity. To alleviate this paradox, this paper proposes a lightweight yet efficient network architecture, mixed-decomposed convolutional network (MDNet), to recognise ocular diseases. In MDNet, we introduce a novel mixed-decomposed depthwise convolution method, which takes advantage of depthwise convolution and depthwise dilated convolution operations to capture low-resolution and high-resolution patterns by using fewer computations and fewer parameters. We conduct extensive experiments on the clinical anterior segment optical coherence tomography (AS-OCT), LAG, University of California San Diego, and CIFAR-100 datasets. The results show our MDNet achieves a better trade-off between the performance and model complexity than efficient CNNs including MobileNets and MixNets. 
Specifically, our MDNet outperforms MobileNets by <b>2.5%</b> of accuracy by using <b>22%</b> fewer parameters and <b>30%</b> fewer computations on the AS-OCT dataset.</p>","PeriodicalId":46211,"journal":{"name":"CAAI Transactions on Intelligence Technology","volume":"9 2","pages":"319-332"},"PeriodicalIF":5.1,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.12246","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86230090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}