
CAAI Transactions on Intelligence Technology: Latest Articles

SFNIC: Hybrid Spatial-Frequency Information for Lightweight Neural Image Compression
IF 7.3; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-07-01; DOI: 10.1049/cit2.70034
Youneng Bao, Wen Tan, Mu Li, Jiacong Chen, Qingyu Mao, Yongsheng Liang

Neural image compression (NIC) has shown remarkable rate-distortion (R-D) efficiency. However, the considerable computational and spatial complexity of most NIC methods makes them difficult to deploy on resource-constrained devices. We introduce a lightweight neural image compression framework designed to process both local and global information efficiently. In this framework, a convolutional branch extracts local information, while a frequency-domain branch extracts global information. To capture global information without the high computational cost of dense pixel-wise operations such as attention mechanisms, the frequency-domain branch applies the Fourier transform, which allows global information to be manipulated directly in the frequency domain. In addition, we use feature shift operations to obtain large receptive fields at no additional computational cost, avoiding the need for large-kernel convolutions. Our framework achieves a superior balance between rate-distortion performance and complexity. On test sets of varying resolution, our method achieves R-D performance on par with Versatile Video Coding (VVC) intra coding and other state-of-the-art (SOTA) NIC methods, while having the lowest computational requirements, approximately 200 KMACs/pixel. The code will be available at https://github.com/baoyu2020/SFNIC.
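The two mechanisms the abstract relies on, global mixing in the Fourier domain and zero-MAC feature shifts, can be illustrated with a minimal NumPy sketch. It assumes a GFNet-style learnable element-wise filter applied to the 2D real FFT of each channel, and a four-way channel-group shift by one pixel; the function names `global_frequency_filter` and `feature_shift` are hypothetical, and this is not the SFNIC implementation from the linked repository.

```python
import numpy as np

def global_frequency_filter(x, weight):
    """Mix information globally by filtering each channel in the 2D Fourier domain.

    x      : real feature map of shape (C, H, W)
    weight : complex filter of shape (C, H, W // 2 + 1); learnable in a real model
    """
    h, w = x.shape[-2:]
    spectrum = np.fft.rfft2(x, axes=(-2, -1))   # every coefficient depends on all pixels
    filtered = spectrum * weight                # one complex multiply per frequency bin
    return np.fft.irfft2(filtered, s=(h, w), axes=(-2, -1))

def feature_shift(x, step=1):
    """Shift four channel groups by `step` (>= 1) pixels: down, up, right, left.

    Only memory is moved, so no multiply-accumulate operations are performed.
    Vacated border positions are filled with zeros.
    """
    c = x.shape[0]
    g = c // 4
    out = np.zeros_like(x)
    out[0:g, step:, :] = x[0:g, :-step, :]                   # group 0: shift down
    out[g:2 * g, :-step, :] = x[g:2 * g, step:, :]           # group 1: shift up
    out[2 * g:3 * g, :, step:] = x[2 * g:3 * g, :, :-step]   # group 2: shift right
    out[3 * g:4 * g, :, :-step] = x[3 * g:4 * g, :, step:]   # group 3: shift left
    out[4 * g:] = x[4 * g:]                                  # leftover channels pass through
    return out
```

Because every frequency coefficient depends on all pixels, multiplying by the filter mixes information across the whole feature map at roughly O(HW log HW) cost per channel, instead of the O((HW)^2) cost of dense attention. The shift only moves data, so it adds no multiply-accumulates while letting a following convolution see neighbours one pixel further away.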

Vol. 10, No. 6, pp. 1717-1730
Citations: 0
Guest Editorial: Special Issue on AI Technologies and Applications in Medical Robots
IF 7.3; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-28; DOI: 10.1049/cit2.70019
Xiaozhi Qi, Zhongliang Jiang, Ying Hu, Jianwei Zhang
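The integration of artificial intelligence (AI) into medical robotics has emerged as a cornerstone of modern healthcare, driving transformative advances in precision, adaptability and patient outcomes. Although computational tools have long supported diagnostic processes, their role is evolving from passive assistance to active collaboration in therapeutic decision-making. In this paradigm, knowledge-driven deep learning systems are redefining what is possible, enabling robots to interpret complex data, adapt to dynamic clinical environments and execute tasks with human-like contextual awareness.

The purpose of this special issue is to showcase the latest developments in applying AI technology to medical robots. Topics include, but are not limited to, passive data adaptation, force-feedback tracking, image processing and diagnosis, surgical navigation and exoskeleton systems. These studies cover a wide range of medical-robot application scenarios, with the ultimate goal of maximising AI autonomy.

We received 31 submissions from around the world and, after a rigorous peer-review process, selected 9 papers for publication. The selected papers cover a range of fascinating research topics, each achieving key breakthroughs in its field. We believe these papers offer guidance for their research fields and can help researchers better understand current trends. We sincerely thank the authors who chose our platform and all the staff who assisted in publishing these papers.

In the article 'Model adaptation via credible local context representation', Tang et al. point out that conventional model-transfer techniques require labelled source data, which makes them inapplicable in privacy-sensitive medical domains. To address this critical problem in source-free domain adaptation (SFDA), they propose a credible local context representation (CLCR) method that significantly improves model generalisation by mining geometric structure in feature space. The method builds a two-stage learning framework. In the source-model pretraining stage, it introduces a data-augmented mutual information regularisation term to strengthen the model's learning of discriminative sample features. In the target-domain adaptation stage, it uses a fixed-step walking strategy in deep feature space to dynamically capture the credible local contextual features of each target sample, which then serve as pseudo-labels for semantic fusion. Experiments on three benchmark datasets (Office-31, Office-Home and VisDA) show that CLCR achieves an average accuracy of 89.2% across 12 cross-domain tasks, 3.1% higher than the best existing SFDA method, and even surpasses some domain adaptation methods that require access to source data. This work offers a new approach to resolving the conflict between privacy and performance in cross-institutional model transfer in healthcare, and its context-discovery mechanism is of general significance for unsupervised representation learning.

In the article 'A human-robot collaboration method for uncertain surface scanning', Zhao et al. present a human-robot collaborative framework for scanning uncertain surfaces that combines teleoperation with adaptive force control. Operators remotely guide the scanning trajectory, while an admittance controller maintains a constant contact force through real-time stiffness adjustment, achieving ±1 N tracking accuracy on surfaces of unknown stiffness. Automatic tool reorientation is triggered when the angular deviation exceeds 5°, and friction-compensated force sensing ensures perpendicular alignment. Experimental validation with a simulated ultrasound probe showed a 63% reduction in operator workload compared with pure teleoperation, and the system successfully handled sponge and spring-supported phantoms. The hybrid control architecture decouples human guidance from robot compliance, allowing simultaneous x-y motion control and z-axis force regulation without prior environment modelling. By combining human intuition with robotic precision, the approach is particularly valuable for medical scanning applications that require safe tissue interaction.

In the study 'AESR3D: 3D overcomplete autoencoder for trabecular CT super-resolution', Zhang et al. propose AESR3D, a 3D overcomplete autoencoder framework that addresses limitations in osteoporosis diagnosis by enhancing low-resolution trabecular CT scans. The current reliance on bone mineral density (BMD) overlooks microstructural deterioration that is critical to biomechanical strength. AESR3D combines a hybrid CNN-transformer architecture with dual-task regularisation, jointly optimising super-resolution reconstruction and low-resolution restoration to prevent overfitting while recovering structural detail. The model achieves state-of-the-art performance (SSIM: 0.996) and correlates strongly with high-resolution ground truth on trabecular metrics (ICC = 0.917). By integrating unsupervised k-means segmentation, it enables precise visualisation of bone microstructure without labelled data. AESR3D outperforms existing medical and natural-image SR methods, bridges micro-CT research and clinical CT applications, and provides a non-invasive tool for osteoporosis assessment that improves the diagnostic accuracy of bone-quality evaluation.

In the paper 'Segmentation vs. detection: Development and evaluation of deep learning models for PIRADS lesion localisation on bi-parametric prostate MRI', Min et al. tackle key challenges in automated prostate cancer detection on bi-parametric MRI (bp-MRI) by rigorously comparing segmentation (nnUNet) and object detection (nnDetection) deep learning approaches. Prostate cancer is a leading cause of death among men and requires precise early diagnosis, yet MRI interpretation remains radiologist-dependent and time-intensive. The authors introduce new lesion-level sensitivity and precision metrics that overcome the limitations of conventional voxel-wise evaluation, and propose ensemble methods that combine the strengths of both models. Results show that nnDetection achieves higher lesion-level sensitivity (80.78% vs 60.40% at 3 false positives for PIRADS ≥ 3 lesions), whereas nnUNet achieves higher voxel-level accuracy (DSC 0.46 vs 0.35). Ensemble techniques further improve performance, reaching 82.24% lesion-level sensitivity, which underscores their potential to balance detection robustness and spatial precision. Validated on an external dataset, the framework demonstrates the clinical feasibility of combining segmentation and detection paradigms, particularly for MRI-guided biopsies that demand high sensitivity. By bridging methodological gaps and providing metrics aligned with clinical priorities, this work advances computer-aided diagnosis and offers a scalable path to improved prostate cancer management through AI-driven lesion localisation.

In the paper 'Needle detection and localisation for robot-assisted subretinal injection using deep learning', Zhou et al. address the key challenge of precise needle detection and localisation in robot-assisted subretinal injection, a high-risk ophthalmic procedure requiring micrometre-level precision. Using microscope-integrated optical coherence tomography (MI-OCT), the authors propose a robust framework that combines ROI cropping with deep learning to overcome the limitations of manual needle tracking caused by tissue deformation and specular noise. Five convolutional neural network architectures were evaluated; the best-performing model (network II) achieved a 100% detection success rate on ex vivo porcine eyes and localised needle segments with an intersection-over-union (IoU) of 0.55. By analysing bounding-box edges, the method achieves depth-estimation accuracy below 10 μm, which is critical for navigating fragile retinal layers. Integrating adjacent OCT scans improves spatial context awareness and outperforms methods based on geometric features. By enabling real-time, deformation-resistant needle tracking, this work advances intraoperative image-guided robotics and may reduce surgical risk in gene therapy and in the treatment of subretinal haemorrhage. The validated framework fills a critical gap in ophthalmic robotics and offers a path towards safer, more precise robotic interventions in retinal surgery.

In the paper 'Automatic feature point extraction method for the pelvic surface based on PointMLP_RegNet', Kou et al. point out that in robot-assisted fracture reduction, precise extraction of anatomical landmarks from the complex pelvic structure is essential for improving 3D/3D registration accuracy. To overcome the shortcomings of manual and conventional automated methods, the study introduces PointMLP_RegNet, a deep learning framework based on PointMLP that predicts the spatial coordinates of 10 pelvic landmarks by replacing PointMLP's classification layer with a regression module. Trained on a clinical dataset of 40 CT-reconstructed point clouds augmented by downsampling, translation, rotation and noise injection, the model showed robust performance under leave-one-out cross-validation. Errors for all landmarks were below 5 mm, with 80% below 4 mm, outperforming PointNet++ and PointNet in accuracy (20%-30% lower mean error) while remaining computationally efficient (0.688 M parameters). By automating feature extraction, the method minimises human variability, streamlines intraoperative registration and improves the reliability of surgical planning. This innovation closes a technical gap in pelvic fracture robotics, provides a scalable solution for clinical application and highlights the transformative potential of tailored deep learning architectures in orthopaedic navigation systems. Gao et al.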
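The force-regulation idea in Zhao et al.'s framework can be illustrated with a one-axis admittance controller. This minimal sketch assumes a fixed virtual mass and damping, a purely elastic contact modelled as a linear spring whose stiffness the controller never uses, and semi-implicit Euler integration; it omits the authors' real-time stiffness adjustment, tool reorientation and friction compensation, and `simulate_admittance` and all of its parameter values are hypothetical.

```python
def simulate_admittance(k_env, f_ref=5.0, x_surface=0.01, mass=1.0,
                        damping=50.0, dt=1e-3, duration=3.0):
    """Simulate z-axis admittance force control against a spring of unknown stiffness.

    k_env         : environment stiffness in N/m (not used by the control law)
    f_ref         : desired contact force in N
    x_surface     : tool position at which contact begins, in m
    mass, damping : virtual admittance parameters M (kg) and B (N*s/m)
    Returns the last measured contact force in N.
    """
    x, v = 0.0, 0.0  # tool position (m) and velocity (m/s), starting off the surface
    force = 0.0
    for _ in range(round(duration / dt)):
        # Simulated force-sensor reading: linear spring, zero when not in contact.
        force = k_env * max(0.0, x - x_surface)
        # Admittance law: M*a + B*v = f_ref - f_measured.
        acc = (f_ref - force - damping * v) / mass
        # Semi-implicit Euler integration of the commanded motion.
        v += acc * dt
        x += v * dt
    return force

# Same controller gains, two very different (hypothetical) surface stiffnesses.
soft = simulate_admittance(k_env=500.0)
stiff = simulate_admittance(k_env=5000.0)
```

Because the admittance law only acts on the force error, the contact force settles at `f_ref` on both the soft and the stiff surface without the controller modelling either one; this mirrors, in a much simplified form, why z-axis force regulation can run alongside operator-driven x-y motion.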
Vol. 10, No. 3, pp. 635-637
Citations: 0
A Lightweight YOLOv5 Target Detection Model and Its Application to the Measurement of 100-Kernel Weight of Corn Seeds
IF 7.3; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-28; DOI: 10.1049/cit2.70031
Helong Yu, Jiayao Zhao, Chun Guang Bi, Lei Shi, Huiling Chen

The 100-kernel weight of corn seeds is a crucial metric for assessing corn quality. Current measurement methods mostly involve counting kernels by hand and then weighing them on a balance, which is labour-intensive and time-consuming. To address this low efficiency, this study proposes a method for measuring the 100-kernel weight of corn seeds based on deep learning and machine vision. High-contrast camera technology was used to capture images of corn seeds, and the feature extraction network of the YOLOv5 model was improved by incorporating the MobileNetV3 network structure. The improved model uses depthwise separable convolution to reduce parameters and computational load, incorporates a linear bottleneck and inverted residual structure to improve efficiency, introduces a squeeze-and-excitation (SE) attention mechanism to learn channel-wise feature weights directly, and updates the activation function. Algorithms and experiments were then designed to calculate the 100-kernel weight from the model's output. Results show that the improved model achieves an accuracy of 90.1%, a recall of 91.3% and a mean average precision (mAP) of 92.2%. While meeting production requirements, the model substantially reduces the parameter count compared with alternative models, to 50% of that of the original model. In an applied study measuring the 100-kernel weight of corn seeds, counting accuracy reached 97.18% and weight measurement accuracy reached 94.2%. This study achieves efficient and precise measurement of the 100-kernel weight of corn seeds and offers a new perspective on measuring corn seed weight.
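Two pieces of arithmetic behind this abstract can be made concrete: why depthwise separable convolution cuts parameters, and how a 100-kernel weight follows from a kernel count. The sketch assumes bias-free convolutions and assumes the 100-kernel weight is obtained by scaling a weighed sample's mass by the number of kernels the detector counts; the paper's exact procedure may differ, and the function names and sample values are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (no bias)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv (no bias)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise

def hundred_kernel_weight(total_weight_g, kernel_count):
    """Scale a weighed sample to the weight of 100 kernels (assumed procedure)."""
    if kernel_count <= 0:
        raise ValueError("kernel_count must be positive")
    return 100.0 * total_weight_g / kernel_count

standard = conv_params(3, 64, 64)                   # 36864
separable = depthwise_separable_params(3, 64, 64)   # 4672
w100 = hundred_kernel_weight(35.2, 110)             # hypothetical sample: 35.2 g, 110 kernels counted
```

For a 3×3 layer with 64 input and 64 output channels, the separable version needs about 1/c_out + 1/k² (roughly 12.7%) of the standard layer's weights, which is the kind of saving the MobileNetV3-based backbone exploits.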

Vol. 10, No. 5, pp. 1521-1534
Citations: 0
Tibetan Medical Named Entity Recognition Based on Syllable-Word-Sentence Embedding Transformer
IF 7.3; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-24; DOI: 10.1049/cit2.70029
Jin Zhang, Ziyue Zhang, Lobsang Yeshi, Dorje Tashi, Xiangshi Wang, Yuqing Cai, Yongbin Yu, Xiangxiang Wang, Nyima Tashi, Gadeng Luosang

Tibetan medical named entity recognition (Tibetan MNER) involves extracting specific types of medical entities from unstructured Tibetan medical texts, and it provides important data support for work related to Tibetan medicine. However, existing Tibetan MNER methods often struggle to capture multi-level semantic information comprehensively: they fail to extract multi-granularity features sufficiently and to filter out irrelevant information effectively, which ultimately reduces entity recognition accuracy. This paper proposes an improved embedding representation method called syllable–word–sentence embedding. It leverages features at different granularities and uses unscaled dot-product attention to focus on key features during feature fusion; the resulting syllable–word–sentence embedding is integrated into the transformer, enhancing the specificity and diversity of feature representations. By exploiting multi-level, multi-granularity semantic information, the model improves Tibetan MNER performance. We evaluate the proposed model on datasets from different domains. On the Tibetan news dataset we constructed, the model effectively identifies three types of entities, achieving an F1 score of 93.59%, an improvement of 1.24% over the vanilla FLAT. On the Tibetan medical dataset we developed, it effectively identifies five kinds of medical entities, with an F1 score of 71.39%, a 1.34% improvement over the vanilla FLAT.
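The fusion step, in which unscaled dot-product attention weighs syllable-, word- and sentence-level features, might look roughly like the following NumPy sketch. It assumes that word and sentence embeddings have already been aligned to each syllable position and that the syllable feature acts as the query; `fuse_granularities` is a hypothetical name, and how the fused embedding is then fed into the transformer is not shown.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_granularities(syllable, word, sentence):
    """Fuse three aligned feature sets with unscaled dot-product attention.

    syllable, word, sentence : arrays of shape (seq_len, d), one row per syllable
    Returns the fused features (seq_len, d) and attention weights (seq_len, 3).
    """
    # Candidate features per position: (seq_len, 3, d)
    candidates = np.stack([syllable, word, sentence], axis=1)
    # The syllable feature serves as the query: (seq_len, 1, d)
    query = syllable[:, None, :]
    # Dot-product scores WITHOUT the usual 1/sqrt(d) scaling: (seq_len, 3)
    scores = np.sum(query * candidates, axis=-1)
    weights = softmax(scores, axis=-1)
    # Weighted sum over the three granularities: (seq_len, d)
    fused = np.sum(weights[..., None] * candidates, axis=1)
    return fused, weights
```

Omitting the 1/sqrt(d) factor leaves the logits larger for high-dimensional features, so the softmax becomes sharper and the attention concentrates more strongly on the best-matching granularity.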

Vol. 10, No. 4, pp. 1148-1158
Citations: 0
Robot Manipulation Based on Embodied Visual Perception: A Survey
IF 7.3; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-21; DOI: 10.1049/cit2.70022
Sicheng Wang, Milutin N. Nikolić, Tin Lun Lam, Qing Gao, Runwei Ding, Tianwei Zhang

Visual perception is critical in robotic operations, particularly in collaborative and autonomous robot systems. With efficient visual systems, robots can acquire and process environmental information in real time, recognise objects, assess spatial relationships, and make adaptive decisions. This review provides a comprehensive overview of the latest advances in vision for robotic perception, focusing on visual applications in object perception, self-perception, human–robot collaboration, and multi-robot collaboration. By summarising the current state of development and analysing the challenges and opportunities that remain in these areas, the paper offers a thorough examination of how visual perception is integrated with operational robotics. It aims to inspire future research and to drive the application and development of visual perception across robotic domains, enabling operational robots to adapt better to complex environments and accomplish tasks reliably.

Vol. 10, No. 4, pp. 945-958
Citations: 0
RNN-Based Sequence-Aware Recommenders for Tourist Attractions
IF 7.3; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2025-06-21; DOI: 10.1049/cit2.70027
Hee Jun Lee, Yang Sok Kim, Won Seok Lee, In Hyeok Choi, Choong Kwon Lee

Selecting appropriate tourist attractions to visit in real time is an important problem for travellers. Because recommenders proactively suggest items based on user preferences, they are a promising solution to this problem. Travellers visit tourist attractions sequentially while considering multiple attributes at the same time, so recommenders for tourist attractions should take both factors into account. Building on GRU4REC, we propose RNN-based sequence-aware recommenders (RNN-SARs) that are trained on multiple sequence datasets; we call these multi-RNN-SARs. We propose two types of multi-RNN-SARs: concatenate-RNN-SARs and parallel-RNN-SARs. To evaluate them, we compare the hit rate (HR) and mean reciprocal rank (MRR) of an item-based collaborative filtering recommender (item-CFR), an RNN-SAR trained on a single-sequence dataset (basic-RNN-SAR), the multi-RNN-SARs and state-of-the-art SARs on a real-world travel dataset. Our results show that multi-RNN-SARs perform significantly better than item-CFR. Not all multi-RNN-SARs outperform basic-RNN-SAR, but the best multi-RNN-SAR achieves performance comparable to that of state-of-the-art algorithms. These results highlight the importance of using multiple sequence datasets in RNN-SARs, and of choosing appropriate sequence datasets and learning methods when implementing multi-RNN-SARs in practice.
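HR and MRR, the two metrics used to compare the recommenders, are simple to compute from ranked recommendation lists. This sketch uses the common definitions with a cutoff k (a hit if the held-out next attraction appears in the top k; reciprocal rank counted as zero outside the top k). The cutoff the authors used is not stated in the abstract, and the function names and toy attraction IDs are hypothetical.

```python
def hit_rate_at_k(recommendations, targets, k):
    """Fraction of test cases whose target appears in the top-k recommendations."""
    hits = sum(1 for recs, target in zip(recommendations, targets)
               if target in recs[:k])
    return hits / len(targets)

def mrr_at_k(recommendations, targets, k):
    """Mean of 1/rank of the target within the top k (0 if it is absent)."""
    total = 0.0
    for recs, target in zip(recommendations, targets):
        top_k = list(recs[:k])
        if target in top_k:
            total += 1.0 / (top_k.index(target) + 1)  # ranks are 1-based
    return total / len(targets)

# Toy ranked lists for three sessions and the attraction actually visited next.
recs = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
targets = ["b", "f", "z"]
hr3 = hit_rate_at_k(recs, targets, k=3)
mrr3 = mrr_at_k(recs, targets, k=3)
```

In the toy example, two of the three targets appear in the top 3 (HR@3 = 2/3), at ranks 2 and 3, giving MRR@3 = (1/2 + 1/3)/3 ≈ 0.278. MRR rewards placing the right attraction near the top, which HR alone does not distinguish.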

Published in CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1077-1088 (2025). PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70027
Citations: 0
Diverse Models, United Goal: A Comprehensive Survey of Ensemble Learning
IF 7.3 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-06-20 | DOI: 10.1049/cit2.70030
Ziwei Fan, Zhiwen Yu, Kaixiang Yang, Wuxing Chen, Xiaoqing Liu, Guojie Li, Xianling Yang, C. L. Philip Chen

Ensemble learning, a pivotal branch of machine learning, combines multiple base models to improve the overall performance of predictive models, exploiting the diversity and collective wisdom of the ensemble to outperform individual models and mitigate overfitting. This review establishes a four-layer research framework that offers a comprehensive, structured, bottom-up review of ensemble learning. It begins by introducing fundamental ensemble learning techniques, including bagging, boosting, and stacking, and examines ensemble diversity. It then studies deep ensemble learning and semi-supervised ensemble learning in detail. Next, it discusses how ensemble learning techniques can be used to handle challenging datasets, such as imbalanced and high-dimensional data. It also examines applications of ensemble learning across research domains, including healthcare, transportation, finance, manufacturing, and the Internet. The survey concludes by discussing the challenges intrinsic to ensemble learning.
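Of the three foundational techniques the survey introduces (bagging, boosting, and stacking), bagging is the simplest to show end to end. The sketch below is a dependency-free illustration, not code from the survey. It assumes 1-D toy data and uses decision stumps as base learners. Each stump is trained on a bootstrap sample, meaning the training set is resampled with replacement, and the ensemble predicts by majority vote. The data, `n_models`, and `seed` are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, class label)
Stump = Tuple[float, int, int]  # (threshold, label if x <= threshold, label otherwise)


def fit_stump(data: List[Sample]) -> Stump:
    """Choose the threshold that misclassifies the fewest samples of 1-D data."""
    best = None
    for threshold, _ in data:
        left = [y for x, y in data if x <= threshold]
        right = [y for x, y in data if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return threshold, left_label, right_label


def predict_stump(stump: Stump, x: float) -> int:
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


def bagging_fit(data: List[Sample], n_models: int = 25, seed: int = 0) -> List[Stump]:
    """Train each base model on a bootstrap sample of the same size as the data."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def bagging_predict(models: List[Stump], x: float) -> int:
    """Combine the base models' predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated training data.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
ensemble = bagging_fit(train)
print([bagging_predict(ensemble, x) for x in (0.8, 4.2)])
```

Boosting differs in that it trains the base models sequentially and reweights the samples that earlier models got wrong. Stacking differs in that it replaces the fixed vote with a learned meta-model.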

Published in CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 959-982 (2025). PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70030
Citations: 0
Co-DeepNet: A Cooperative Convolutional Neural Network for DNA Methylation-Based Age Prediction
IF 7.3 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-06-20 | DOI: 10.1049/cit2.70026
Najmeh Sadat Jaddi, Mohammad Saniee Abadeh, Niousha Bagheri Khoulenjani, Salwani Abdullah, MohammadMahdi Ariannejad, Mohd Zakree Ahmad Nazri, Fatemeh Alvankarian

The pattern in which DNA methylation changes with age makes it possible to predict an individual's age. This paper develops an age prediction approach that treats the task as a multivariate regression problem on DNA methylation data. The approach is based on a convolutional neural network (CNN) model optimised by a genetic algorithm (GA). Its main contribution is to improve age prediction by combining two CNNs that exchange knowledge with each other. This exchange restarts the training process from a potentially higher-quality point at different iterations, and can therefore yield better results at each iteration. The proposed method, called the cooperative deep neural network (Co-DeepNet), is tested on two types of age prediction problems. Sixteen datasets containing 1899 healthy blood samples and nine datasets containing 2395 diseased blood samples are used to evaluate the method. On the healthy data, the mean absolute deviation (MAD) is 1.49 years for training and 3.61 years for testing. On the diseased blood data, the MAD is 3.81 years for training and 5.43 years for testing. The results of Co-DeepNet are compared with six methods proposed in previous studies and with a single CNN, using four prediction accuracy measures (R², MAD, MSE and RMSE). Statistical analysis demonstrates the effectiveness of Co-DeepNet and the superiority of its results.
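The four accuracy measures used to compare Co-DeepNet with the baselines can be computed from predicted and chronological ages as sketched below. One assumption is involved: the abstract does not define MAD, so the sketch reads it as the mean absolute difference between predicted and true age, the usual meaning for methylation age predictors. R² is computed as the coefficient of determination. The sample ages are hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(true_ages: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAD, MSE, RMSE and R^2 for an age predictor (MAD read as mean absolute error)."""
    n = len(true_ages)
    errors = [p - t for t, p in zip(true_ages, predicted)]
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(true_ages) / n
    ss_res = sum(e * e for e in errors)                      # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in true_ages)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAD": mad, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical chronological and predicted ages (years).
true_ages = [20.0, 30.0, 40.0, 50.0]
predicted_ages = [22.0, 29.0, 43.0, 50.0]
metrics = regression_metrics(true_ages, predicted_ages)
print(metrics)  # MAD 1.5, MSE 3.5, RMSE ~1.871, R2 0.972
```

MAD and RMSE are both in years and can be compared directly with the 1.49 to 5.43 year figures above. RMSE weights large individual errors more heavily than MAD does.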

Published in CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1118-1134 (2025). PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70026
Citations: 0
Exploring High Dimensional Feature Space With Channel-Spatial Nonlinear Transforms for Learned Image Compression
IF 7.3 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-06-10 | DOI: 10.1049/cit2.70025
Wen Tan, Fanyang Meng, Chao Li, Youneng Bao, Yongsheng Liang

Nonlinear transforms, particularly those built from residual blocks, have significantly advanced learned image compression (LIC). Such transforms enhance nonlinear expressiveness and obtain compact feature representations by enlarging the receptive field, which reflects how convolution extracts features in a high-dimensional feature space. However, their functionality is restricted to the spatial dimension and network depth, and insufficient information interaction and representation limit further gains in network performance. Crucially, the potential of high-dimensional feature spaces in the channel dimension, and of exploring network width/resolution, remains largely untapped. In this paper, we consider nonlinear transforms from the perspective of feature space, defining high-dimensional feature spaces in different dimensions and investigating their specific effects. First, we introduce dimension-increasing and dimension-decreasing transforms in both the channel and spatial dimensions to obtain high-dimensional feature spaces and achieve better feature extraction. Second, we design a channel-spatial fusion residual transform (CSR), which incorporates multi-dimensional transforms for a more effective representation. Furthermore, we simplify the proposed fusion transform into a slim architecture (CSR-sm) that balances network complexity and compression performance. Finally, we build the overall network from stacked CSR transforms to achieve better compression and reconstruction. Experimental results demonstrate that the proposed method achieves superior rate-distortion performance compared with existing LIC methods and traditional codecs. Specifically, it achieves a 9.38% BD-rate reduction over VVC on the Kodak dataset.
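The headline result is a BD-rate (Bjøntegaard delta rate) reduction: the average bitrate difference between two codecs at equal quality. A negative value means the tested codec needs fewer bits, so a "9.38% reduction" corresponds to a BD-rate of about -9.38%. The sketch below implements the classic cubic-polynomial variant, which fits log-rate as a function of PSNR and integrates over the shared PSNR range. The abstract does not say which BD-rate tool the authors used. Other implementations, such as piecewise cubic interpolation, can give slightly different numbers. The RD points are hypothetical.

```python
import numpy as np


def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr) -> float:
    """Bjontegaard delta rate (%) of `test` relative to `anchor` (cubic-fit variant)."""
    log_r_anchor = np.log(np.asarray(anchor_rate, dtype=float))
    log_r_test = np.log(np.asarray(test_rate, dtype=float))
    # Fit log-rate as a cubic polynomial in PSNR for each codec.
    p_anchor = np.polyfit(anchor_psnr, log_r_anchor, 3)
    p_test = np.polyfit(test_psnr, log_r_test, 3)
    # Integrate only over the PSNR interval covered by both curves.
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    int_anchor = np.polyint(p_anchor)
    int_test = np.polyint(p_test)
    area_anchor = np.polyval(int_anchor, hi) - np.polyval(int_anchor, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    # Average log-rate gap, converted back to a percentage rate change.
    avg_log_diff = (area_test - area_anchor) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)


# Hypothetical RD points: bits per pixel and PSNR (dB).
anchor_bpp = [0.10, 0.20, 0.40, 0.80]
psnr = [30.0, 32.0, 34.0, 36.0]
test_bpp = [0.09, 0.18, 0.36, 0.72]  # 10% fewer bits at every quality level
print(round(bd_rate(anchor_bpp, psnr, test_bpp, psnr), 2))  # -10.0
```

Working in log-rate makes a constant relative saving show up as a constant vertical gap between the curves. That is why a uniform 10% saving maps back to exactly -10%.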

Published in CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1235-1253 (2025). PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70025
Citations: 0
Intelligent Medical Diagnosis Model Based on Graph Neural Networks for Medical Images
IF 7.3 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-06-05 | DOI: 10.1049/cit2.70020
Ashutosh Sharma, Amit Sharma, Kai Guo

Recently, many estimation problems have been addressed thanks to developments in data-driven artificial neural networks (ANNs) and graph neural networks (GNNs). The primary limitation of previous methodologies has been their dependence on data that can be structured in a grid format. However, physiological recordings often exhibit irregular and unordered patterns, which makes them difficult to represent as matrices. As a result, GNNs have received considerable attention because they can leverage the implicit structure present in biological systems. A GNN consists of interacting nodes connected by edges whose weights are defined by anatomical junctions or temporal relationships. Our study incorporates a structural GNN to effectively differentiate between degrees of infection in the left and right hemispheres of the brain. Demographic data are then included, and a multi-task learning architecture is devised that integrates classification and regression tasks. The experiments used a real-world dataset of 800 brain X-ray images, of which 560 were classified as moderate cases and 240 as severe cases. Empirically, our methodology outperforms the compared methods in classification, achieving an area under the curve (AUC) of 92.27% and a correlation coefficient of 0.62.
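The abstract describes nodes joined by weighted edges, demographic inputs, and a multi-task head for classification and regression, but it does not specify the layer type or the loss weighting. The sketch below fills those gaps with common choices, all of which are assumptions. It uses the standard first-order GCN propagation rule (Kipf and Welling), mean pooling over nodes, and concatenated demographic features. It then adds one linear head per task, trained with an `alpha`-weighted sum of cross-entropy and squared error. The graph, features, sizes, and `alpha` are hypothetical, and the weights are random and untrained.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, the symmetric normalisation used by GCN layers."""
    a_hat = adj + np.eye(adj.shape[0])              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Mix each node's features with its weighted neighbours, project, then apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


def multitask_loss(logits, label, reg_pred, reg_target, alpha=0.5):
    """alpha * cross-entropy (classification) + (1 - alpha) * squared error (regression)."""
    shifted = logits - logits.max()                 # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return alpha * -log_probs[label] + (1 - alpha) * (reg_pred - reg_target) ** 2


rng = np.random.default_rng(0)
# Hypothetical 4-node graph; edge weights stand in for anatomical or temporal links.
adj = np.array([[0, 1, 0, 0.5],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0.5, 0, 1, 0]], dtype=float)
features = rng.normal(size=(4, 3))                  # 3 hypothetical features per node
a_norm = normalized_adjacency(adj)
h = gcn_layer(a_norm, features, rng.normal(size=(3, 8)))
demographics = np.array([0.45, 1.0])                # hypothetical normalised age and sex
z = np.concatenate([h.mean(axis=0), demographics])  # mean-pool nodes, append demographics
logits = z @ rng.normal(size=(10, 2))               # classification head: moderate vs severe
severity = float(z @ rng.normal(size=10))           # regression head: continuous score
loss = multitask_loss(logits, label=1, reg_pred=severity, reg_target=0.7)
print(h.shape, float(loss))
```

Sharing the graph encoder between the two heads lets the regression signal shape the representation used for classification, which is the usual motivation for a multi-task design.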

Published in CAAI Transactions on Intelligence Technology, vol. 10, no. 4, pp. 1201-1216 (2025). PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/cit2.70020
Citations: 0