
Journal of King Saud University-Computer and Information Sciences: Latest Articles

Fast and robust JND-guided video watermarking scheme in spatial domain
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-09-30 DOI: 10.1016/j.jksuci.2024.102199
Antonio Cedillo-Hernandez, Lydia Velazquez-Garcia, Manuel Cedillo-Hernandez, David Conchouso-Gonzalez
Watermarking schemes that operate in the spatial domain tend to be fast but offer limited robustness and imperceptibility, whereas schemes that operate in transform domains are robust but computationally expensive. For digital video, one of the main challenges is the large amount of computation required by the sheer volume of data to be processed. In this paper we propose a video watermarking algorithm that addresses this problem. To increase speed, the watermark is embedded with a technique that modifies the DCT coefficients directly in the spatial domain, and the process treats the video scene, rather than the video frame, as the basic unit. For robustness, the watermark is modulated by a Just Noticeable Distortion (JND) scheme, computed directly in the spatial domain and guided by visual attention, which raises the watermark strength to the maximum level that remains imperceptible to the human eye. Experimental results confirm that the proposed method compares favorably with previous studies in processing time, robustness and imperceptibility.
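One way to see how a DCT coefficient can be changed without leaving the spatial domain: for an orthonormal 2-D DCT of an N×N block, the DC coefficient equals the pixel sum divided by N, so adding a constant δ to every pixel shifts the DC term by N·δ. The Python sketch below uses that identity to embed one bit per block by quantizing the DC term. It is an illustration under stated assumptions, not the authors' algorithm: the function names, the quantization rule, and the use of `step` as a stand-in for a JND-derived strength are all hypothetical, and pixel rounding and clipping are ignored.

```python
def dct_dc(block):
    """DC coefficient of the orthonormal 2-D DCT-II of a square block (pixel sum / N)."""
    n = len(block)
    return sum(sum(row) for row in block) / n


def embed_bit(block, bit, step):
    """Embed one bit by moving the DC term onto a quantization lattice.

    Only pixel values are touched; no forward or inverse DCT is computed.
    `step` is a hypothetical strength parameter (a JND model would set it).
    """
    n = len(block)
    dc = dct_dc(block)
    offset = step / 2 if bit else 0.0          # bit 1 uses the shifted lattice
    target = round((dc - offset) / step) * step + offset
    delta = (target - dc) / n                  # uniform pixel change -> DC shift of n * delta
    return [[p + delta for p in row] for row in block]


def extract_bit(block, step):
    """Decide which lattice (bit 0 or bit 1) the DC term is closer to."""
    dc = dct_dc(block)
    d0 = abs(dc - round(dc / step) * step)
    d1 = abs(dc - (round((dc - step / 2) / step) * step + step / 2))
    return 1 if d1 < d0 else 0


block = [[float(16 * r + c) for c in range(8)] for r in range(8)]
marked = embed_bit(block, 1, step=24.0)
print(extract_bit(marked, step=24.0))
```

In this sketch the per-pixel change never exceeds `step / (2 * N)`, which is why a perceptual model could plausibly control the embedding strength through a single parameter.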
Citations: 0
Advanced security measures in coupled phase-shift STAR-RIS networks: A DRL approach
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-09 DOI: 10.1016/j.jksuci.2024.102215
Abdul Wahid, Syed Zain Ul Abideen, Manzoor Ahmed, Wali Ullah Khan, Muhammad Sheraz, Teong Chee Chuah, Ying Loong Lee
The rapid development of next-generation wireless networks has intensified the need for robust security measures, particularly in environments susceptible to eavesdropping. Simultaneous transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS) have emerged as a transformative technology, offering full-space coverage by manipulating electromagnetic wave propagation. However, the inherent flexibility of STAR-RIS introduces new vulnerabilities, making secure communication a significant challenge. To overcome these challenges, we propose a deep reinforcement learning (DRL) based secure and efficient beamforming optimization strategy, leveraging the deep deterministic policy gradient (DDPG) algorithm. By framing the problem as a Markov decision process (MDP), our approach enables the DDPG algorithm to learn optimal strategies for beamforming and transmission and reflection coefficients (TARCs) configurations. This method is specifically designed to optimize phase-shift coefficients within the STAR-RIS environment, effectively managing the coupled phase shifts and complex interactions that are critical for enhancing physical layer security (PLS). Through extensive simulations, we demonstrate that our DRL-based strategy not only outperforms traditional optimization techniques but also achieves real-time adaptive optimization, significantly improving both confidentiality and network efficiency. This research addresses key gaps in secure wireless communication and sets a new standard for future advancements in intelligent, adaptable network technologies.
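To make the "coupled phase shift" idea concrete, the sketch below maps the raw outputs of a DDPG-style actor (three numbers in [-1, 1] per surface element) to a valid pair of transmission and reflection coefficients. It assumes the coupled phase-shift model commonly used for passive lossless STAR-RIS elements, in which the amplitudes satisfy β_t² + β_r² = 1 and the two phases differ by π/2 or 3π/2. The abstract does not spell out its constraint set, so the model, the function name `action_to_tarc`, and the action layout are assumptions.

```python
import cmath
import math


def action_to_tarc(a_amp, a_phase, a_sign):
    """Map three actor outputs in [-1, 1] to one element's (transmission, reflection) pair.

    Hypothetical action layout: amplitude split, reflection phase, and a sign
    that selects which of the two allowed phase offsets is used.
    """
    beta_r = math.sqrt((a_amp + 1.0) / 2.0)           # reflection amplitude in [0, 1]
    beta_t = math.sqrt(max(0.0, 1.0 - beta_r ** 2))   # energy conservation fixes the other
    phi_r = math.pi * (a_phase + 1.0)                 # reflection phase in [0, 2*pi]
    # Coupled phases: the transmission phase is tied to the reflection phase.
    phi_t = phi_r + (math.pi / 2 if a_sign >= 0 else 3 * math.pi / 2)
    return beta_t * cmath.exp(1j * phi_t), beta_r * cmath.exp(1j * phi_r)


t, r = action_to_tarc(0.2, -0.5, 1.0)
print(abs(t) ** 2 + abs(r) ** 2)
```

Projecting the action this way keeps every sampled configuration feasible, so the agent never has to learn the coupling constraint through penalties; whether the paper handles the constraint this way is not stated in the abstract.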
Citations: 0
Endoscopic video aided identification method for gastric area
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102208
Xiangwei Zheng, Dejian Su, Xuanchi Chen, Mingzhe Zhang
Probe-based confocal laser endomicroscopy (pCLE) is an important diagnostic instrument that is frequently used to assess the severity of gastric intestinal metaplasia (GIM). Physicians must comprehensively analyze video clips recorded with pCLE from the gastric antrum, gastric body, and gastric angle to determine the patient's condition. However, because of the limitations of pCLE's microscopic imaging structure, the gastric areas being examined cannot be identified and recorded in real time, which may pose a risk of missing potential areas of disease and hinders subsequent precise treatment of the lesion area. This paper therefore proposes an endoscopic video aided identification method for gastric areas (EVIGA), which determines the areas examined by pCLE in real time. First, the start time of the diagnosis clip is determined by detecting the working states of pCLE in real time. Then, the endoscopic video clip is truncated according to the temporal correspondence between the pCLE video and the endoscopic video, so that gastric areas can be detected. To identify the gastric areas examined by pCLE accurately, a probe-based confocal laser endomicroscopy diagnosis area identification model (pCLEDAM) is constructed with an hourglass convolution designed for single-frame feature extraction and a temporal feature-sensitive extraction structure for spatial feature extraction. The extracted feature maps are unfolded and fed into a fully connected layer to classify the detected areas. To validate the proposed method, 67 clinical confocal laser endomicroscopy diagnosis cases were collected from a tertiary care hospital, and 500 video clips were retained after auditing to construct the dataset. Experiments show that the accuracy of area identification on the test dataset reaches 96.0%, much higher than that of other related algorithms, achieving accurate identification of the areas examined by pCLE.
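The first two steps (finding when the probe is working, then cutting the matching stretch of endoscopic video) can be sketched in a few lines. Everything here is an assumption made for illustration: the per-frame boolean working-state signal, the function names, and the fixed `offset_s` between the two recordings are hypothetical, and the abstract does not say how the working state is detected.

```python
def active_segments(states, fps):
    """Return (start_s, end_s) intervals during which the probe is flagged as working.

    `states` is a hypothetical per-frame sequence of booleans sampled at `fps`.
    """
    segments, start = [], None
    for i, on in enumerate(states):
        if on and start is None:
            start = i                      # rising edge: a diagnosis clip begins
        elif not on and start is not None:
            segments.append((start / fps, i / fps))
            start = None
    if start is not None:                  # signal still active at the end
        segments.append((start / fps, len(states) / fps))
    return segments


def endoscope_frames(segment, fps, offset_s=0.0):
    """Frame range [first, last) in the endoscopic video for one pCLE segment.

    `offset_s` is an assumed constant time shift between the two recordings.
    """
    start_s, end_s = segment
    return int((start_s + offset_s) * fps), int((end_s + offset_s) * fps)


segments = active_segments([False, False, True, True, True, False, True], fps=2)
print(segments)
print(endoscope_frames(segments[0], fps=30))
```

The frames selected this way would then be passed to a classifier such as the pCLEDAM model described above.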
Citations: 0
Energy-efficient resource allocation for UAV-aided full-duplex OFDMA wireless powered IoT communication networks
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-30 DOI: 10.1016/j.jksuci.2024.102225
Tong Wang
The rapid development of wireless-powered Internet of Things (IoT) networks, supported by multiple unmanned aerial vehicles (UAVs) and full-duplex technologies, has opened new avenues for simultaneous data transmission and energy harvesting. In this context, optimizing energy efficiency (EE) is crucial for ensuring sustainable and efficient network operation. This paper proposes a novel approach to EE optimization in multi-UAV-aided wireless-powered IoT networks, focusing on balancing the uplink data transmission rates and total system energy consumption within an orthogonal frequency-division multiple access (OFDMA) framework. This involves formulating the EE optimization problem as a Multi-Objective Optimization Problem (MOOP), consisting of the maximization of the uplink total rate and the minimization of the total system energy consumption, which is then transformed into a Single-Objective Optimization Problem (SOOP) using the Tchebycheff method. To address the non-convex nature of the resulting SOOP, characterized by combinatorial variables and coupled constraints, we developed an iterative algorithm that combines Block Coordinate Descent (BCD) with Successive Convex Approximation (SCA). This algorithm decouples the subcarrier assignment and power control subproblems, incorporates a penalty term to relax integer constraints, and alternates between solving each subproblem until convergence is reached. Simulation results demonstrate that our proposed method outperforms baseline approaches in key performance metrics, highlighting the practical applicability and robustness of our framework for enhancing the efficiency and sustainability of real-world UAV-assisted wireless networks. Our findings provide insights for future research on extending the proposed framework to scenarios involving dynamic UAV mobility, multi-hop communication, and enhanced energy management, thereby supporting the development of next-generation sustainable communication systems.
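The weighted Tchebycheff method mentioned above turns several objectives into one by minimizing the largest weighted distance to an ideal point. The sketch below applies it to a handful of made-up (rate, energy) operating points; the candidate values, the weights, and the function names are illustrative assumptions, and in practice the objectives would usually be normalized first because rate and energy have different units.

```python
def tchebycheff(objectives, weights, ideal):
    """Weighted Tchebycheff scalarization: largest weighted distance to the ideal point."""
    return max(w * abs(f - z) for f, w, z in zip(objectives, weights, ideal))


def pick(candidates, weights):
    """Choose one (rate, energy) pair; rate is maximized, energy is minimized."""
    objs = [(-rate, energy) for rate, energy in candidates]   # both become minimization
    ideal = (min(o[0] for o in objs), min(o[1] for o in objs))
    scores = [tchebycheff(o, weights, ideal) for o in objs]
    return candidates[scores.index(min(scores))]


# Hypothetical operating points: (uplink sum rate, total energy consumption).
points = [(10, 8), (8, 4), (5, 3)]
print(pick(points, (0.5, 0.5)))
print(pick(points, (0.9, 0.1)))
```

Sweeping the weights traces out different trade-offs between the two objectives, which is what makes the scalarized single-objective problem a useful proxy for the original multi-objective one.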
Citations: 0
DNE-YOLO: A method for apple fruit detection in Diverse Natural Environments
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-21 DOI: 10.1016/j.jksuci.2024.102220
Haitao Wu, Xiaotian Mo, Sijian Wen, Kanglei Wu, Yu Ye, Yongmei Wang, Youhua Zhang
The apple industry, a pivotal sector in agriculture, increasingly emphasizes the mechanization and intelligent advancement of picking technology. This study applies a mist simulation algorithm to apple image generation, constructing a dataset of apple images under mixed sunny, cloudy, drizzling and foggy weather conditions called DNE-APPLE. It also introduces a lightweight and efficient target detection network called DNE-YOLO. Building on the YOLOv8 base model, DNE-YOLO incorporates the CBAM attention mechanism and the CARAFE up-sampling operator to strengthen the focus on apples. In addition, it uses GSConv and the WIOU loss function, which has a dynamic non-monotonic focusing mechanism, to reduce model parameters and decrease reliance on dataset quality. Extensive experiments across environmentally diverse datasets show that DNE-YOLO achieves a precision of 90.7%, a recall of 88.9%, a mean average precision (mAP50) of 94.3%, a computational cost of 25.4 GFLOPs, and a parameter count of 10.46M. Compared to YOLOv8, it exhibits superior detection accuracy and robustness in sunny, drizzly, cloudy, and misty environments, making it especially suitable for practical applications such as apple picking by agricultural robots. The code for this model is open source at https://github.com/wuhaitao2178827/DNE-YOLO.
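For readers unfamiliar with how detection precision and recall are obtained, the sketch below computes both from predicted and ground-truth boxes using an IoU threshold of 0.5, the same threshold that mAP50 refers to. It is a generic illustration with made-up boxes and a simple greedy matching rule, not the evaluation code from the DNE-YOLO repository.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, truths, iou_thr=0.5):
    """Greedy matching: `preds` are assumed sorted by descending confidence,
    and each ground-truth box can be matched at most once."""
    unmatched = list(truths)
    tp = 0
    for p in preds:
        best = max(unmatched, key=lambda t: box_iou(p, t), default=None)
        if best is not None and box_iou(p, best) >= iou_thr:
            unmatched.remove(best)
            tp += 1
    fp = len(preds) - tp           # predictions with no matching apple
    fn = len(unmatched)            # apples that were never detected
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 10, 10), (50, 50, 60, 60)]
print(precision_recall(preds, truths))
```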
Citations: 0
Echocardiographic mitral valve segmentation model
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-19 DOI: 10.1016/j.jksuci.2024.102218
Chunxia Liu, Shanshan Dong, Feng Xiong, Luqing Wang, Bolun Li, Hongjun Wang
Segmentation of the mitral valve is important not only for clinical diagnosis but also for the prevention and prognosis of disease by experts and doctors. In this paper, a multi-channel cross fusion transformer based U-Net model (MCCT-UNet) is proposed on the basis of the classical U-Net architecture. First, the skip connections of MCCT-UNet are replaced by a multi-channel cross-fusion based attention mechanism module (MCCT), which fuses feature maps from different scales at different stages of the encoder. Second, feature fusion in the decoding stage is optimized by a cross-compression excitation sub-module (C-SENet) that replaces simple feature concatenation; C-SENet bridges the inconsistency between semantic levels by effectively combining the deeper information from the encoding stage with the shallower information. Together, these two modules establish a close connection between the encoder and decoder by exploring multi-scale global contextual information to address the semantic gap, which significantly improves the segmentation performance of the network. The experimental results show that the improvement is effective and that MCCT-UNet outperforms the other 9 network models. Specifically, MCCT-UNet achieved a Dice coefficient of 0.8734, an IoU of 0.7854, and an accuracy of 0.9977, demonstrating significant improvements over the compared models.
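The Dice coefficient and IoU reported above are both overlap measures between a predicted mask and a ground-truth mask. The sketch below computes them for flat binary masks; the masks are made up, and the convention of scoring two empty masks as a perfect match is an assumption. For a single pair of masks the two measures are linked by Dice = 2·IoU / (1 + IoU), although that identity need not hold for values averaged over a dataset.

```python
def dice_iou(pred, truth):
    """Dice coefficient and IoU for two binary masks given as flat sequences of 0/1."""
    inter = sum(p & t for p, t in zip(pred, truth))   # pixels set in both masks
    total = sum(pred) + sum(truth)
    union = total - inter
    dice = 2 * inter / total if total else 1.0        # assumed: empty vs empty counts as perfect
    iou = inter / union if union else 1.0
    return dice, iou


dice, iou = dice_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```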
Citations: 0
General secure encryption algorithm for separable reversible data hiding in encrypted domain
IF 5.2 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-23 DOI: 10.1016/j.jksuci.2024.102217
Hongli Wan, Minqing Zhang, Yan Ke, Zongbao Jiang, Fuqiang Di
A separable reversible data hiding in encrypted domain (RDH-ED) algorithm reserves embedding space for the hidden information either before or after encryption, so that extracting the information and restoring the image do not interfere with each other. The encryption method affects not only the embedding space and separability but, more crucially, the security of the scheme. However, the commonly used XOR, scrambling, or combined methods fall short in security, especially against known plaintext attack (KPA). Therefore, to improve the security of RDH-ED while remaining widely applicable, this paper proposes a high-security RDH-ED encryption algorithm that can be used with both reserving space before encryption (RSBE) and freeing space after encryption (FSAE). During encryption, the image undergoes block XOR, global intra-block bit-plane scrambling (GIBS), and inter-block scrambling in sequence. The GIBS key is created through a chaotic mapping transformation. Two RDH-ED algorithms based on this encryption are then proposed. Experimental results indicate that the proposed algorithm keeps the key communication traffic unchanged after key conversion. Its computational complexity also remains constant, it satisfies the separability criteria, and it is suitable for both RSBE and FSAE methods. While retaining the security of each individual encryption technique, the key space is expanded to $2^{8N_p} \times N_p! \times (8!)^{N_p}$, providing resilience against various existing attack methods. Notably, in KPA testing scenarios the average decryption success rate is only 0.0067% and 0.0045%, highlighting its strong security. Overall, this virtually unbreakable system significantly enhances image security while preserving an appropriate embedding capacity.
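The three encryption stages named above (block XOR, intra-block bit-plane scrambling, inter-block scrambling) can be sketched on a toy image stored as a list of pixel blocks. This is a simplified illustration, not the paper's algorithm: keys come from Python's `random.Random` rather than a chaotic map, one XOR byte and one bit-plane permutation are used per block, and all function names are hypothetical. The `key_space` helper reads the garbled key-space expression in the abstract as 2^(8·Np) × Np! × (8!)^Np, with Np taken to be the number of blocks; that reading is an assumption.

```python
import math
import random


def permute_bits(value, perm):
    """Bit-plane scrambling of one 8-bit pixel: output bit i takes input bit perm[i]."""
    return sum(((value >> perm[i]) & 1) << i for i in range(8))


def invert(perm):
    """Inverse of a permutation given as a list."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return inv


def make_keys(num_blocks, seed):
    rng = random.Random(seed)  # stand-in for the chaotic-map key generator
    xor_keys = [rng.randrange(256) for _ in range(num_blocks)]
    bit_perms = [rng.sample(range(8), 8) for _ in range(num_blocks)]
    block_perm = rng.sample(range(num_blocks), num_blocks)
    return xor_keys, bit_perms, block_perm


def encrypt(blocks, keys):
    xor_keys, bit_perms, block_perm = keys
    # Stages 1 and 2: XOR each block with its key byte, then scramble its bit planes.
    mixed = [[permute_bits(p ^ k, perm) for p in blk]
             for blk, k, perm in zip(blocks, xor_keys, bit_perms)]
    # Stage 3: move whole blocks to new positions.
    return [mixed[j] for j in block_perm]


def decrypt(blocks, keys):
    xor_keys, bit_perms, block_perm = keys
    mixed = [None] * len(blocks)
    for i, j in enumerate(block_perm):     # undo the inter-block scrambling
        mixed[j] = blocks[i]
    return [[permute_bits(p, invert(perm)) ^ k for p in blk]
            for blk, k, perm in zip(mixed, xor_keys, bit_perms)]


def key_space(num_blocks):
    """Assumed reading of the abstract's key-space size."""
    return 2 ** (8 * num_blocks) * math.factorial(num_blocks) * math.factorial(8) ** num_blocks


blocks = [[(16 * i + j) % 256 for j in range(4)] for i in range(6)]
keys = make_keys(6, seed=42)
print(decrypt(encrypt(blocks, keys), keys) == blocks)
```

Decryption applies the inverse of each stage in reverse order, which is why the round trip restores the original blocks exactly.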
Journal of King Saud University-Computer and Information Sciences, Volume 36, Issue 9, Article 102217
Citations: 0
Real-time segmentation and classification of whole-slide images for tumor biomarker scoring
IF 5.2 Zone 2 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102204
Md Jahid Hasan , Wan Siti Halimatul Munirah Wan Ahmad , Mohammad Faizal Ahmad Fauzi , Jenny Tung Hiong Lee , See Yee Khor , Lai Meng Looi , Fazly Salleh Abas , Afzan Adam , Elaine Wan Ling Chan
Histopathology image segmentation and classification are essential for diagnosing and treating breast cancer. This study introduces highly accurate segmentation and classification of histopathology images using a single architecture. We used the well-known segmentation architectures SegNet and U-Net, and modified the decoder to attach ResNet, VGG, and DenseNet to perform classification tasks. These hybrid models are integrated with Stardist as the backbone and implemented in a real-time pathologist workflow with a graphical user interface. The models were trained and tested offline using the ER-IHC-stained private dataset and the H&E-stained public dataset (MoNuSeg). For real-time evaluation, the proposed model was evaluated using PR-IHC-stained glass slides. It achieved the highest pixel-based segmentation F1-scores of 0.902 and 0.903 on the private and public datasets, respectively, and a classification F1-score of 0.833 on the private dataset. The experiment shows the robustness of our method: a model trained on the ER-IHC dataset is able to perform well on real-time microscopy of PR-IHC slides at both 20x and 40x magnification. This will help pathologists make decisions quickly.
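For readers unfamiliar with the metric, a pixel-based F1-score is commonly computed from a predicted and a reference binary mask as F1 = 2TP / (2TP + FP + FN). The sketch below uses that standard definition. The abstract does not state the exact evaluation protocol, so the function name and the handling of two empty masks are assumptions.

```python
import numpy as np

def pixel_f1(pred_mask, true_mask):
    """Pixel-level F1 between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.count_nonzero(pred & true)    # foreground in both masks
    fp = np.count_nonzero(pred & ~true)   # predicted foreground, actually background
    fn = np.count_nonzero(~pred & true)   # missed foreground
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0

if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    true = [[1, 0], [1, 0]]
    print(pixel_f1(pred, true))  # TP=1, FP=1, FN=1
```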
Journal of King Saud University-Computer and Information Sciences, Volume 36, Issue 9, Article 102204
Citations: 0
On-chain zero-knowledge machine learning: An overview and comparison
IF 5.2 Zone 2 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102207
Vid Keršič, Sašo Karakatič, Muhamed Turkanović
Zero-knowledge proofs introduce a mechanism to prove that certain computations were performed without revealing any underlying information and are used commonly in blockchain-based decentralized apps (dapps). This cryptographic technique addresses trust issues prevalent in blockchain applications, and has now been adapted for machine learning (ML) services, known as Zero-Knowledge Machine Learning (ZKML). By leveraging the distributed nature of blockchains, this approach enhances the trustworthiness of ML deployments, and opens up new possibilities for privacy-preserving and robust ML applications within dapps. This paper provides a comprehensive overview of the ZKML process and its critical components for verifying ML services on-chain. Furthermore, this paper explores how blockchain technology and smart contracts can offer verifiable, trustless proof that a specific ML model has been used correctly to perform inference, all without relying on a single trusted entity. Additionally, the paper compares and reviews existing frameworks for implementing ZKML in dapps, serving as a reference point for researchers interested in this emerging field.
Journal of King Saud University-Computer and Information Sciences, Volume 36, Issue 9, Article 102207
Citations: 0
IPSRM: An intent perceived sequential recommendation model
IF 5.2 Zone 2 Computer Science Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-11-01 Epub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102206
Chaoran Wang , Mingyang Wang , Xianjie Wang , Yingchun Tan

Objectives:

Sequential recommendation aims to recommend items that are relevant to users’ interests based on their existing interaction sequences. Current models fall short in capturing users’ latent intentions and do not sufficiently consider sequence information when modeling users and items. Additionally, noise in user interaction sequences can affect the model’s optimization process.

Methods:

This paper introduces an intent perceived sequential recommendation model (IPSRM). IPSRM employs the generalized expectation–maximization (EM) framework, alternating between learning sequence representations and optimizing the model to better capture the underlying intentions of user interactions. Specifically, IPSRM passes unlabeled behavioral sequences through frequency-domain filtering and maps them into a random Gaussian distribution space. These mappings reduce the impact of noise and improve the learning of user behavior representations. Through a clustering process, IPSRM captures users’ potential interaction intentions and incorporates them as one of the supervision signals in the contrastive self-supervised learning process to guide optimization.
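Two of the ideas above, frequency-domain denoising of a sequence representation and clustering to obtain intent pseudo-labels, can be sketched as follows. This is an illustration under loose assumptions, not IPSRM itself: it uses a fixed low-pass filter rather than whatever filter the paper uses, mean pooling, and a plain k-means loop. It omits the Gaussian mapping, the EM alternation, and the contrastive loss. All names are hypothetical.

```python
import numpy as np

def low_pass(seq_emb, keep):
    """Keep the `keep` lowest frequency bins along the sequence axis.

    seq_emb has shape (batch, length, dim).
    """
    spec = np.fft.rfft(seq_emb, axis=1)
    spec[:, keep:, :] = 0                      # drop high-frequency components
    return np.fft.irfft(spec, n=seq_emb.shape[1], axis=1)

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means on the rows of a float array x; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # leave empty clusters where they are
                centers[j] = x[labels == j].mean(axis=0)
    dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dist.argmin(axis=1), centers

def intent_labels(seq_emb, keep, k):
    """Filter each sequence, mean-pool it, and cluster the pooled vectors."""
    pooled = low_pass(seq_emb, keep).mean(axis=1)
    labels, _ = kmeans(pooled, k)
    return labels
```

In a full model, cluster assignments like these would serve as pseudo-labels for an intent-level contrastive objective, which is how the abstract describes the clustering output being used.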

Results:

Experimental results on four standard datasets demonstrate the superiority of IPSRM. Comparative experiments also verify that IPSRM exhibits strong robustness under cold start and noisy interaction conditions.

Conclusions:

Capturing latent user intentions, integrating intention-based supervision into model optimization, and mitigating noise in sequential modeling significantly enhance the performance of sequential recommendation systems.
Journal of King Saud University-Computer and Information Sciences, Volume 36, Issue 9, Article 102206
Citations: 0