Journal of King Saud University-Computer and Information Sciences: latest articles

Study on data storage and verification methods based on improved Merkle mountain range in IoT scenarios
IF 5.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | DOI: 10.1016/j.jksuci.2024.102117
Chufeng Liang , Junlang Zhang , Shansi Ma , Yu Zhou , Zhicheng Hong , Jiawen Fang , Yongzhang Zhou , Hua Tang

In the context of the rapid development of Internet of Things (IoT) technology and the extensive proliferation of the global Internet, the authenticity of data has become a focal point of societal demand. It plays a decisive role in enhancing the quality of decision-making and operational efficiency. However, the storage and authenticity verification of large-scale IoT real-time data present unprecedented technical challenges. Faced with the inherent data security risks of traditional centralized cloud storage, blockchain technology reveals its unique potential for solutions with its inherent immutability and decentralization. Nevertheless, current blockchain-based data storage solutions are still restricted by high costs and inefficiency. To address these challenges, this paper innovatively proposes the BI-TSFID framework, which leverages the benefits of Ethereum and IPFS and optimizes the Merkle Tree structure and verification mechanisms. The BI-TSFID framework adopts a strategy of on-chain data summary storage and off-chain computation. This approach provides IoT with efficient and reliable data storage, reduces operational costs, and simplifies the verification process. This research has improved the data computation efficiency by refining the structure of the Merkle Tree and analyzed its optimal branch number. Additionally, the study introduces a sampling-based data integrity verification method that significantly reduces resource consumption during the verification process. Experimental results show that the solutions proposed in this paper effectively enhance the efficiency and security of IoT data management and provide valuable guidance for the theory and practice of real-time data storage and verification, further promoting the development and innovation in the related technological fields.
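
The abstract does not spell out the improved tree, so the following is only a minimal sketch of the general idea it describes: hash sensor readings into a Merkle tree with a configurable branch number, keep only the root as the on-chain summary, and spot-check a random sample of leaves instead of all of them. The function names, the SHA-256 choice, the one-byte domain-separation prefixes, and the sample size are assumptions for illustration, not the BI-TSFID design. The sketch also assumes a non-empty list of readings.

```python
import hashlib
import random


def h(data: bytes) -> bytes:
    """SHA-256 digest used for both leaves and internal nodes."""
    return hashlib.sha256(data).digest()


def build_levels(leaves, branch=2):
    """Return all tree levels, from leaf hashes up to the single root."""
    level = [h(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Hash each group of up to `branch` children into one parent.
        level = [h(b"\x01" + b"".join(level[i:i + branch]))
                 for i in range(0, len(level), branch)]
        levels.append(level)
    return levels


def prove(levels, index, branch=2):
    """Collect, per level, the sibling group and our node's position in it."""
    proof = []
    for level in levels[:-1]:
        start = (index // branch) * branch
        proof.append((level[start:start + branch], index - start))
        index //= branch
    return proof


def verify(leaf, proof, root):
    """Recompute the path from a leaf to the root using its proof."""
    node = h(b"\x00" + leaf)
    for group, pos in proof:
        if group[pos] != node:
            return False
        node = h(b"\x01" + b"".join(group))
    return node == root


def sample_check(leaves, levels, root, k, branch=2, seed=0):
    """Verify k randomly sampled leaves instead of every leaf."""
    rng = random.Random(seed)
    picks = rng.sample(range(len(leaves)), k)
    return all(verify(leaves[i], prove(levels, i, branch), root) for i in picks)


# Hypothetical sensor readings; only `root` would be stored on-chain.
readings = [f"sensor-{i}:{20 + i}".encode() for i in range(10)]
levels = build_levels(readings, branch=4)
root = levels[-1][0]
print(sample_check(readings, levels, root, k=3, branch=4))
```

In this sketch a larger branch number makes the tree shallower, but each proof step carries more sibling hashes, which is the kind of trade-off an optimal branch number has to balance.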

Citations: 0
Fusion of infrared and visible images via multi-layer convolutional sparse representation
IF 5.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | DOI: 10.1016/j.jksuci.2024.102090
Zhouyu Zhang , Chenyuan He , Hai Wang , Yingfeng Cai , Long Chen , Zhihua Gan , Fenghua Huang , Yiqun Zhang

Infrared and visible image fusion is an effective solution for image quality enhancement. However, conventional fusion models require the decomposition of source images into image blocks, which disrupts the original structure of the images, leading to the loss of detail in the fused images and making the fusion results highly sensitive to matching errors. This paper employs Convolutional Sparse Representation (CSR) to perform global feature transformation on the source images, overcoming the drawbacks of traditional fusion models that rely on image decomposition. Inspired by neural networks, a multi-layer CSR model is proposed, which involves five layers in a forward-feeding manner: two CSR layers acquiring sparse coefficient maps, one fusion layer combining sparse maps, and two reconstruction layers for image recovery. The dataset used in this paper comprises infrared and visible images selected from public datasets, as well as registered images collected by an actual Unmanned Aerial Vehicle (UAV). The source images contain ground targets, marine targets, and natural landscapes. To validate the effectiveness of the proposed image fusion model, a comparative analysis is conducted with state-of-the-art (SOTA) algorithms. Experimental results demonstrate that the proposed fusion model outperforms other state-of-the-art methods by at least 10% in the SF, EN, MI and Q^{AB/F} fusion metrics in most image fusion cases, thereby affirming its favorable performance.
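
The abstract does not expand its metric abbreviations. Assuming SF and EN carry their conventional meanings in the fusion literature (spatial frequency and Shannon entropy of the gray-level histogram), a minimal sketch of the two is given below. The function names and the nested-list image format are illustrative choices; MI and Q^{AB/F} are left out because they need the source images as well as the fused one.

```python
import math
from collections import Counter


def entropy(img):
    """Shannon entropy (in bits) of the gray-level histogram of an image."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def spatial_frequency(img):
    """sqrt(RF^2 + CF^2) from horizontal and vertical first differences."""
    rows, cols = len(img), len(img[0])
    # Row frequency term: squared differences between horizontal neighbors.
    rf = sum((img[i][j] - img[i][j - 1]) ** 2
             for i in range(rows) for j in range(1, cols))
    # Column frequency term: squared differences between vertical neighbors.
    cf = sum((img[i][j] - img[i - 1][j]) ** 2
             for i in range(1, rows) for j in range(cols))
    return math.sqrt((rf + cf) / (rows * cols))


# A 2x2 checkerboard stands in for a fused image.
fused = [[0, 255], [255, 0]]
print(entropy(fused), spatial_frequency(fused))
```

Both values grow with the amount of gray-level variety and local contrast in the image, which is why higher scores are usually read as better preserved detail.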

Citations: 0
Detection of misbehaving individuals in social networks using overlapping communities and machine learning
IF 5.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | DOI: 10.1016/j.jksuci.2024.102110
Wejdan Alshlahy , Delel Rhouma

Detecting misbehavior in social networks is essential for maintaining trust and reliability in online communities. Traditional methods of identification often rely on individual attributes or structural network properties, which may overlook subtle or complex misbehavior patterns. This paper introduces a novel approach called OCMLMD that leverages network overlapping community structure and machine learning techniques to detect misbehavior. Our method combines graph-based analyses of network topology with state-of-the-art machine learning algorithms to identify suspicious behavior indicative of misbehavior. Specifically, we target nodes that belong to multiple communities or exhibit weak connections within their community, utilizing a novel metric for selecting overlapping nodes. Additionally, we develop a machine learning model trained on relevant attributes extracted from social network data to detect misbehavior accurately. Extensive experiments on synthetic and real-world social network datasets demonstrate the superior performance of OCMLMD compared to baseline methods. Overall, our proposed approach offers a promising solution to the challenge of detecting misbehavior in social networks.
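
The node-selection step can be illustrated with a small sketch. The abstract does not define its overlap metric, so the code below substitutes two simple stand-ins: a node is flagged if it belongs to more than one community, or if fewer than a threshold share of its neighbors lie inside its own community. The function name, the `min_internal_ratio` threshold, and the toy graph are all hypothetical and are not the OCMLMD metric.

```python
from collections import defaultdict


def candidate_nodes(edges, communities, min_internal_ratio=0.5):
    """Flag nodes in several communities or weakly tied to their own one."""
    membership = defaultdict(set)
    for name, members in communities.items():
        for node in members:
            membership[node].add(name)

    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    flagged = {}
    for node, comms in membership.items():
        if len(comms) > 1:
            flagged[node] = "overlapping"
            continue
        (comm,) = comms
        nbrs = neighbors[node]
        inside = sum(1 for n in nbrs if n in communities[comm])
        ratio = inside / len(nbrs) if nbrs else 0.0
        if ratio < min_internal_ratio:
            flagged[node] = "weakly connected"
    return flagged


# Toy graph: "c" sits in both communities, "f" mostly links outside its own.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("e", "f"), ("f", "a"), ("f", "b")]
communities = {"A": {"a", "b", "c"}, "B": {"c", "d", "e", "f"}}
print(candidate_nodes(edges, communities))
```

The flagged nodes would then be the ones passed to a classifier trained on node attributes, which is the second stage the abstract describes.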

Citations: 0
LightSGM: Local feature matching with lightweight seeded
IF 5.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | DOI: 10.1016/j.jksuci.2024.102095
Shuai Feng , Huaming Qian , Huilin wang , Wenna Wang

Addressing the quintessential challenge of local feature matching in computer vision, this study introduces a novel fast sparse seed graph structure named LightSGM. This structure aims to refine the characterization of graph features while minimizing superfluous connections. Initially, a subset of high-quality seed feature points is curated using a confidence filter. Subsequently, keypoint features are assimilated into this seed set via graph pooling, and the composite features are further processed through a memory- and computation-efficient seed transformer to capture rich contextual information about the keypoints. The seed feature points are then relayed back to the original keypoints using an inverse process known as graph unpooling. The paper also introduces an adaptive mechanism to determine the optimal number of model layers based on the intricacy of matching image pairs. A Matching Point Prediction Header is employed to extract the final set of matching points. Through extensive experimentation on image matching and position estimation, LightSGM has demonstrated its prowess in delivering competitive matching accuracy while maintaining a balance with real-time processing capabilities.

Citations: 0
Improved YOLOv8 algorithms for small object detection in aerial imagery
IF 5.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | DOI: 10.1016/j.jksuci.2024.102113
Fei Feng, Yu Hu, Weipeng Li, Feiyan Yang

In drone aerial target detection tasks, a high proportion of small targets and complex backgrounds often lead to false positives and missed detections, resulting in low detection accuracy. To improve the accuracy of the detection of small targets, this study proposes two improved models based on YOLOv8s, named IMCMD_YOLOv8_small and IMCMD_YOLOv8_large. Each model accommodates different application scenarios. First, the network structure was optimized by removing the backbone P5 layer used to detect large targets and merging the P4, P3, and P2 layers, which are better suited for detecting medium and small targets; P3 and P2 serve as detection heads to focus more on small targets. Subsequently, the coordinate attention mechanism is integrated into the backbone’s C2f to create a C2f_CA module that enhances the model’s focus on key information and secures a richer flow of gradient information. Next, a multiscale attention feature fusion module was designed to merge the shallow and deep features. Finally, a Dynamic Head was introduced to unify the perception of scale, space, and tasks, further enhancing the detection capability for small targets. Experimental results on the VisDrone2019 dataset demonstrated that, compared with YOLOv8s, IMCMD_YOLOv8_small achieved improvements of 7.7% and 5.1% in mAP@0.5 and mAP@0.5:0.95, respectively, with a 73.0% reduction in the parameter count. The IMCMD_YOLOv8_large model showed even more significant improvements in these metrics, reaching 10.8% and 7.3%, respectively, with a 47.7% reduction in the parameter count, displaying superior performance in small target detection tasks. The improved models not only enhanced the detection accuracy but also achieved model lightweighting, thereby proving the effectiveness of the improvement strategies and showcasing superior performance compared with other classic models.

Citations: 0
Physically structured adversarial patch inspired by natural leaves multiply angles deceives infrared detectors
IF 5.2 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-07-01 | DOI: 10.1016/j.jksuci.2024.102122

Researching infrared adversarial attacks is crucial for ensuring the safe deployment of security-sensitive systems reliant on infrared object detectors. However, current research on infrared adversarial attacks mainly focuses on pedestrian detection tasks. Due to the complex shape and structure of vehicles and the changing working conditions, adversarial attacks in infrared vehicle detection pose challenges such as difficult multi-angle attacks, poor physical transferability, and weak environmental adaptation. This paper proposes Leaf-like Mask Bar Code (LMBC), a novel adversarial attack method for multi-angle physical black-box attacks on infrared detectors. Inspired by natural leaf structures, a mask was designed to restrict the adversarial patch contour. Then, adversarial parameters of the patches (angle, sparsity, and position) were optimized using the Genetic Algorithm with Multi-segment (GAM). Moreover, leaf-like structures in physical adversarial patches were constructed using suitable infrared coating materials and deployed at multiple angles. Experimental results demonstrated LMBC’s efficacy, paralyzing the infrared vehicle detector with an Average Precision (AP) as low as 33.7% and an average Attack Success Rate (ASR) as high as 92.9% across distances of 2.4 m to 4.2 m and angles of 0° to 360°. Moreover, LMBC’s adversarial patches transferred to mainstream detectors (e.g., Faster RCNN, Yolov3, etc.) and pedestrian detection tasks.

Citations: 0
MF-Saudi: A multimodal framework for bridging the gap between audio and textual data for Saudi dialect detection
IF 6.9 | Zone 2 (Computer Science) | Q1 Computer Science | Pub Date: 2024-06-14 | DOI: 10.1016/j.jksuci.2024.102084
Raed Alharbi

Detecting variations in dialects within a language can be challenging, particularly in regions with rich linguistic diversity like Saudi Arabia. To our knowledge, no prior attempts have been made to develop a multimodal, audio–textual framework for Saudi dialect detection. The current approaches often concentrate on detecting dialects only based on audio or textual data, which fails to capture the complex relationship between both modalities. In this paper, we propose a novel Multimodal Framework, called MF-Saudi, for Saudi dialect detection. The framework consists of three main components: (1) a pretrained BERT encoder for extracting and encoding textual information; (2) an acoustic model for representing audio signals and fusing them with textual information via the fusion layer; and (3) an alignment learning module to develop meaningful representations that capture the complexities of audio–text relationships, resulting in improved dialect detection. We conduct empirical evaluations on a real-world dataset, demonstrating that our solution outperforms some of the state-of-the-art baseline methods. The experiment’s code can be found here: https://github.com/raed19/MF-Saudi.

Citations: 0
Achieving local differential location privacy protection in 3D space via Hilbert encoding and optimized random response
IF 6.9; Zone 2 (Computer Science); Q1 Computer Science; Pub Date: 2024-06-12; DOI: 10.1016/j.jksuci.2024.102085
Yan Yan, Pengbin Yan, Adnan Mahmood, Yang Zhang, Quan Z. Sheng

The widespread use of spatial location-based services provides considerable convenience but also exposes users to location privacy leakage. Most existing user-side location privacy protection techniques are limited to planar locations. However, the extensive use of aircraft, sensor equipment, and acquisition devices with positioning functions heightens the urgency of protecting the privacy of 3D spatial locations. Therefore, this study proposes a local differential privacy protection approach for 3D spatial locations. A 3D spatial decomposition and Hilbert encoding method is designed to reduce 3D location data to a one-dimensional encoding. An optimized random response mechanism is then used to perturb the dimension-reduced location encoding, which not only achieves user-side location privacy protection but also improves the accuracy of the aggregated data on the server side. Experiments on real spatial location datasets show that the proposed method can reduce the loss of spatial location service quality, maintain the availability of perturbed spatial locations, and improve the operating efficiency of the spatial location perturbation algorithm.
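The perturb-then-aggregate idea in this abstract can be illustrated with a minimal sketch. It assumes each 3D location has already been reduced to an integer code in `range(k)` (for example by a Hilbert index, which is not implemented here), and it uses the textbook k-ary randomized response mechanism rather than the paper's optimized variant. The function names `krr_perturb` and `krr_estimate` are hypothetical.

```python
import math
import random
from collections import Counter


def krr_perturb(code, k, epsilon, rng):
    """User side: keep the true code with probability p, else report another code."""
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p:
        return code
    # Draw uniformly from the k-1 codes that differ from the true one.
    other = rng.randrange(k - 1)
    return other if other < code else other + 1


def krr_estimate(reports, k, epsilon):
    """Server side: unbiased estimate of how many users hold each code."""
    n = len(reports)
    e = math.exp(epsilon)
    p = e / (e + k - 1)      # probability of reporting the true code
    q = 1.0 / (e + k - 1)    # probability of reporting one specific other code
    counts = Counter(reports)
    return [(counts.get(v, 0) - n * q) / (p - q) for v in range(k)]
```

A caller would perturb each user's code with `krr_perturb` and pass the collected reports to `krr_estimate`. The estimates always sum to the number of reports, and a smaller `epsilon` gives stronger privacy at the cost of noisier estimates.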

Citations: 0
GAIR-U-Net: 3D guided attention inception residual u-net for brain tumor segmentation using multimodal MRI images
IF 6.9; Zone 2 (Computer Science); Q1 Computer Science; Pub Date: 2024-06-09; DOI: 10.1016/j.jksuci.2024.102086
Evans Kipkoech Rutoh, Qin Zhi Guang, Noor Bahadar, Rehan Raza, Muhammad Shehzad Hanif

Deep learning technologies have led to substantial breakthroughs in the field of biomedical image analysis. Accurate brain tumor segmentation is an essential aspect of treatment planning. Radiologists agree that manual segmentation is a difficult and time-consuming task that frequently delays the diagnostic process. While U-Net-based methods have been widely used for brain tumor segmentation, many challenges persist, particularly when dealing with tumors of varying sizes, locations, and shapes. Additionally, segmenting tumor regions with structures requires a comprehensive model, which can increase computational complexity and potentially cause vanishing-gradient issues. This study presents a novel method called 3D Guided Attention-based deep Inception Residual U-Net (GAIR-U-Net) to address these challenges. The model combines attention mechanisms, an inception module, and residual blocks with dilated convolution to enhance feature representation and spatial context understanding. Its backbone is U-Net, which leverages inception and residual connections to capture intricate patterns and hierarchical features while expanding the model's width in three-dimensional space without significantly increasing computational complexity. The attention mechanisms focus on important regions while down-weighting irrelevant details. The dilated convolutions in the network help in learning both local and global information, improving accuracy and adaptability in segmenting tumors. All experiments in this study were carried out on multimodal MRI scans (T1-weighted, T1-ce, T2-weighted, and FLAIR sequences) from the BraTS 2020 dataset. The presented model was trained and tested on this dataset and exhibited promising performance compared to previous methods. On the BraTS 2020 validation dataset, the proposed model obtained Dice scores of 0.8796, 0.8634, and 0.8441 for whole tumor (WT), tumor core (TC), and enhancing tumor (ET), respectively. These results demonstrate the model's efficacy in precisely segmenting brain tumors across various modalities. Comparative analyses underscore the model's versatility in handling variations in tumor shape, size, and location, making it a promising solution for clinical applications.
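For readers unfamiliar with the metric reported above, the Dice score measures the overlap between a predicted mask and a reference mask as 2|A∩B| / (|A| + |B|). The following is a minimal sketch on flattened binary masks. The function name `dice_score` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention, not something stated in the abstract.

```python
def dice_score(pred, target):
    """Dice overlap between two equal-length binary masks given as 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

In a segmentation benchmark this would typically be computed once per region (for example WT, TC, and ET) on each scan and then averaged over scans.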

Citations: 0
Adaptive density guided network with CNN and Transformer for underwater fish counting
IF 6.9; Zone 2 (Computer Science); Q1 Computer Science; Pub Date: 2024-06-08; DOI: 10.1016/j.jksuci.2024.102088
Shijian Zheng, Rujing Wang, Shitao Zheng, Liusan Wang, Hongkui Jiang

Accurate assessment of high-density underwater fish resources is vital to the aquaculture industry. It is directly related to the formulation of fishery insurance strategies and the implementation of breeding plans. However, accurately counting fish in high-density environments is challenging because fish density is unevenly distributed and individual fish differ in size and posture. To break through this technical bottleneck, we developed an adaptive density-guided high-density fish counting network. First, the network adopts a multi-layer feature fusion structure similar to UNet, which significantly enhances the matching between fish targets of different scales and feature pyramid levels, effectively alleviating the problems caused by scale changes and morphological deformations. Second, the network introduces a density-guided adaptive selection module, which judges the applicability of Convolutional Neural Network and Transformer blocks in different density areas, thereby achieving robust information transfer and interaction between blocks. Finally, to verify the effectiveness of this method, we constructed two high-density datasets: a simulated high-density underwater fish image dataset (SHUFD) and a real high-density underwater fish image dataset (RHUFD). The proposed method shows significant improvements over the state-of-the-art method (CUT) on the SHUFD and RHUFD datasets, with the mean absolute error, mean square error, background region bias, foreground region bias, and density map bias indicators improving by 3.44% and 6.47%, 11.43% and 4.41%, 23.91% and 29.48%, 4.43% and 10.33%, and 8.3% and 13.14%, respectively.
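Two of the indicators named above, mean absolute error and mean square error, are standard counting metrics, and the reported percentages are relative improvements over a baseline. A minimal sketch follows. The function names are hypothetical, and `mean_square_error` is the plain mean of squared errors. Some counting papers report the square root of this quantity under the same name, and the abstract does not say which convention is used.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true per-image counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_square_error(predicted, actual):
    """Average squared difference between predicted and true per-image counts."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def relative_improvement(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error * 100.0
```

Both error functions assume non-empty sequences of equal length, with one predicted count and one ground-truth count per image.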

Citations: 0