Pub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102207
Vid Keršič, Sašo Karakatič, Muhamed Turkanović
Zero-knowledge proofs introduce a mechanism to prove that certain computations were performed without revealing any underlying information and are used commonly in blockchain-based decentralized apps (dapps). This cryptographic technique addresses trust issues prevalent in blockchain applications, and has now been adapted for machine learning (ML) services, known as Zero-Knowledge Machine Learning (ZKML). By leveraging the distributed nature of blockchains, this approach enhances the trustworthiness of ML deployments, and opens up new possibilities for privacy-preserving and robust ML applications within dapps. This paper provides a comprehensive overview of the ZKML process and its critical components for verifying ML services on-chain. Furthermore, this paper explores how blockchain technology and smart contracts can offer verifiable, trustless proof that a specific ML model has been used correctly to perform inference, all without relying on a single trusted entity. Additionally, the paper compares and reviews existing frameworks for implementing ZKML in dapps, serving as a reference point for researchers interested in this emerging field.
On-chain zero-knowledge machine learning: An overview and comparison. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 9, Article 102207.
Pub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102206
Chaoran Wang, Mingyang Wang, Xianjie Wang, Yingchun Tan
Objectives:
Sequential recommendation aims to recommend items that are relevant to users’ interests based on their existing interaction sequences. Current models fall short in capturing users’ latent intentions and do not sufficiently consider sequence information when modeling users and items. Additionally, noise in user interaction sequences can affect the model’s optimization process.
Methods:
This paper introduces an intent perceived sequential recommendation model (IPSRM). IPSRM employs the generalized expectation–maximization (EM) framework, alternating between learning sequence representations and optimizing the model to better capture the underlying intentions of user interactions. Specifically, IPSRM passes unlabeled behavioral sequences through frequency-domain filtering and maps them into a random Gaussian distribution space. These mappings reduce the impact of noise and improve the learning of user behavior representations. Through a clustering process, IPSRM captures users’ potential interaction intentions and incorporates them as one of the supervision signals in the contrastive self-supervised learning process to guide optimization.
Results:
Experimental results on four standard datasets demonstrate the superiority of IPSRM. Comparative experiments also verify that IPSRM exhibits strong robustness under cold start and noisy interaction conditions.
Conclusions:
Capturing latent user intentions, integrating intention-based supervision into model optimization, and mitigating noise in sequential modeling significantly enhance the performance of sequential recommendation systems.
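The frequency-domain filtering mentioned in the Methods can be illustrated with a minimal sketch. The abstract does not say which filter IPSRM uses, so the code below assumes a plain low-pass filter over a one-dimensional sequence; the names `dft`, `idft`, `low_pass` and the `keep` parameter are hypothetical and not taken from the paper.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass(x, keep):
    """Zero every frequency bin above `keep` (assumed filter, not the paper's)."""
    n = len(x)
    spectrum = dft(x)
    # Bin k and bin n - k carry the same frequency, so compare min(k, n - k).
    filtered = [spectrum[k] if min(k, n - k) <= keep else 0 for k in range(n)]
    return idft(filtered)
```

Under this assumption, a sequence made of a constant level plus a fast alternating component keeps only the constant level after filtering, which is the sense in which high-frequency noise is suppressed.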
IPSRM: An intent perceived sequential recommendation model. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 9, Article 102206.
Pub Date: 2024-10-05 DOI: 10.1016/j.jksuci.2024.102204
Md Jahid Hasan, Wan Siti Halimatul Munirah Wan Ahmad, Mohammad Faizal Ahmad Fauzi, Jenny Tung Hiong Lee, See Yee Khor, Lai Meng Looi, Fazly Salleh Abas, Afzan Adam, Elaine Wan Ling Chan
Histopathology image segmentation and classification are essential for diagnosing and treating breast cancer. This study introduces a highly accurate method that performs both segmentation and classification of histopathology images within a single architecture. We take the well-known segmentation architectures SegNet and U-Net and modify the decoder to attach ResNet, VGG and DenseNet for the classification task. These hybrid models are integrated with Stardist as the backbone and deployed in a real-time pathologist workflow with a graphical user interface. The models were trained and tested offline using an ER-IHC-stained private dataset and the H&E-stained public dataset MoNuSeg. For real-time evaluation, the proposed model was evaluated on PR-IHC-stained glass slides. It achieved the highest pixel-based segmentation F1-scores of 0.902 and 0.903 on the private and public datasets respectively, and a classification F1-score of 0.833 on the private dataset. The experiments show the robustness of our method: a model trained on the ER-IHC dataset performs well on real-time microscopy of PR-IHC slides at both 20x and 40x magnification. This will help pathologists make decisions quickly.
Real-time segmentation and classification of whole-slide images for tumor biomarker scoring. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 9, Article 102204.
Pub Date: 2024-10-03 DOI: 10.1016/j.jksuci.2024.102202
Yu Zhong, Bo Shen
Extracting structured information from unstructured text is crucial for knowledge management and utilization, which is the goal of document-level relation extraction. Existing graph-based methods face issues with information confusion and integration, limiting the reasoning capabilities of the model. To tackle this problem, a dual-stream dynamic graph structural network is proposed to model documents from various perspectives. Leveraging the richness of document information, a static document heterogeneous graph is constructed. A dynamic heterogeneous document graph is then induced based on this foundation to facilitate global information aggregation for entity representation learning. Additionally, the static document graph is decomposed into multi-level static semantic graphs, and multi-layer dynamic semantic graphs are further induced, explicitly segregating information from different levels. Information from different streams is effectively integrated via an information integrator. To mitigate the interference of noise during the reasoning process, a noise regularization mechanism is also designed. The experimental results on three extensively utilized publicly accessible datasets for document-level relation extraction demonstrate that our model achieves F1 scores of 62.56%, 71.1%, and 86.9% on the DocRED, CDR, and GDA datasets, respectively, significantly outperforming the baselines. Further analysis also demonstrates the effectiveness of the model in multi-entity scenarios.
Dual-stream dynamic graph structure network for document-level relation extraction. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 9, Article 102202.
Pub Date: 2024-10-01 DOI: 10.1016/j.jksuci.2024.102203
Yingqi Lu, Xiangsuo Fan, Jinfeng Wang, Shaojun Chen, Jie Meng
Accurate segmentation of lung nodules is crucial for the early detection of lung cancer and other pulmonary diseases. Traditional segmentation methods face several challenges, such as the overlap between nodules and surrounding anatomical structures like blood vessels and bronchi, as well as the variability in nodule size and shape, which complicates the segmentation algorithms. Existing methods often inadequately address these issues, highlighting the need for a more effective solution. To address these challenges, this paper proposes an improved multi-scale parallel fusion encoding network, ParaU-Net. ParaU-Net enhances the segmentation accuracy and model performance by optimizing the encoding process, improving feature extraction, preserving down-sampling information, and expanding the receptive field. Specifically, the multi-scale parallel fusion mechanism introduced in ParaU-Net better captures the fine features of nodules and reduces interference from other structures. Experiments conducted on the LIDC (The Lung Image Database Consortium) public dataset demonstrate the excellent performance of ParaU-Net in segmentation tasks, with results showing an IoU of 87.15%, Dice of 92.16%, F1-score of 92.24%, F2-score of 92.33%, and F0.5-score of 92.69%. These results significantly outperform other advanced segmentation methods, validating the effectiveness and accuracy of the proposed model in lung nodule CT image analysis. The code is available at https://github.com/XiaoBai-Lyq/ParaU-Net.
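For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of IoU, Dice and the F-beta family for a single pair of binary masks. It is not the authors' evaluation code: the function names are hypothetical, masks are assumed to be flat lists of 0/1 labels, and the abstract does not state how scores are aggregated over images.

```python
def confusion_counts(pred, target):
    """Return (true positives, false positives, false negatives) for flat 0/1 masks."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def iou(pred, target):
    """Intersection over union: TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return tp / denom if denom else 1.0  # two empty masks agree perfectly


def dice(pred, target):
    """Dice coefficient: 2 TP / (2 TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def f_beta(pred, target, beta=1.0):
    """F-beta score; beta > 1 weights recall more, beta < 1 weights precision more."""
    tp, fp, fn = confusion_counts(pred, target)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 1.0
```

On a single mask pair, Dice and F1 are the same quantity, so differences between the two in reported tables typically come from the averaging scheme rather than from the formulas.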
ParaU-Net: An improved UNet parallel coding network for lung nodule segmentation. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 9, Article 102203.
Pub Date: 2024-10-01 DOI: 10.1016/j.jksuci.2024.102200
Fan Wang, Xiaochen Yuan, Yue Liu, Chan-Tong Lam
Lung auscultation is essential for early lung condition detection. Categorizing adventitious lung sounds requires expert discrimination by medical specialists. This paper details the features of LungNeXt, a novel classification model specifically designed for lung sound analysis. Furthermore, we propose two auxiliary methods: RandClipMix (RCM) for data augmentation and Enhanced Mel-Spectrogram for Feature Extraction (EMFE). RCM addresses the issue of data imbalance by randomly mixing clips within the same category to create new adventitious lung sounds. EMFE augments specific frequency bands in spectrograms to highlight adventitious features. These contributions enable LungNeXt to achieve outstanding performance. LungNeXt optimally integrates an appropriate number of NeXtblocks, ensuring superior performance and a lightweight model architecture. The proposed RCM and EMFE methods, along with the LungNeXt classification network, have been evaluated on the SPRSound dataset. Experimental results revealed a commendable score of 0.5699 for the lung sound five-category task on SPRSound. Specifically, the LungNeXt model is characterized by its efficiency, with only 3.804M parameters and a computational complexity of 0.659G FLOPS. This lightweight and efficient model is particularly well-suited for applications in electronic stethoscope back-end processing equipment, providing efficient diagnostic advice to physicians and patients.
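The abstract describes RandClipMix only as randomly mixing clips within the same category. One plausible reading, sketched below purely as an assumption, splices random segments from two different recordings that share a label; the function name, its parameters and the splicing rule are hypothetical and may differ from the published method.

```python
import random


def rand_clip_mix(clips_by_label, label, rng, length):
    """Build one synthetic clip of `length` samples from two same-label recordings.

    Assumes every recording is a list with at least `length` samples and that
    the label has at least two recordings.
    """
    first, second = rng.sample(clips_by_label[label], 2)
    cut = rng.randint(1, length - 1)          # samples taken from the first clip
    rest = length - cut                       # samples taken from the second clip
    start_first = rng.randint(0, len(first) - cut)
    start_second = rng.randint(0, len(second) - rest)
    return first[start_first:start_first + cut] + second[start_second:start_second + rest]
```

Because both source segments carry the same label, the spliced clip can inherit that label, which is how such an augmentation would add examples to under-represented categories.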
LungNeXt: A novel lightweight network utilizing enhanced mel-spectrogram for lung sound classification. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 8, Article 102200.
Pub Date: 2024-10-01 DOI: 10.1016/j.jksuci.2024.102194
Qingzeng Song, Yao Dai, Hao Lu, Guanghao Jin
In this era of Transformers enjoying remarkable success, Convolutional Neural Networks (CNNs) remain highly relevant and useful. Indeed, hybrid Transformer-CNN network architectures, which combine the benefits of both approaches, have achieved impressive results. Vision Transformer (ViT) is a prominent architecture that is built primarily on the transformer framework but uses a convolutional layer as its first layer. However, owing to the distinct computation patterns inherent in attention and convolution, existing hardware accelerators for these two models are typically designed separately and lack a unified approach toward accelerating both efficiently. In this paper, we present a dedicated accelerator on a field-programmable gate array (FPGA) platform. The accelerator, which integrates a configurable three-dimensional systolic array, is specifically designed to accelerate inference for hybrid Transformer-CNN networks. The convolution and Transformer computations can both be mapped to the systolic array by expressing them as matrix multiplications. Softmax and LayerNorm, which are frequently used in hybrid Transformer-CNN networks, were also implemented on the FPGA. The accelerator achieved high performance with a peak throughput of 722 GOP/s at an average energy efficiency of 53 GOPS/W. Its computation latencies were 51.3 ms, 18.1 ms, and 6.8 ms for ViT-Base, ViT-Small, and ViT-Tiny, respectively. The accelerator provided a 12× improvement in energy efficiency compared to the CPU, a 2.3× improvement compared to the GPU, and a 1.5× to 2× improvement compared to existing accelerators in speed and energy efficiency.
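The idea of unifying convolution with matrix multiplication can be sketched in software with the common im2col transformation. This is an illustration only: the abstract does not describe the accelerator's actual dataflow, and the helper names below are hypothetical. The sketch assumes a single-channel image, stride 1, no padding, and square kernels of equal size.

```python
def im2col(image, k):
    """Unfold every k x k patch of a 2-D image (list of rows) into one flat row."""
    height, width = len(image), len(image[0])
    return [
        [image[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(height - k + 1)
        for j in range(width - k + 1)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv2d_as_matmul(image, kernels):
    """Apply several k x k kernels at once via a single matrix multiplication.

    Follows the deep-learning convention (cross-correlation, no kernel flip).
    Returns one row per output position and one column per kernel.
    """
    k = len(kernels[0])
    patches = im2col(image, k)  # shape: (positions, k*k)
    # Flatten each kernel into a column, giving shape (k*k, number of kernels).
    weights = [[kern[di][dj] for kern in kernels] for di in range(k) for dj in range(k)]
    return matmul(patches, weights)
```

Once convolution is in this form, it uses the same multiply-accumulate pattern as the projections in a Transformer layer, which is what makes a shared matrix-multiplication engine attractive.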
High-throughput systolic array-based accelerator for hybrid transformer-CNN networks. Journal of King Saud University-Computer and Information Sciences, volume 36, issue 8, Article 102194.
Modeling long-range dependencies among features is widely agreed to improve the results of single image super-resolution (SISR), which has stimulated interest in enlarging the kernel sizes in convolutional neural networks (CNNs). Although larger kernels improve network performance, they also sharply increase the number of parameters and the computational complexity. Hence, the kernel sizes need to be chosen carefully to improve the efficiency of the network. In this work, we study the influence of the positions of larger kernels on network performance, and propose a scalable attention network (SCAN). In SCAN, we propose a depth-related attention block (DRAB) that consists of several multi-scale information enhancement blocks (MIEBs) and resizable-kernel attention blocks (RKABs). The RKAB dynamically adjusts its kernel size according to the location of the DRAB in the network. This resizable mechanism allows the network to extract more informative features in shallower layers with larger kernels and to focus on useful information in deeper layers with smaller ones, which effectively improves the SR results. Extensive experiments demonstrate that the proposed SCAN outperforms other state-of-the-art lightweight SR methods. Our code is available at https://github.com/ginsengf/SCAN.
{"title":"A scalable attention network for lightweight image super-resolution","authors":"Jinsheng Fang , Xinyu Chen , Jianglong Zhao , Kun Zeng","doi":"10.1016/j.jksuci.2024.102185","DOIUrl":"10.1016/j.jksuci.2024.102185","url":null,"abstract":"<div><div>Modeling long-range dependencies among features has become a consensus to improve the results of single image super-resolution (SISR), which stimulates interest in enlarging the kernel sizes in convolutional neural networks (CNNs). Although larger kernels definitely improve the network performance, network parameters and computational complexities are raised sharply as well. Hence, an optimization of setting the kernel sizes is required to improve the efficiency of the network. In this work, we study the influence of the positions of larger kernels on the network performance, and propose a scalable attention network (SCAN). In SCAN, we propose a depth-related attention block (DRAB) that consists of several multi-scale information enhancement blocks (MIEBs) and resizable-kernel attention blocks (RKABs). The RKAB dynamically adjusts the kernel size concerning the locations of the DRABs in the network. The resizable mechanism allows the network to extract more informative features in shallower layers with larger kernels and focus on useful information in deeper layers with smaller ones, which effectively improves the SR results. Extensive experiments demonstrate that the proposed SCAN outperforms other state-of-the-art lightweight SR methods. 
Our codes are available at <span><span>https://github.com/ginsengf/SCAN</span></span>.</div></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 8","pages":"Article 102185"},"PeriodicalIF":5.2,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142358141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-10-01DOI: 10.1016/j.jksuci.2024.102197
Zhiyuan Zou , Bangchao Wang , Xinrong Hu , Yang Deng , Hongyan Wan , Huan Jin
This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With Code Dependency (GA-XWCoDe), which integrates eXtreme Gradient Boosting (XGBoost) with a Node2Vec model-weighted code dependency strategy and genetic algorithms for parameter optimisation. XGBoost mitigates overfitting and enhances model stability, while Node2Vec improves prediction accuracy for low-confidence links. Genetic algorithms are employed to optimise model parameters efficiently, reducing the resource intensity of traditional methods. Experimental results show that GA-XWCoDe outperforms the state-of-the-art method TRAceability lInk cLassifier (TRAIL) by 17.44% and Deep Forest for Requirement traceability (DF4RT) by 33.36% in terms of average F1 performance across four datasets. It is significantly superior to all baseline methods at a significance level of α < 0.01 and demonstrates exceptional performance and stability across various training data scales.
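The genetic-algorithm parameter search mentioned above can be sketched generically as follows. This is not the GA-XWCoDe implementation: the operators (uniform crossover, Gaussian mutation, one-elite survival), the parameter names in `BOUNDS`, and the stand-in objective `toy_fitness` are assumptions for illustration. In a real setting the fitness would be something like the cross-validated F1 score of an XGBoost model trained with the candidate parameters.

```python
import random
from typing import Callable, Dict, List, Tuple

Bounds = Dict[str, Tuple[float, float]]
Genome = Dict[str, float]

# Hypothetical search space; names and ranges are illustrative only.
BOUNDS: Bounds = {"learning_rate": (0.01, 0.5), "max_depth": (2.0, 12.0)}


def toy_fitness(genome: Genome) -> float:
    """Stand-in objective peaking at learning_rate=0.1, max_depth=6."""
    return -((genome["learning_rate"] - 0.1) ** 2
             + ((genome["max_depth"] - 6.0) / 10.0) ** 2)


def random_genome(bounds: Bounds, rng: random.Random) -> Genome:
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene is copied from one parent at random."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in a}


def mutate(genome: Genome, bounds: Bounds, rng: random.Random,
           rate: float = 0.2, scale: float = 0.1) -> Genome:
    """Gaussian mutation, clipped back into the allowed range."""
    child = dict(genome)
    for name, (lo, hi) in bounds.items():
        if rng.random() < rate:
            child[name] += rng.gauss(0.0, scale * (hi - lo))
            child[name] = min(hi, max(lo, child[name]))
    return child


def genetic_search(fitness: Callable[[Genome], float], bounds: Bounds,
                   pop_size: int = 20, generations: int = 30,
                   seed: int = 0) -> Genome:
    rng = random.Random(seed)
    population: List[Genome] = [random_genome(bounds, rng)
                                for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]    # truncation selection
        children = [ranked[0]]               # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), bounds, rng))
        population = children
    return max(population, key=fitness)


if __name__ == "__main__":
    print(genetic_search(toy_fitness, BOUNDS))
```

Because the best individual is always carried over unchanged, the best fitness in the population never decreases from one generation to the next.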
{"title":"Enhancing requirements-to-code traceability with GA-XWCoDe: Integrating XGBoost, Node2Vec, and genetic algorithms for improving model performance and stability","authors":"Zhiyuan Zou , Bangchao Wang , Xinrong Hu , Yang Deng , Hongyan Wan , Huan Jin","doi":"10.1016/j.jksuci.2024.102197","DOIUrl":"10.1016/j.jksuci.2024.102197","url":null,"abstract":"<div><div>This study addresses the challenge of requirements-to-code traceability by proposing a novel model, Genetic Algorithm-XGBoost With Code Dependency (GA-XWCoDe), which integrates eXtreme Gradient Boosting (XGBoost) with a Node2Vec model-weighted code dependency strategy and genetic algorithms for parameter optimisation. XGBoost mitigates overfitting and enhances model stability, while Node2Vec improves prediction accuracy for low-confidence links. Genetic algorithms are employed to optimise model parameters efficiently, reducing the resource intensity of traditional methods. Experimental results show that GA-XWCoDe outperforms the state-of-the-art method TRAceability lInk cLassifier (TRAIL) by 17.44% and Deep Forest for Requirement traceability (DF4RT) by 33.36% in terms of average F1 performance across four datasets. 
It is significantly superior to all baseline methods at a confidence level of <span><math><mi>α</mi></math></span>&lt;0.01 and demonstrates exceptional performance and stability across various training data scales.</div></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 8","pages":"Article 102197"},"PeriodicalIF":5.2,"publicationDate":"2024-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142358137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-30DOI: 10.1016/j.jksuci.2024.102199
Antonio Cedillo-Hernandez , Lydia Velazquez-Garcia , Manuel Cedillo-Hernandez , David Conchouso-Gonzalez
Generally speaking, watermarking methods that operate in the spatial domain tend to be fast but offer limited robustness and imperceptibility, while those that operate in transform domains are robust but computationally expensive. One of the main challenges of digital video watermarking is the large amount of computational power required to process the huge volume of data. In this paper we propose a watermarking algorithm for digital video that addresses this problem. To increase speed, the watermark is embedded using a technique that modifies the DCT coefficients directly in the spatial domain, and the process treats the video scene, rather than the video frame, as its basic unit. In terms of robustness, the watermark is modulated by a Just Noticeable Distortion (JND) scheme, computed directly in the spatial domain and guided by visual attention, which raises the watermark strength to the maximum level that remains imperceptible to the human eye. Experimental results confirm that the proposed method achieves remarkable performance in terms of processing time, robustness and imperceptibility compared to previous studies.
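One way to see how a DCT coefficient can be modified without leaving the spatial domain is the linearity of the transform: adding a scaled DCT basis image to a pixel block changes exactly one coefficient by that scale. The sketch below demonstrates this identity for an 8x8 orthonormal DCT-II. It is not the paper's algorithm, which the abstract does not detail; the function names, the chosen coefficient position, and the `strength` argument (a placeholder for a JND-derived value) are assumptions.

```python
import math
from typing import List

N = 8  # block size; 8x8 is assumed here for illustration
Block = List[List[float]]


def _alpha(k: int) -> float:
    """Normalisation factor of the orthonormal DCT-II."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)


def basis(u: int, v: int) -> Block:
    """Spatial-domain basis image of DCT coefficient (u, v)."""
    return [[_alpha(u) * _alpha(v)
             * math.cos((2 * x + 1) * u * math.pi / (2 * N))
             * math.cos((2 * y + 1) * v * math.pi / (2 * N))
             for y in range(N)] for x in range(N)]


def dct_coefficient(block: Block, u: int, v: int) -> float:
    """Single DCT-II coefficient, used here only to check the result."""
    b = basis(u, v)
    return sum(block[x][y] * b[x][y] for x in range(N) for y in range(N))


def embed_bit(block: Block, bit: int, u: int, v: int,
              strength: float) -> Block:
    """Shift coefficient (u, v) by +/- strength using pixel additions only."""
    delta = strength if bit else -strength
    b = basis(u, v)
    return [[block[x][y] + delta * b[x][y] for y in range(N)]
            for x in range(N)]


if __name__ == "__main__":
    block = [[float((3 * x + 5 * y) % 17) for y in range(N)] for x in range(N)]
    marked = embed_bit(block, bit=1, u=2, v=1, strength=4.0)
    print(dct_coefficient(marked, 2, 1) - dct_coefficient(block, 2, 1))
```

Since the basis images can be precomputed once, embedding reduces to one multiply-add per pixel and no forward or inverse transform is needed at embedding time.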
{"title":"Fast and robust JND-guided video watermarking scheme in spatial domain","authors":"Antonio Cedillo-Hernandez , Lydia Velazquez-Garcia , Manuel Cedillo-Hernandez , David Conchouso-Gonzalez","doi":"10.1016/j.jksuci.2024.102199","DOIUrl":"10.1016/j.jksuci.2024.102199","url":null,"abstract":"<div><div>Generally speaking, those watermarking studies using the spatial domain tend to be fast but with limited robustness and imperceptibility while those performed in other transform domains are robust but have high computational cost. Watermarking applied to digital video has as one of the main challenges the large amount of computational power required due to the huge amount of information to be processed. In this paper we propose a watermarking algorithm for digital video that addresses this problem. To increase the speed, the watermark is embedded using a technique to modify the DCT coefficients directly in the spatial domain, in addition to carrying out this process considering the video scene as the basic unit and not the video frame. In terms of robustness, the watermark is modulated by a Just Noticeable Distortion (JND) scheme computed directly in the spatial domain guided by visual attention to increase the strength of the watermark to the maximum level but without this operation being perceivable by human eyes. 
Experimental results confirm that the proposed method achieves remarkable performance in terms of processing time, robustness and imperceptibility compared to previous studies.</div></div>","PeriodicalId":48547,"journal":{"name":"Journal of King Saud University-Computer and Information Sciences","volume":"36 9","pages":"Article 102199"},"PeriodicalIF":5.2,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142424439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}