In silico framework for genome analysis
Pub Date: 2024-11-12, DOI: 10.1016/j.future.2024.107585
M. Saqib Nawaz, M. Zohaib Nawaz, Yongshun Gong, Philippe Fournier-Viger, Abdoulaye Baniré Diallo
Genomes hold the complete genetic information of an organism. Examining and analyzing genomic data plays a critical role in properly understanding an organism, particularly the main characteristics, functionalities, and evolving nature of harmful viruses. However, the rapid increase in genomic data poses new challenges and demands for extracting meaningful and valuable insights from large and complex genomic datasets. In this paper, a novel Framework for Genome Data Analysis (F4GDA) is developed that offers various methods for analyzing viral genomic data in different forms. The framework’s methods can analyze not only changes in genomes but also various genome contents. As a case study, the genomes of five SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) VoC (variants of concern), divided into three types/groups on the basis of geographical location, are analyzed with this framework to investigate (1) the nucleotide, amino acid, and synonymous codon changes in the whole genomes of the VoC as well as in the Spike (S) protein, (2) whether different environments affect the rate of change in genomes, (3) the variations in nucleotide base, amino acid, and codon base compositions in VoC genomes, and (4) how the VoC genomes compare with the reference genome sequence of SARS-CoV-2.
{"title":"In silico framework for genome analysis","authors":"M. Saqib Nawaz , M. Zohaib Nawaz , Yongshun Gong , Philippe Fournier-Viger , Abdoulaye Baniré Diallo","doi":"10.1016/j.future.2024.107585","DOIUrl":"10.1016/j.future.2024.107585","url":null,"abstract":"<div><div>Genomes hold the complete genetic information of an organism. Examining and analyzing genomic data plays a critical role in properly understanding an organism, particularly the main characteristics, functionalities, and evolving nature of harmful viruses. However, the rapid increase in genomic data poses new challenges and demands for extracting meaningful and valuable insights from large and complex genomic datasets. In this paper, a novel Framework for Genome Data Analysis (F4GDA), is developed that offers various methods for the analysis of viral genomic data in various forms. The framework’s methods can not only analyze the changes in genomes but also various genome contents. As a case study, the genomes of five SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) VoC (variants of concern), which are divided into three types/groups on the basis of geographical locations, are analyzed using this framework to investigate (1) the nucleotides, amino acids and synonymous codon changes in the whole genomes of VoC as well as in the Spike (S) protein, (2) whether different environments affect the rate of changes in genomes, (3) the variations in nucleotide bases, amino acids, and codon base compositions in VoC genomes, and (4) to compare VoC genomes with the reference genome sequence of SARS-CoV-2.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107585"},"PeriodicalIF":6.2,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive ensemble optimization for memory-related hyperparameters in retraining DNN at edge
Pub Date: 2024-11-10, DOI: 10.1016/j.future.2024.107600
Yidong Xu, Rui Han, Xiaojiang Zuo, Junyan Ouyang, Chi Harold Liu, Lydia Y. Chen
Edge applications are increasingly empowered by deep neural networks (DNNs) and face the challenge of adapting or retraining models as input data domains and learning tasks change. Existing techniques for enabling DNN retraining on edge devices configure the memory-related hyperparameters, termed m-hyperparameters, via batch size reduction, parameter freezing, and gradient checkpointing. While those methods show promising results for static DNNs, little is known about how to optimize all of these m-hyperparameters online and opportunistically, especially for the retraining tasks of edge applications. In this paper, we propose MPOptimizer, which jointly optimizes an ensemble of m-hyperparameters according to the input distribution and the edge resources available at runtime. The key feature of MPOptimizer is that it can easily emulate the execution of retraining tasks under different m-hyperparameters and thus effectively estimate their influence on task performance. We implement MPOptimizer on prevalent DNNs and demonstrate its effectiveness against state-of-the-art techniques: it successfully finds configurations that improve model accuracy by an average of 13% (up to 25.3%) while reducing memory consumption and training time by 4.1x and 5.3x, respectively, at the same model accuracy.
{"title":"Adaptive ensemble optimization for memory-related hyperparameters in retraining DNN at edge","authors":"Yidong Xu , Rui Han , Xiaojiang Zuo , Junyan Ouyang , Chi Harold Liu , Lydia Y. Chen","doi":"10.1016/j.future.2024.107600","DOIUrl":"10.1016/j.future.2024.107600","url":null,"abstract":"<div><div>Edge applications are increasingly empowered by deep neural networks (DNN) and face the challenges of adapting or retraining models for the changes in input data domains and learning tasks. The existing techniques to enable DNN retraining on edge devices are to configure the memory-related hyperparameters, termed <em>m</em>-hyperparameters, via batch size reduction, parameter freezing, and gradient checkpoint. While those methods show promising results for static DNNs, little is known about how to online and opportunistically optimize all their <em>m</em>-hyperparameters, especially for retraining tasks of edge applications. In this paper, we propose, MPOptimizer, which jointly optimizes an ensemble of <em>m</em>-hyperparameters according to the input distribution and available edge resources at runtime. The key feature of MPOptimizer is to easily emulate the execution of retraining tasks under different <em>m</em>-hyperparameters and thus effectively estimate their influence on task performance. We implement MPOptimizer on prevalent DNNs and demonstrate its effectiveness against state-of-the-art techniques, i.e. successfully find the best configuration that improves model accuracy by an average of 13% (up to 25.3%) while reducing memory and training time by 4.1x and 5.3x under the same model accuracies.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107600"},"PeriodicalIF":6.2,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651663","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Convergence-aware optimal checkpointing for exploratory deep learning training jobs
Pub Date: 2024-11-08, DOI: 10.1016/j.future.2024.107597
Hongliang Li, Zichen Wang, Hairui Zhao, Meng Zhang, Xiang Li, Haixiao Xu
Training Deep Learning (DL) models is becoming increasingly time-consuming, so interruptions to the training process are inevitable. For an HPC (High Performance Computing) job, an optimal checkpointing interval that minimizes the fault tolerance overhead can be derived under the precondition that job progress is proportional to execution time. Unfortunately, this is not the case in DL model training, where a training job yields diminishing returns across its lifetime. Meanwhile, training DL models is inherently exploratory, and early termination frequently occurs during model training and development. This makes the early progress of a DL training job more valuable than its later progress. Evenly spaced checkpoints would therefore either increase the risk in the early stages or waste resources overprotecting the later stages. Moreover, in data parallelism, state-of-the-art quality-driven scheduling strategies allocate more resources to the early stages of a job than to the later ones to accelerate training progress, which further amplifies the issue. In summary, the early stages are more important than the later ones, and allocating more fault-tolerance resources to them benefits model exploration. Based on this observation, we present COCI, an approach that computes the optimal checkpointing configuration for an exploratory DL training job, minimizing the fault tolerance overhead, including checkpoint cost and recovery cost. We implement COCI on top of a state-of-the-art iteration-level checkpointing mechanism, as a pluggable module compatible with PyTorch that requires no extra user input. Experimental results show that COCI reduces fault tolerance overhead by up to 40.18% compared with existing state-of-the-art DL fault tolerance methods in the serial scenario, and by up to 60.64% in the data-parallel scenario.
{"title":"Convergence-aware optimal checkpointing for exploratory deep learning training jobs","authors":"Hongliang Li , Zichen Wang , Hairui Zhao , Meng Zhang , Xiang Li , Haixiao Xu","doi":"10.1016/j.future.2024.107597","DOIUrl":"10.1016/j.future.2024.107597","url":null,"abstract":"<div><div>Training Deep Learning (DL) models are becoming more time-consuming, thus interruptions to the training processes are inevitable. We can obtain an optimal checkpointing interval to minimize the fault tolerance overhead for a HPC (High Performance Computing) job with the precondition that the job progress is proportional to its execution time. Unfortunately, it is not the case in DL model training, where a DL training job yields diminishing returns across its lifetime. Meanwhile, training DL models is inherently exploratory, with early termination frequently occurring during model training&developing. It makes the early progress of a DL training job more valuable than the later ones. Even placement of checkpoints would either increase the risks in the early stages or waste resources overprotecting the latter stages. Moreover, in data parallelism, the state-of-the-art quality-driven scheduling strategies allocate more resources for the early stages of a job than the later ones to accelerate the training progress, which further amplifies the issue. In summary, the early stage is more important than the later stages. Allocating more fault-tolerant resources to the early stages is beneficial for the model exploration. Based on the aforementioned conclusion, we present COCI, an approach to compute optimal checkpointing configuration for a exploratory DL training job, minimizing the fault tolerance overhead, including checkpoint cost and recovery cost. We implement COCI based on state-of-the-art iteration-level checkpointing mechanism, as a pluggable module compatible with PyTorch without extra user input. The experimental results show that COCI reduces up to 40.18% fault tolerance overhead compared to existing state-of-the-art DL fault tolerance methods in serial scenario, 60.64% in data parallel scenario.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107597"},"PeriodicalIF":6.2,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Task Offloading Optimization for Multi-objective Based on Cloud-Edge-End Collaboration in Maritime Networks
Pub Date: 2024-11-08, DOI: 10.1016/j.future.2024.107588
Lingqiang Liu, Ying Zhang
In recent years, global maritime activities have surged, yet maritime networks face significant limitations in capacity. To address this challenge, integrating mobile edge computing into maritime networks has emerged as a solution, enabling the offloading of computation-intensive tasks to the edge to enhance system performance. However, existing research often narrowly focuses on either system cost or Quality of Service (QoS), failing to optimize both concurrently. This study aims to bridge this research gap by proposing a novel approach that optimizes both system cost and QoS simultaneously through collaborative computing among terminals, edge servers, and a cloud server in a maritime network environment. We leverage the Improved Coati Optimization Algorithm (ICOA) to optimize transmission power for vessel users, and subsequently, we apply Binary Particle Swarm Optimization (BPSO) to make task offloading decisions that consider both system cost and QoS. Experimental results demonstrate that our proposed approach significantly outperforms existing benchmark algorithms in balancing system cost and QoS in cloud-edge-end collaborative scenarios.
{"title":"Task Offloading Optimization for Multi-objective Based on Cloud-Edge-End Collaboration in Maritime Networks","authors":"Lingqiang Liu , Ying Zhang","doi":"10.1016/j.future.2024.107588","DOIUrl":"10.1016/j.future.2024.107588","url":null,"abstract":"<div><div>In recent years, global maritime activities have surged, yet maritime networks face significant limitations in capacity. To address this challenge, integrating mobile edge computing into maritime networks has emerged as a solution, enabling the offloading of computation-intensive tasks to the edge to enhance system performance. However, existing research often narrowly focuses on either system cost or Quality of Service (QoS), failing to optimize both concurrently. This study aims to bridge this research gap by proposing a novel approach that optimizes both system cost and QoS simultaneously through collaborative computing among terminals, edge servers, and a cloud server in a maritime network environment. We leverage the Improved Coati Optimization Algorithm (ICOA) to optimize transmission power for vessel users, and subsequently, we apply Binary Particle Swarm Optimization (BPSO) to make task offloading decisions that consider both system cost and QoS. Experimental results demonstrate that our proposed approach significantly outperforms existing benchmark algorithms in balancing system cost and QoS in cloud-edge-end collaborative scenarios.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107588"},"PeriodicalIF":6.2,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142699723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
FedGen: Personalized federated learning with data generation for enhanced model customization and class imbalance
Pub Date: 2024-11-07, DOI: 10.1016/j.future.2024.107595
Peng Zhao, Shaocong Guo, Yanan Li, Shusen Yang, Xuebin Ren
Federated learning has emerged as a prominent solution for the collaborative training of machine learning models without exchanging local data. However, existing approaches often impose rigid constraints on model heterogeneity, limiting the ability of clients to customize unique models and increasing the vulnerability of models to potential attacks. This paper presents FedGen, a novel personalized federated learning framework based on generative adversarial networks (GANs). FedGen shifts the focus from training task-specific models to generating data, especially for minority classes with imbalanced data. With FedGen, clients can gain knowledge from others by training generators, while maintaining a heterogeneous local model and avoiding sharing model information with other participants. Moreover, to address challenges arising from imbalanced data, we propose AT-GAN, a novel generative model incorporating pseudo augmentation and differentiable augmentation modules to foster healthy competition between the generator and discriminator. To evaluate the effectiveness of our approach, we conduct extensive experiments on real-world tabular datasets. The experimental results demonstrate that FedGen significantly enhances the performance of local models, achieving improvements of up to 11.92% in F1 score and up to 9.14% in MCC score compared to existing methods.
{"title":"FedGen: Personalized federated learning with data generation for enhanced model customization and class imbalance","authors":"Peng Zhao , Shaocong Guo , Yanan Li , Shusen Yang , Xuebin Ren","doi":"10.1016/j.future.2024.107595","DOIUrl":"10.1016/j.future.2024.107595","url":null,"abstract":"<div><div>Federated learning has emerged as a prominent solution for the collaborative training of machine learning models without exchanging local data. However, existing approaches often impose rigid constraints on model heterogeneity, limiting the ability of clients to customize unique models and increasing the vulnerability of models to potential attacks. This paper presents FedGen, a novel personalized federated learning framework based on generative adversarial networks (GANs). FedGen shifts the focus from training task-specific models to generating data, especially for minority classes with imbalanced data. With FedGen, clients can gain knowledge from others by training generators, while maintaining a heterogeneous local model and avoiding sharing model information with other participants. Moreover, to address challenges arising from imbalanced data, we propose AT-GAN, a novel generative model incorporating pseudo augmentation and differentiable augmentation modules to foster healthy competition between the generator and discriminator. To evaluate the effectiveness of our approach, we conduct extensive experiments on real-world tabular datasets. The experimental results demonstrate that FedGen significantly enhances the performance of local models, achieving improvements of up to 11.92% in F1 score and up to 9.14% in MCC score compared to existing methods.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107595"},"PeriodicalIF":6.2,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Time-constrained persistent deletion for key–value store engine on ZNS SSD
Pub Date: 2024-11-06, DOI: 10.1016/j.future.2024.107598
Shiqiang Nie, Tong Lei, Jie Niu, Qihan Hu, Song Liu, Weiguo Wu
The inherent out-of-place update characteristic of the Log-Structured Merge tree (LSM tree) cannot guarantee persistent deletion within a specific time window, leading to potential data privacy and security issues. Existing solutions like Lethe-Fade ensure time-constrained persistent deletion but introduce considerable write overhead, worsening the write amplification issue, particularly for key–value stores on ZNS SSDs. To address this problem, we propose a zone-aware persistent deletion scheme for key–value store engines. To mitigate the write amplification induced by level compaction, we design an adaptive SSTable selection strategy for each level of the LSM tree. Additionally, because an SSTable holding deletion records becomes invalid once the persistent deletion timer reaches its threshold, we design a tombstone-aware zone allocation strategy to reduce the data migration induced by garbage collection. Furthermore, we optimize victim zone selection in GC to reduce invalid migration of tombstone files. Experimental results demonstrate that our scheme effectively ensures that most outdated physical versions are deleted before the persistent deletion time threshold is reached. When deleting 10% of the keys in the key–value store engine, the scheme reduces write amplification by 74.7% and garbage-collection-induced writes by 87.3% compared to Lethe-Fade.
{"title":"Time-constrained persistent deletion for key–value store engine on ZNS SSD","authors":"Shiqiang Nie, Tong Lei, Jie Niu, Qihan Hu, Song Liu, Weiguo Wu","doi":"10.1016/j.future.2024.107598","DOIUrl":"10.1016/j.future.2024.107598","url":null,"abstract":"<div><div>The inherent out-of-place update characteristic of the Log-Structured Merge tree (LSM tree) cannot guarantee persistent deletion within a specific time window, leading to potential data privacy and security issues. Existing solutions like Lethe-Fade ensure time-constrained persistent deletion but introduce considerable write overhead, worsening the write amplification issue, particularly for key–value stores on ZNS SSD. To address this problem, we propose a zone-aware persistent deletion scheme for key–value store engines. Targeting mitigating the write amplification induced by level compaction, we design an adaptive SSTable selection strategy for each level in the LSM tree. Additionally, as the SSTable with deletion records would become invalid after the persistent deletion timer reaches its threshold, we design a tombstone-aware zone allocation strategy to reduce the data migration induced by garbage collection. In further, we optimize the victim zone selection in GC to reduce the invalid migration of tombstone files. Experimental results demonstrate that our scheme effectively ensures that most outdated physical versions are deleted before reaching the persistent deletion time threshold. When deleting 10% of keys in the key–value store engine, this scheme reduces write amplification by 74.7% and the garbage collection-induced write by 87.3% compared to the Lethe-Fade scheme.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107598"},"PeriodicalIF":6.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
RNC-DP: A personalized trajectory data publishing scheme combining road network constraints and GAN
Pub Date: 2024-11-06, DOI: 10.1016/j.future.2024.107589
Hui Wang, Haiyang Li, Zihao Shen, Peiqian Liu
The popularity of location-based services facilitates people’s lives to a certain extent and generates a large amount of trajectory data. Analyzing these data can contribute to society’s development and provide better location services for users, but it also raises the security problem of personal trajectory privacy leakage. Existing methods, however, often suffer from either excessive privacy protection or insufficient protection of individual privacy. Therefore, this paper proposes a personalized trajectory data publishing scheme combining road network constraints and a GAN (RNC-DP). First, after representing the trajectory data on a grid, we remove the unreachable grid cells and define a trajectory generation constraint. Second, the proposed TraGM model synthesizes trajectory data that meet the constraints. Third, during trajectory data publishing, the proposed TraDP mechanism performs k-means clustering on the synthesized trajectories and assigns appropriate privacy budgets to the clustered, generalized trajectory location points. Finally, the protected trajectory data are published. Compared with existing schemes, the proposed scheme improves privacy protection strength by 10.2%–41.2% while balancing data availability, and it has low time complexity.
{"title":"RNC-DP: A personalized trajectory data publishing scheme combining road network constraints and GAN","authors":"Hui Wang , Haiyang Li , Zihao Shen , Peiqian Liu","doi":"10.1016/j.future.2024.107589","DOIUrl":"10.1016/j.future.2024.107589","url":null,"abstract":"<div><div>The popularity of location-based services facilitates people’s lives to a certain extent and generates a large amount of trajectory data. Analyzing these data can contribute to society’s development and provide better location services for users, but it also faces the security problem of personal trajectory privacy leakage. However, existing methods often suffer from either excessive privacy protection or insufficient protection of individual privacy. Therefore, this paper proposes a personalized trajectory data publishing scheme combining road network constraints and GAN (RNC-DP). Firstly, after grid-representing the trajectory data, we remove the unreachable grids and define a trajectory generation constraint. Second, the proposed TraGM model synthesizes the trajectory data to meet the constraints. Again, during the trajectory data publishing process, the proposed TraDP mechanism performs k-means clustering on the synthesized trajectories and assigns appropriate privacy budgets to the clustered generalized trajectory location points. Finally, the protected trajectory data is published. Compared with the existing schemes, the proposed scheme improves privacy protection strength by 10.2%–41.2% while balancing data availability and has low time complexity.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107589"},"PeriodicalIF":6.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
CMPNet: A cross-modal multi-scale perception network for RGB-T crowd counting
Pub Date: 2024-11-06, DOI: 10.1016/j.future.2024.107596
Shihui Zhang, Kun Chen, Gangzheng Zhai, He Li, Shaojie Han
The cross-modal crowd counting method demonstrates better scene adaptability under complex conditions by introducing independent supplementary information. However, existing methods still face problems such as insufficient fusion of modal features, underutilization of crowd structure, and the neglect of scale information. In response to the above issues, this paper proposes a cross-modal multi-scale perception network (CMPNet). Specifically, CMPNet mainly consists of a cross-modal perception fusion module and a multi-scale feature aggregation module. The cross-modal perception fusion module effectively suppresses noise features while sharing features between different modalities, thereby significantly improving the robustness of the crowd counting process. The multi-scale feature aggregation module obtains rich crowd structure information through a spatial context aware graph convolution unit, and then integrates feature information from different scales to enhance the network’s perception ability of crowd density. To the best of our knowledge, CMPNet is the first attempt to model the crowd structure and mine its semantics in the field of cross-modal crowd counting. The experimental results show that CMPNet achieves state-of-the-art performance on all RGB-T datasets, providing an effective solution for cross-modal crowd counting. We will release the code at https://github.com/KunChenKKK/CMPNet.
{"title":"CMPNet: A cross-modal multi-scale perception network for RGB-T crowd counting","authors":"Shihui Zhang , Kun Chen , Gangzheng Zhai , He Li , Shaojie Han","doi":"10.1016/j.future.2024.107596","DOIUrl":"10.1016/j.future.2024.107596","url":null,"abstract":"<div><div>The cross-modal crowd counting method demonstrates better scene adaptability under complex conditions by introducing independent supplementary information. However, existing methods still face problems such as insufficient fusion of modal features, underutilization of crowd structure, and the neglect of scale information. In response to the above issues, this paper proposes a cross-modal multi-scale perception network (CMPNet). Specifically, CMPNet mainly consists of a cross-modal perception fusion module and a multi-scale feature aggregation module. The cross-modal perception fusion module effectively suppresses noise features while sharing features between different modalities, thereby significantly improving the robustness of the crowd counting process. The multi-scale feature aggregation module obtains rich crowd structure information through a spatial context aware graph convolution unit, and then integrates feature information from different scales to enhance the network’s perception ability of crowd density. To the best of our knowledge, CMPNet is the first attempt to model the crowd structure and mine its semantics in the field of cross-modal crowd counting. The experimental results show that CMPNet achieves state-of-the-art performance on all RGB-T datasets, providing an effective solution for cross-modal crowd counting. We will release the code at <span><span>https://github.com/KunChenKKK/CMPNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107596"},"PeriodicalIF":6.2,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing
Pub Date: 2024-11-05, DOI: 10.1016/j.future.2024.107586
Siyuan Shang, Xuehui Du, Xiaohan Wang, Aodi Liu
Blockchain manages data with immutability, decentralization, and traceability, offering new solutions for traditional information systems and greatly facilitating data sharing. However, on-chain data query still faces challenges such as low efficiency and difficulty in privacy protection. We propose a private Approximate Nearest Neighbor (ANN) search method for on-chain data based on Locality-Sensitive Hashing (LSH), which mainly includes two steps: query initialization and query implementation. In query initialization, the data management node builds hash tables for on-chain data through improved LSH; the tables are encrypted and stored on the blockchain using attribute-based encryption. In query implementation, a node with the correct privileges uses random smart contracts to query on-chain data privately via distributed point functions and a privacy protection technique called oblivious masking. To validate the effectiveness of this method, we compare its performance with two ANN search algorithms: query time is reduced by 57% and 59.2%, average recall increases by 4.5% and 2%, average precision increases by 7.7% and 6.9%, average F1-score increases by 6% and 4.3%, and average initialization time is reduced by 34 times and 122 times, respectively. We also compare the performance with private ANN search methods based on homomorphic encryption, differential privacy, and secure multi-party computation. The results show that our method reduces query time by several orders of magnitude, making it more applicable to the blockchain environment. To the best of our knowledge, this is the first private ANN search method for on-chain data that considers both query efficiency and privacy protection, achieving efficient, accurate, and private data queries.
{"title":"Private approximate nearest neighbor search for on-chain data based on locality-sensitive hashing","authors":"Siyuan Shang , Xuehui Du , Xiaohan Wang, Aodi Liu","doi":"10.1016/j.future.2024.107586","DOIUrl":"10.1016/j.future.2024.107586","url":null,"abstract":"<div><div>Blockchain manages data with immutability, decentralization and traceability, offering new solutions for traditional information systems and greatly facilitating data sharing. However, on-chain data query still faces challenges such as low efficiency and difficulty in privacy protection. We propose a private Approximate Nearest Neighbor (ANN) search method for on-chain data based on Locality-Sensitive Hashing (LSH), which mainly includes two steps: query initialization and query implementation. In query initialization, the data management node builds hash tables for on-chain data through improved LSH, which are encrypted and stored on the blockchain using attribute-based encryption. In query implementation, node with correct privileges utilizes random smart contracts to query on-chain data privately by distributed point function and a privacy protection technique called oblivious masking. To validate the effectiveness of this method, we compare the performance with two ANN search algorithms, the query time is reduced by 57% and 59.2%, the average recall is increased by 4.5% and 2%, the average precision is increased by 7.7% and 6.9%, the average F1-score is increased by 6% and 4.3%, the average initialization time is reduced by 34 times and 122 times, respectively. We also compare the performance with private ANN search methods using homomorphic encryption, differential privacy and secure multi-party computation. The results show that our method can reduce the query time by several orders of magnitude, which is more applicable to the blockchain environment. To the best of our knowledge, this is the first private ANN search method for on-chain data, which consider the query efficiency and privacy protection, achieving efficient, accurate, and private data query.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107586"},"PeriodicalIF":6.2,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Deep Reinforcement Learning (DRL) for minimizing power consumption in Video-on-Demand (VoD) storage systems
Pub Date: 2024-11-05, DOI: 10.1016/j.future.2024.107582
Minseok Song, Mingoo Kwon
As video streaming services such as Netflix become popular, resolving the problem of high power consumption arising from both large data size and high bandwidth in video storage systems has become important. However, because various factors, such as the power characteristics of heterogeneous storage devices, variable workloads, and disk array models, influence storage power consumption, reducing power consumption with deterministic policies is ineffective. To address this, we present a new deep reinforcement learning (DRL)-based file placement algorithm for replication-based video storage systems, which aims to minimize overall storage power consumption. We first model the video storage system with time-varying streaming workloads as the DRL environment, in which the agent aims to find power-efficient file placement. We then propose a proximal policy optimization (PPO) algorithm, consisting of (1) an action space that determines the placement of each file; (2) an observation space that allows the agent to learn a power-efficient placement based on the current I/O bandwidth utilization; (3) a reward model that assigns a greater penalty for increased power consumption for each action; and (4) an action masking model that supports effective learning by preventing agents from selecting unnecessary actions. Extensive simulations were performed to evaluate the proposed scheme under various solid-state disk (SSD) models and replication configurations. Results show that our scheme reduces storage power consumption by 5% to 25.8% (average 12%) compared to existing benchmark methods known to be effective for file placement.
{"title":"Using Deep Reinforcement Learning (DRL) for minimizing power consumption in Video-on-Demand (VoD) storage systems","authors":"Minseok Song, Mingoo Kwon","doi":"10.1016/j.future.2024.107582","DOIUrl":"10.1016/j.future.2024.107582","url":null,"abstract":"<div><div>As video streaming services such as Netflix become popular, resolving the problem of high power consumption arising from both large data size and high bandwidth in video storage systems has become important. However, because various factors, such as the power characteristics of heterogeneous storage devices, variable workloads, and disk array models, influence storage power consumption, reducing power consumption with deterministic policies is ineffective. To address this, we present a new deep reinforcement learning (DRL)-based file placement algorithm for replication-based video storage systems, which aims to minimize overall storage power consumption. We first model the video storage system with time-varying streaming workloads as the DRL environment, in which the agent aims to find power-efficient file placement. We then propose a proximal policy optimization (PPO) algorithm, consisting of (1) an action space that determines the placement of each file; (2) an observation space that allows the agent to learn a power-efficient placement based on the current I/O bandwidth utilization; (3) a reward model that assigns a greater penalty for increased power consumption for each action; and (4) an action masking model that supports effective learning by preventing agents from selecting unnecessary actions. Extensive simulations were performed to evaluate the proposed scheme under various solid-state disk (SSD) models and replication configurations. Results show that our scheme reduces storage power consumption by 5% to 25.8% (average 12%) compared to existing benchmark methods known to be effective for file placement.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107582"},"PeriodicalIF":6.2,"publicationDate":"2024-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142651662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}