
Future Generation Computer Systems-The International Journal of Escience: Latest Articles

The Fast Inertial ADMM optimization framework for distributed machine learning
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-28; DOI: 10.1016/j.future.2024.107575
Guozheng Wang, Dongxia Wang, Chengfan Li, Yongmei Lei
The ADMM (Alternating Direction Method of Multipliers) optimization framework is known for its property of decomposition and assembly, which effectively bridges distributed computing and optimization algorithms, making it well-suited for distributed machine learning in the context of big data. However, it suffers from slow convergence speed and lacks the ability to coordinate worker computations, resulting in inconsistent speeds in solving subproblems in distributed systems and mutual waiting among workers. In this paper, we propose a novel optimization framework to address these challenges in support vector regression (SVR) and probit regression training through the FIADMM (Fast Inertial ADMM). The key concept of the FIADMM lies in the introduction of inertia acceleration and an adaptive subproblem iteration mechanism based on the ADMM, aimed at accelerating convergence speed and reducing the variance in solving speeds among workers. Further, we prove that FIADMM has a fast linear convergence rate O(1/k). Experimental results on six benchmark datasets demonstrate that the proposed FIADMM significantly enhances convergence speed and computational efficiency compared to multiple baseline algorithms and related efforts.
Volume 164, Article 107575; Citations: 0
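The inertial idea in the abstract above can be illustrated on a toy problem. This is a minimal sketch, not the authors' FIADMM: it runs consensus ADMM on a scalar least-squares objective and extrapolates the consensus and dual variables before each round. The function name `consensus_admm`, the constant inertia weight `alpha`, and the toy objective are assumptions made for illustration; the adaptive subproblem iteration mechanism is not modeled.

```python
def consensus_admm(targets, rho=1.0, alpha=0.0, iterations=15):
    """Consensus ADMM for min_x sum_i 0.5 * (x - targets[i])**2.

    alpha is a hypothetical constant inertia weight; alpha=0 gives plain ADMM.
    Returns the consensus variable z after the given number of iterations.
    """
    n = len(targets)
    z = z_prev = 0.0
    u = [0.0] * n          # scaled dual variables, one per worker
    u_prev = [0.0] * n
    for _ in range(iterations):
        # Inertial step: extrapolate along the direction of the last change.
        z_hat = z + alpha * (z - z_prev)
        u_hat = [ui + alpha * (ui - upi) for ui, upi in zip(u, u_prev)]
        # Local subproblems (closed form here); each would run on its own worker.
        x = [(a + rho * (z_hat - uh)) / (1.0 + rho)
             for a, uh in zip(targets, u_hat)]
        # Global averaging step, then the scaled dual update.
        z_new = sum(xi + uh for xi, uh in zip(x, u_hat)) / n
        u_new = [uh + xi - z_new for uh, xi in zip(u_hat, x)]
        z_prev, z = z, z_new
        u_prev, u = u, u_new
    return z


if __name__ == "__main__":
    data = [1.0, 2.0, 6.0]  # the optimum is the mean, 3.0
    print(consensus_admm(data, alpha=0.0))
    print(consensus_admm(data, alpha=0.3))
```

On this toy problem, `alpha=0.3` ends closer to the optimum than `alpha=0.0` after the same 15 rounds. Whether a given weight helps in general depends on the objective and on `rho`.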
Review of deep learning-based pathological image classification: From task-specific models to foundation models
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-28; DOI: 10.1016/j.future.2024.107578
Haijing Luan, Kaixing Yang, Taiyuan Hu, Jifang Hu, Siyao Liu, Ruilin Li, Jiayin He, Rui Yan, Xiaobing Guo, Niansong Qian, Beifang Niu
Pathological diagnosis is considered the gold standard in cancer diagnosis, playing a crucial role in guiding treatment decisions and prognosis assessment for patients. However, achieving accurate diagnosis of pathology images poses several challenges, including the scarcity of pathologists and the inherent subjective variability in their interpretations. The advancements in whole-slide imaging technology and deep learning methods provide new opportunities for digital pathology, especially in low-resource settings, by enabling effective pathological image classification. In this article, we begin by introducing the datasets, which include both unimodal and multimodal types, as essential resources for advancing pathological image classification. We then provide a comprehensive overview of deep learning-based pathological image classification models, covering task-specific models such as supervised, unsupervised, weakly supervised, and semi-supervised learning methods, as well as unimodal and multimodal foundation models. Next, we review tumor-related indicators that can be predicted from pathological images, focusing on two main categories: indicators that can be recognized by pathologists, such as tumor classification, grading, and region recognition; and those that cannot be recognized by pathologists, including molecular subtype prediction, tumor origin prediction, biomarker prediction, and survival prediction. Finally, we summarize the key challenges in digital pathology and propose potential future directions.
Volume 164, Article 107578; Citations: 0
Learning protein language contrastive models with multi-knowledge representation
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-25; DOI: 10.1016/j.future.2024.107580
Wenjun Xu, Yingchun Xia, Bifan Sun, Zihao Zhao, Lianggui Tang, Xiaobo Zhou, Qingyong Wang, Lichuan Gu
Protein representation learning plays a crucial role in obtaining a comprehensive understanding of biological regulatory mechanisms and in developing proteins and drugs for therapeutic purposes. However, labeled protein data, such as sequenced and functionally annotated proteins, are incomplete and scarce. Thus, contrastive learning has emerged as the preferred technique for learning meaningful representations from unlabeled data samples. In addition, natural proteins cannot at present be fully described by extracting protein knowledge from a single domain. Therefore, this study proposes Pro-CoRL, a protein contrastive model framework based on multi-knowledge representation learning. In particular, Pro-CoRL smooths the objective function using convex approximation, thereby improving the stability of training. Extensive experiments on predicting protein–protein interaction types and clustering protein families have confirmed the high accuracy and robustness of Pro-CoRL.
Volume 164, Article 107580; Citations: 0
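The abstract above does not state which contrastive objective Pro-CoRL uses. As a generic illustration of contrastive learning only, the sketch below computes the widely used InfoNCE loss for one anchor embedding, one positive, and a list of negatives. The function names, the cosine similarity choice, and the temperature value are assumptions, and the convex approximation mentioned in the abstract is not modeled.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss: negative log-softmax of the positive pair's similarity."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    # Aligned positive: small loss. Swapped roles: large loss.
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

The loss is small when the anchor is closer to its positive than to any negative, which is the behavior a contrastive encoder is trained toward.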
Multi-round decentralized dataset distillation with federated learning for Low Earth Orbit satellite communication
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-24; DOI: 10.1016/j.future.2024.107570
Jianlong Xu, Mengqing Jin, Jinze Xiao, Dianming Lin, Yuelong Liu
Satellite communication and Low Earth Orbit (LEO) satellites are important components of the 6G network, widely used for Earth observation tasks due to their low cost and short return period, making them a key technology for 6G network connectivity. Due to limitations in satellite system technology and downlink bandwidth, it is not feasible to download all high-resolution image information to ground stations. Even in existing federated learning (FL) methods, sharing well-trained parts of the model can still bottleneck with increasing model size. To address these challenges, we propose a new federated learning framework (FL-M3D) for LEO satellite communication that employs multi-round decentralized dataset distillation techniques. It allows satellites to independently extract local datasets and transmit them to ground stations instead of exchanging model parameters. Communication costs depend only on the size of the synthesized dataset and do not increase with larger models. However, the heterogeneity of satellite datasets can lead to sample ambiguity and decreased model convergence speed. Therefore, we propose distilling the datasets to mitigate the negative effects of data heterogeneity. Through experiments using real-world image datasets, FL-M3D reduces communication volume in simulated satellite networks by approximately 49.84% and achieves improved model performance.
Volume 164, Article 107570; Citations: 0
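The claim that communication cost depends only on the synthesized dataset, not on model size, can be made concrete with back-of-the-envelope arithmetic. The sketch below is an assumption-laden illustration rather than the FL-M3D protocol: the function names, the 4-byte value size, and the example dimensions are hypothetical, and it does not attempt to reproduce the 49.84% figure reported in the abstract.

```python
def model_payload_bytes(num_parameters, bytes_per_value=4):
    """Bytes sent per round when a client uploads all model parameters."""
    return num_parameters * bytes_per_value


def distilled_payload_bytes(images_per_class, num_classes, height, width,
                            channels, bytes_per_value=4):
    """Bytes sent per round when a client uploads a distilled dataset.

    Counts one value per pixel plus one label value per synthetic image.
    """
    num_images = images_per_class * num_classes
    pixels = num_images * height * width * channels
    return (pixels + num_images) * bytes_per_value


if __name__ == "__main__":
    distilled = distilled_payload_bytes(10, 10, 32, 32, 3)
    for params in (11_000_000, 25_000_000):  # hypothetical model sizes
        print(params, model_payload_bytes(params), distilled)
```

The parameter payload grows linearly with the model, while the distilled payload stays fixed for a given number and resolution of synthetic images.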
Cloud-based solution for urbanization monitoring using satellite images
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-24; DOI: 10.1016/j.future.2024.107579
Ion-Dorinel Filip, Cristian Cune, Florin Pop
Motivated by the large amount of available satellite data and the growing interest in urbanization studies, this paper presents a way to monitor urbanization more effectively, as more and more people migrate to urban areas to improve their quality of life. The project is particularly useful for environmental researchers and citizens who want to make informed decisions. It uses Sentinel Hub, a multi-spectral satellite imagery cloud service, to access Sentinel-2 data and automatically detect changes in Romania’s urban environment. The spectral bands available through Sentinel Hub, which describe the reflectance properties of a surface, are used to compute spectral indices that highlight patterns in satellite images. The paper analyzes two urban indices that successfully map built-up regions and a vegetation index that assesses the degree of vegetation in an urbanized area. It applies different methods to enhance each index and evaluates their performance in a town that has seen rapid urban expansion.
Volume 164, Article 107579; Citations: 0
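The abstract above does not name the indices it analyzes. As an illustration of how a spectral index is computed from band reflectances, the sketch below implements two common normalized-difference indices: NDVI (vegetation, from near-infrared and red) and NDBI (built-up areas, from short-wave infrared and near-infrared). On Sentinel-2 these correspond to bands B08, B04, and B11. The function names and the sample values are assumptions, and no Sentinel Hub API calls are shown.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); returns 0.0 where the denominator is zero."""
    return [
        [(a - b) / (a + b) if (a + b) != 0 else 0.0
         for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: high over dense vegetation."""
    return normalized_difference(nir, red)


def ndbi(swir, nir):
    """Normalized Difference Built-up Index: high over built-up surfaces."""
    return normalized_difference(swir, nir)


if __name__ == "__main__":
    # Hypothetical 1x2 reflectance rasters: a vegetated and a built-up pixel.
    red = [[0.1, 0.3]]
    nir = [[0.5, 0.2]]
    swir = [[0.2, 0.3]]
    print(ndvi(nir, red))
    print(ndbi(swir, nir))
```

Comparing such index rasters between two acquisition dates is one simple way to flag pixels whose land cover has changed.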
CBWO: A Novel Multi-objective Load Balancing Technique for Cloud Computing
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-24; DOI: 10.1016/j.future.2024.107561
Vahideh Hayyolalam, Öznur Özkasap
In cloud computing systems, the growing demand for diverse applications has led to challenges in resource allocation and workload distribution, resulting in increased energy consumption and computational costs. To address these challenges, we propose a novel load-balancing method, namely CBWO, that integrates Chaos theory with the Black Widow Optimization algorithm. Our approach is designed to optimize cloud computing environments by improving energy efficiency and resource utilization. We employ CloudSim for simulations, evaluating key performance metrics such as energy consumption, resource utilization, makespan, task completion time, and imbalance degree. The experimental results demonstrate the superiority of our method, achieving average improvements of 67.28% in makespan and 29.03% in energy consumption compared to existing solutions.
Volume 164, Article 107561; Citations: 0
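Two ingredients mentioned in the abstract above can be sketched without reproducing CBWO itself: a chaotic sequence used in place of uniform random numbers, and the evaluation of a task-to-VM assignment by makespan and imbalance degree. The logistic map is one common chaotic map and is an assumption here, as are the function names and the imbalance formula (maximum minus minimum VM finish time, divided by the mean). The Black Widow Optimization search loop is not shown.

```python
def logistic_map(seed, count, r=4.0):
    """Generate `count` values in [0, 1] from the logistic map x <- r*x*(1-x)."""
    values = []
    x = seed
    for _ in range(count):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_assignment(num_tasks, num_vms, seed=0.37):
    """Map each chaotic value to a VM index to form a candidate assignment."""
    return [min(int(v * num_vms), num_vms - 1)
            for v in logistic_map(seed, num_tasks)]


def evaluate(assignment, task_lengths, vm_speeds):
    """Return (makespan, imbalance degree) for a task-to-VM assignment."""
    finish = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        finish[vm] += task_lengths[task] / vm_speeds[vm]
    makespan = max(finish)
    mean = sum(finish) / len(finish)
    imbalance = (max(finish) - min(finish)) / mean
    return makespan, imbalance


if __name__ == "__main__":
    lengths = [10, 20, 30, 40, 50]   # hypothetical task lengths
    speeds = [10, 5, 20]             # hypothetical VM speeds
    candidate = chaotic_assignment(len(lengths), len(speeds))
    print(candidate, evaluate(candidate, lengths, speeds))
```

A metaheuristic would generate many such candidates and keep those with a lower makespan and imbalance degree.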
SWQC: Efficient sequencing data quality control on the next-generation sunway platform
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-24; DOI: 10.1016/j.future.2024.107577
Lifeng Yan, Zekun Yin, Tong Zhang, Fangjin Zhu, Xiaohui Duan, Bertil Schmidt, Weiguo Liu
Sequencing data quality control can significantly prevent low-quality data from impacting downstream applications in bioinformatics. The enormous growth of biological sequencing data in recent years introduces new challenges to the efficiency of quality control processes and motivates the need for fast implementations on modern compute systems. The powerful next-generation heterogeneous Sunway platform holds significant potential for addressing this challenge. However, there are currently no dedicated quality control applications that can fully utilize its computational power. To bridge this gap, we introduce SWQC, a novel quality control application specifically designed for the Sunway platform. We present an efficient distributed FASTQ I/O framework for Sunway-based workstations and supercomputers to take advantage of fast SSDs and the parallel file system. In order to support both process-level and thread-level (CPE-level) parallelism to leverage the computational power, we refactor and optimize all standard quality control modules for the heterogeneous Sunway architecture. When using a single node, SWQC achieves speedups between 2 and 40 over highly optimized quality control applications executed on a high-end 48-core AMD server. Additionally, when using 16 nodes, SWQC achieves parallel efficiencies of 70% (for reading and writing a single file) and 95% (for reading one file and writing split files) compared to a single node. Overall, SWQC is able to perform quality control operations for a 140GB FASTQ file within only 70 s using a single Sunway node. It is publicly available at https://github.com/RabbitBio/SWQC.
Volume 164, Article 107577; Citations: 0
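For readers unfamiliar with what a sequencing quality control step does, the sketch below shows one typical operation: reading four-line FASTQ records and keeping reads whose mean Phred quality meets a threshold. It is a single-threaded illustration only and is unrelated to SWQC's actual code or its Sunway-specific I/O framework; the function names, the Phred+33 encoding assumption, and the threshold of 20 are illustrative choices.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line is skipped
        quality = handle.readline().rstrip("\n")
        yield header, sequence, quality


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string, assuming Phred+33 encoding."""
    return sum(ord(char) - offset for char in quality) / len(quality)


def filter_reads(handle, min_mean_quality=20.0):
    """Keep records whose mean Phred quality is at least the threshold."""
    return [record for record in read_fastq(handle)
            if record[2] and mean_phred(record[2]) >= min_mean_quality]


if __name__ == "__main__":
    sample = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n!!!!\n"
    for record in filter_reads(io.StringIO(sample)):
        print(record)
```

For scale, the figures quoted in the abstract (a 140 GB file in 70 s on one node) correspond to roughly 2 GB/s of end-to-end throughput.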
Efficient security interface for high-performance Ceph storage systems
IF 6.2; Zone 2 (Computer Science); Q1 (Computer Science, Theory & Methods); Pub Date: 2024-10-23; DOI: 10.1016/j.future.2024.107571
Fatemeh Khoda Parast, Seyed Alireza Damghani, Brett Kelly, Yang Wang, Kenneth B. Kent
Ceph is a resilient clustered storage solution that supports object, block, and file storage with no single point of failure. Despite these strengths, data confidentiality remains a concern, as authentication and access control are the only data protection security services in Ceph. CephArmor was proposed as a third-party security interface that protects data confidentiality by adding an extra protection layer to data at rest. However, the initial API design was not efficient enough to address security and performance simultaneously. In this study, we propose a new architectural design to address the issues of the preliminary prototype. Comprehensive performance and security analyses verify the improvement of the proposed method over the initial approach. Benchmark results indicate a 37% average improvement in IOPS, elapsed time, and bandwidth for the write benchmark compared to the initial model.
Ceph 是一种弹性集群存储解决方案,支持对象、块和文件存储功能,没有单点故障。尽管有这些优点,但数据保密性仍是系统中的一个问题,因为身份验证和访问控制是 Ceph 中唯一的数据保护安全服务。CephArmor 被提议作为第三方安全接口,通过为静态数据添加额外的保护层来保护数据的机密性。尽管增加了保护层,但最初设计的 API 需要更有效地同时解决安全性和性能问题。在本研究中,我们提出了一种新的架构设计,以解决与初步原型相关的问题。全面的性能和安全分析验证了与最初的方法相比,所提出的方法有所改进。基准结果表明,与初始模型相比,写入基准的 IOPS、耗时和带宽平均提高了 37%。
{"title":"Efficient security interface for high-performance Ceph storage systems","authors":"Fatemeh Khoda Parast ,&nbsp;Seyed Alireza Damghani ,&nbsp;Brett Kelly ,&nbsp;Yang Wang ,&nbsp;Kenneth B. Kent","doi":"10.1016/j.future.2024.107571","DOIUrl":"10.1016/j.future.2024.107571","url":null,"abstract":"<div><div>Ceph portrays a resilient clustered storage solution with supporting object, block, and file storage capabilities with no single point of failure. Despite these qualifications, data confidentiality defines a concern in the system, as authentication and access control are the only data protection security services in Ceph. CephArmor was proposed as a third-party security interface to protect data confidentiality by adding an extra protection layer to data at rest. Despite the added layer, the initial design of the API needed to be more efficient in addressing security and performance simultaneously. In this study, we propose a new architectural design to address the associated issues with the preliminary prototype. Comprehensive performance and security analysis verify the improvement of the proposed method compared to the initial approach. The benchmark result has indicated a 37% improvement on average in IOPS, elapsed time, and bandwidth for the <em>write</em> benchmark compared to the initial model.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"164 ","pages":"Article 107571"},"PeriodicalIF":6.2,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142554219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Optimization of IoT perceived content caching in F-RANs: Minimum retrieval delay and resource extension with performance sensitivity
IF 6.2 Zone 2 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-10-23 DOI: 10.1016/j.future.2024.107572
Chia-Cheng Hu
In the Internet of Things (IoT) perceived applications of monitoring the states of the environment, a feasible technology is to use fog radio access networks (F-RANs) to alleviate the problems of long response time and cloud server bottlenecks in cloud computing. In response to the above problems, this work investigates the problem of minimizing the retrieval delay of IoT contents in F-RANs under the constraints of system resources. The problem is formulated as an integer linear programming (ILP) model. Then, a polynomial-time method with linear programming (LP) relaxation and rounding is proposed to approximate the optimal solution of the problem. Through proof, the method can obtain a feasible solution with a bounded approximation ratio in polynomial time. The conducted simulations validate that the obtained feasible solution is very close to the optimal one. On the other hand, when the system resources are not enough to meet the continuous growth of content retrieval and need to be expanded, this work further establishes an association relation between cached contents and system resources. Based on the above relation, the second method of expanding system resources with performance sensitivity is proposed to provide the service provider with an effective and economical expansion of system resources. It utilizes a predefined system parameter in balancing the trade-off between the approximation ratio to the optimal solution of the problem and the extended system resources. The solution obtained by the second method is also proved to have a bounded approximation ratio.
Citations: 0
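The "LP relaxation and rounding" idea named in the F-RAN caching abstract above can be illustrated on a much smaller problem. The sketch below is not the paper's ILP model: it assumes a single cache with one capacity constraint (a 0/1 knapsack), and the names `sizes`, `savings`, and `capacity` are hypothetical. For this special case the LP relaxation is solved exactly by a greedy pass over items sorted by saving per unit size, and rounding every fractional value down always gives a feasible integer solution.

```python
from fractions import Fraction


def lp_relaxation(sizes, savings, capacity):
    """Solve the LP relaxation of a single-cache 0/1 knapsack exactly.

    Toy assumption: one cache, one capacity constraint, positive integer sizes.
    Returns x with 0 <= x[i] <= 1; at most one entry is fractional.
    """
    # Greedy by saving per unit of size is optimal for this relaxation.
    order = sorted(range(len(sizes)),
                   key=lambda i: Fraction(savings[i], sizes[i]),
                   reverse=True)
    x = [Fraction(0)] * len(sizes)
    remaining = Fraction(capacity)
    for i in order:
        if remaining <= 0:
            break
        take = min(Fraction(1), remaining / sizes[i])
        x[i] = take
        remaining -= take * sizes[i]
    return x


def round_down(x):
    """Keep only contents that the relaxation caches completely."""
    return [1 if xi == 1 else 0 for xi in x]


def total(values, x):
    """Weighted sum of values under the (fractional or integer) choice x."""
    return sum(v * xi for v, xi in zip(values, x))


# Hypothetical instance: three contents, cache capacity of 5 units.
sizes = [4, 3, 2]
savings = [8, 9, 2]      # delay saved if the content is cached
capacity = 5

x_frac = lp_relaxation(sizes, savings, capacity)
x_int = round_down(x_frac)
print("fractional:", x_frac, "upper bound:", total(savings, x_frac))
print("rounded:   ", x_int, "saving:", total(savings, x_int))
```

The fractional optimum gives an upper bound on what any integer placement can achieve, which is how approximation ratios for rounding schemes are usually argued. The paper's model has more constraints than this toy, so its rounding step and its bound will differ.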
Designing optimal Quantum Key Distribution Networks based on Time-Division Multiplexing of QKD transceivers: qTDM-QKDN
IF 6.2 Zone 2 Computer Science Q1 COMPUTER SCIENCE, THEORY & METHODS Pub Date: 2024-10-23 DOI: 10.1016/j.future.2024.107557
Juan Carlos Hernandez-Hernandez , David Larrabeiti , Maria Calderon , Ignacio Soto , Bruno Cimoli , Hui Liu , Idelfonso Tafur Monroy
Time-sharing of Quantum Key Distribution (QKD) transceivers with the help of optical switches and a central Software-Defined Networking (SDN) controller is a promising technique to better amortize the large investments required to build a Quantum Key Distribution Network (QKDN). In this work, we investigate the implications of introducing Time-Division Multiplexing (TDM) in trusted-relay QKDNs at the wide-area network scale in terms of performance and cost-saving. To this end, we developed both a Mixed Integer Linear Programming (qTDM-MILP) model and a Heuristic Algorithm (qTDM-HA) to solve the allocation of QKD transceivers and network resources for a novel switched QKDN operating scheme: qTDM-QKDN. Our heuristic method provides a close-to-optimal resource planning for the offline problem that computes the minimum number of QKD transceivers and optical switch ports at each node, as well as the number of quantum channels on each link required to satisfy a target set of end-to-end secret-keyrate demands. Moreover, both the model and the heuristic provide the time fractions that each QKD transceiver needs to peer with each neighbor QKD transceiver. We compared our proposed model and heuristic algorithm for cost minimization with non-time sharing QKD transceivers (nTDM) as baseline. The results show that qTDM can achieve substantial cost-savings in the range of 10%–40% compared to nTDM. Furthermore, this work sheds light on the selection of the value for the working cycle T and its influence on network performance.
Citations: 0
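The time fractions mentioned in the qTDM-QKDN abstract above can be pictured with a back-of-the-envelope calculation. This is a toy sketch under strong assumptions, not the paper's qTDM-MILP or qTDM-HA: it looks at one node only, ignores switching overhead and the need for both ends of a link to be scheduled together, and assumes a transceiver serving a neighbor for a fraction f of the working cycle delivers f times the link key rate. The names `demands` and `link_rates` are hypothetical.

```python
from fractions import Fraction
from math import ceil


def time_fractions(demands, link_rates):
    """Fraction of the working cycle needed per neighbor.

    Toy assumption: key delivered to a neighbor scales linearly with
    the share of the cycle spent peering with it.
    """
    return {n: Fraction(demands[n], link_rates[n]) for n in demands}


def transceivers_needed(fractions, time_shared):
    """Count transceivers at one node, with or without time sharing."""
    if time_shared:
        # One device can serve several neighbors within a cycle.
        return ceil(sum(fractions.values()))
    # Without time sharing, each neighbor gets dedicated devices.
    return sum(ceil(f) for f in fractions.values())


# Hypothetical node with three neighbors; rates in bit/s.
demands = {"B": 300, "C": 500, "D": 200}
link_rates = {"B": 1000, "C": 1000, "D": 1000}

fractions = time_fractions(demands, link_rates)
print("time fractions:", fractions)
print("dedicated:  ", transceivers_needed(fractions, time_shared=False))
print("time-shared:", transceivers_needed(fractions, time_shared=True))
```

The numbers are chosen only to show the mechanism; the savings reported in the abstract come from network-wide optimization and should not be compared with this single-node toy.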