Pub Date: 2024-06-06 | DOI: 10.1007/s11390-024-3538-1
Peng-Ju Liu, Cui-Ping Li, Hong Chen
Data partitioning techniques are pivotal for optimal data placement across storage devices, thereby enhancing resource utilization and overall system throughput. However, the design of effective partition schemes faces multiple challenges, including considerations of the cluster environment, storage device characteristics, optimization objectives, and the balance between partition quality and computational efficiency. Furthermore, dynamic environments necessitate robust partition detection mechanisms. This paper presents a comprehensive survey structured around partition deployment environments, outlining the distinguishing features and applicability of various partitioning strategies and examining how these challenges are addressed. We discuss partitioning features pertaining to database schema, table data, workload, and runtime metrics. We then delve into the partition generation process, dividing it into initialization and optimization stages. A comparative analysis of partition generation and update algorithms is provided, emphasizing their suitability for different scenarios and optimization objectives. Additionally, we illustrate the applications of partitioning in prevalent database products and suggest potential future research directions and solutions. This survey aims to foster the implementation, deployment, and updating of high-quality partitions for specific system scenarios.
Title: Enhancing Storage Efficiency and Performance: A Survey of Data Partitioning Techniques (Journal of Computer Science and Technology)
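As a minimal illustration of the survey's subject, the sketch below shows two canonical partitioning strategies, hash and range partitioning, that such schemes build on. The row layout, key names, partition count, and range boundaries are illustrative assumptions, not details from the paper.

```python
def hash_partition(rows, key, n_parts):
    """Assign each row to a partition by hashing its key (suits point lookups)."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts

def range_partition(rows, key, boundaries):
    """Assign each row to the first range whose upper bound exceeds its key
    (suits range scans)."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        for i, b in enumerate(boundaries):
            if row[key] < b:
                parts[i].append(row)
                break
        else:
            parts[-1].append(row)
    return parts

rows = [{"id": i, "ts": i * 10} for i in range(8)]
by_hash = hash_partition(rows, "id", 3)
by_range = range_partition(rows, "ts", [30, 60])
```

Which strategy is preferable depends on the workload, which is exactly the kind of trade-off the survey categorizes.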
Pub Date: 2024-06-06 | DOI: 10.1007/s11390-024-1743-6
Wei-Dong Lin, Yu-Yan Deng, Yang Gao, Ning Wang, Ling-Qiao Liu, Lei Zhang, Peng Wang
Given a query patch from a novel class, one-shot object detection aims to detect all instances of this class in a target image through semantic similarity comparison. However, due to the extremely limited guidance available for the novel class, as well as the unseen appearance differences between the query and target instances, it is difficult to exploit their semantic similarity appropriately and generalize well. To mitigate this problem, we present a universal Cross-Attention Transformer (CAT) module for accurate and efficient semantic similarity comparison in one-shot object detection. The proposed CAT utilizes the transformer mechanism to comprehensively capture bi-directional correspondence between any paired pixels from the query and the target image, which empowers us to sufficiently exploit their semantic characteristics for accurate similarity comparison. In addition, the proposed CAT enables feature dimensionality compression for inference speedup without performance loss. Extensive experiments on three object detection datasets, MS-COCO, PASCAL VOC, and FSOD, under the one-shot setting demonstrate the effectiveness and efficiency of our model: it surpasses CoAE, a major baseline in this task, by 1.0% in average precision (AP) on MS-COCO and runs nearly 2.5 times faster.
Title: CAT: A Simple yet Effective Cross-Attention Transformer for One-Shot Object Detection (Journal of Computer Science and Technology)
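A hedged sketch of the operation the CAT module above builds on: single-head scaled dot-product cross-attention, where query-patch feature vectors attend over target-image feature vectors. The dimensions and values are illustrative toys, not the paper's multi-head implementation.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(Q, K, V):
    """For each query vector, compute attention weights over all target keys
    and return the weighted combination of target values."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                      # one query-patch feature
K = [[1.0, 0.0], [0.0, 1.0]]          # two target-image features
V = [[1.0, 2.0], [3.0, 4.0]]
attended = cross_attention(Q, K, V)   # leans toward V[0], whose key matches Q
```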
Pub Date: 2024-06-06 | DOI: 10.1007/s11390-023-1611-9
Yun-Hao Cao, Jian-Xin Wu
Many real-world datasets suffer from the unavoidable issue of missing values, and classification with missing data must therefore be handled carefully, since inadequate treatment of missing values causes large errors. In this paper, we propose a random subspace sampling method, RSS, which samples missing items from the corresponding feature histogram distributions in random subspaces and is effective and efficient at different levels of missing data. Unlike most established approaches, RSS does not train on fixed imputed datasets. Instead, we design a dynamic training strategy in which the filled values change dynamically through resampling during training. Moreover, thanks to the sampling strategy, we design an ensemble testing strategy that combines the results of multiple runs of a single model, which is more efficient and resource-saving than previous ensemble methods. Finally, we combine these two strategies with the random subspace method, which makes our estimations more robust and accurate. The effectiveness of the proposed RSS method is well validated by experimental studies.
Title: Random Subspace Sampling for Classification with Missing Data (Journal of Computer Science and Technology)
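A minimal sketch of the sampling idea described above: each missing entry is filled by drawing from the observed distribution of its feature, and re-drawing on every pass so filled values change dynamically. Sampling directly from the observed values stands in for the paper's histogram distributions; the data and helper names are illustrative.

```python
import random

def fill_missing(rows, rng):
    """Replace None entries by values drawn from that column's observed values;
    calling this again re-samples, so imputations vary across training passes."""
    n_cols = len(rows[0])
    observed = [[r[c] for r in rows if r[c] is not None] for c in range(n_cols)]
    return [[v if v is not None else rng.choice(observed[c])
             for c, v in enumerate(r)] for r in rows]

rng = random.Random(0)
data = [[1.0, None], [2.0, 5.0], [None, 7.0]]
filled = fill_missing(data, rng)
```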
Pub Date: 2024-06-06 | DOI: 10.1007/s11390-023-2121-5
Hui Jiang, Yu-Xin Deng, Ming Xu
The goal of qubit mapping is to map a logical circuit onto a physical device by introducing as few additional gates as possible within an acceptable amount of time. We present an effective approach, the Tabu Search Based Adjustment (TSA) algorithm, to construct such mappings. It consists of two key steps: one uses a combination of subgraph isomorphism and completion to initialize candidate mappings, and the other dynamically modifies the mappings via TSA. Our experiments show that, compared with state-of-the-art methods, TSA can generate mappings with a smaller number of additional gates and scales better to large-scale circuits.
Title: Qubit Mapping Based on Tabu Search (Journal of Computer Science and Technology)
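The generic tabu-search loop that TSA above adapts can be sketched as follows: greedily move to the best non-tabu neighbor while a short tabu list of recently visited states prevents immediate cycling. The toy objective here (an integer quadratic, not the paper's gate-count cost) is purely illustrative.

```python
def tabu_search(start, neighbors, cost, iters=50, tabu_len=5):
    """Greedy local search with a fixed-length tabu list of visited states."""
    current = best = start
    tabu = []
    for _ in range(iters):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break
        current = min(candidates, key=cost)   # best admissible move, even if worse
        tabu.append(current)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if cost(current) < cost(best):
            best = current
    return best

# Toy problem: minimize (x - 7)^2 over integers, moving by +/-1 each step.
best = tabu_search(0, lambda x: [x - 1, x + 1], lambda x: (x - 7) ** 2)
```

Accepting the best non-improving move while remembering recent states is what lets tabu search escape local optima that pure hill climbing would be stuck in.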
Pub Date: 2024-06-06 | DOI: 10.1007/s11390-024-4150-0
Xiao-Fei Liao, Wen-Ju Zhao, Hai Jin, Peng-Cheng Yao, Yu Huang, Qing-Gang Wang, Jin Zhao, Long Zheng, Yu Zhang, Zhi-Yuan Shao
Graph processing has been widely used in many scenarios, from scientific computing to artificial intelligence. Unlike traditional workloads, graph processing exhibits irregular computational parallelism and random memory accesses. Therefore, running graph processing workloads on conventional architectures (e.g., CPUs and GPUs) often shows a significantly low compute-memory ratio and few performance benefits, and can in many cases even be slower than a specialized single-thread graph algorithm. While domain-specific hardware designs are essential for graph processing, it remains challenging to translate hardware capability into performance gains without coupled software co-designs. This article presents a graph processing ecosystem spanning hardware to software. We start by introducing a series of hardware accelerators as the foundation of this ecosystem. Subsequently, the co-designed parallel graph systems and their distributed techniques are presented to support graph applications. Finally, we introduce our efforts on novel graph applications and hardware architectures. Extensive results show that various graph applications can be efficiently accelerated in this graph processing ecosystem.
Title: Towards High-Performance Graph Processing: From a Hardware/Software Co-Design Perspective (Journal of Computer Science and Technology)
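To make the workload characterization above concrete, here is a frontier-based BFS, a representative vertex-centric kernel: the neighbor lookups jump unpredictably through the adjacency structure, which is the random-access pattern that gives graph workloads their low compute-memory ratio on conventional architectures. The graph is a toy example.

```python
from collections import deque

def bfs_levels(adj, src):
    """Return each reachable vertex's BFS level from src; each neighbor access
    is a data-dependent, effectively random read."""
    level = {src: 0}
    frontier = deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                frontier.append(v)
    return level

adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
levels = bfs_levels(adj, 0)
```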
Pub Date: 2024-06-06 | DOI: 10.1007/s11390-024-3414-z
Zi-Nuo Li, Xu-Hang Chen, Shu-Na Guo, Shu-Qiang Wang, Chi-Man Pun
Image enhancement is a widely used technique in digital image processing that aims to improve image aesthetics and visual quality. However, traditional methods of enhancement based on pixel-level or global-level modifications have limited effectiveness. Recently, as learning-based techniques gain popularity, various studies are now focusing on utilizing networks for image enhancement. However, these techniques often fail to optimize image frequency domains. This study addresses this gap by introducing a transformer-based model for improving images in the wavelet domain. The proposed model refines various frequency bands of an image and prioritizes local details and high-level features. Consequently, the proposed technique produces superior enhancement results. The proposed model’s performance was assessed through comprehensive benchmark evaluations, and the results suggest it outperforms the state-of-the-art techniques.
Title: WavEnhancer: Unifying Wavelet and Transformer for Image Enhancement (Journal of Computer Science and Technology)
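A hedged sketch of the wavelet-domain split that the model above refines: a one-level 1-D Haar transform separates a signal into a low-frequency approximation band and a high-frequency detail band, and is exactly invertible. The paper operates on 2-D images with a learned transformer; this 1-D toy only illustrates the band decomposition.

```python
def haar_1d(signal):
    """Split an even-length signal into approximation (pairwise averages)
    and detail (pairwise half-differences) bands."""
    approx = [(a + b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[0::2], signal[1::2])]
    return approx, detail

def inverse_haar_1d(approx, detail):
    """Exactly reconstruct the original signal from the two bands."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

approx, detail = haar_1d([4.0, 2.0, 5.0, 5.0])
restored = inverse_haar_1d(approx, detail)
```

Because the transform is invertible, a model can edit the bands separately (e.g., denoise details, relight the approximation) and still reconstruct a full image.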
Pub Date: 2024-05-10 | DOI: 10.1007/s11390-024-3419-7
Younhyun Jung, Jim Kong, Bin Sheng, Jinman Kim
Direct volume rendering (DVR) is a technique that visually emphasizes structures of interest (SOIs) within a volume while simultaneously depicting adjacent regional information, e.g., the spatial location of a structure relative to its neighbors. In DVR, the transfer function (TF) plays a key role by enabling accurate interactive identification of SOIs as well as ensuring their appropriate visibility. TF generation typically involves non-intuitive trial-and-error optimization of rendering parameters, which is time-consuming and inefficient. Attempts at mitigating this manual process have led to approaches that make use of a knowledge database consisting of TFs pre-designed by domain experts. In these approaches, a user navigates the knowledge database to find the pre-designed TF most suitable for their input volume to visualize the SOIs. Although these approaches potentially reduce the workload of generating TFs, they still require manual TF navigation of the knowledge database, as well as likely fine-tuning of the selected TF to suit the input. In this work, we propose a TF design approach, CBR-TF, in which we introduce a new content-based retrieval (CBR) method to navigate the knowledge database automatically. Instead of pre-designed TFs, our knowledge database contains volumes with SOI labels. Given an input volume, our CBR-TF approach retrieves relevant volumes (with SOI labels) from the knowledge database; the retrieved labels are then used to generate and optimize TFs for the input. This approach largely reduces manual TF navigation and fine-tuning. For our CBR-TF approach, we introduce a novel volumetric image feature which includes both a local primitive intensity profile along the SOIs and regional spatial semantics available from the images co-planar to the profile. For the regional spatial semantics, we adopt a convolutional neural network to obtain high-level image feature representations. For the intensity profile, we extend the dynamic time warping technique to address subtle alignment differences between similar profiles (SOIs). Finally, we propose a two-stage CBR scheme that uses these two different feature representations in a complementary manner, thereby improving SOI retrieval performance. We demonstrate the capabilities of our CBR-TF approach through comparison with a conventional visualization approach based on an intensity profile matching algorithm, and with potential use cases in medical volume visualization.
Title: A Transfer Function Design for Medical Volume Data Using a Knowledge Database Based on Deep Image and Primitive Intensity Profile Features Retrieval (Journal of Computer Science and Technology)
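The classic dynamic time warping distance that the abstract above says is extended for aligning similar intensity profiles can be sketched as a plain O(n*m) dynamic program; the paper's extension for subtle alignment differences is not reproduced here, and the profiles are toy data.

```python
def dtw_distance(a, b):
    """Minimum cumulative |a_i - b_j| cost over all monotonic alignments
    of the two sequences (classic DTW recurrence)."""
    INF = float("inf")
    n, m = len(a), len(b)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # skip a sample of a
                                 D[i][j - 1],      # skip a sample of b
                                 D[i - 1][j - 1])  # match both
    return D[n][m]

# Two profiles with the same shape but a stretched start align at zero cost.
d = dtw_distance([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0])
```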
Pub Date: 2024-01-30 | DOI: 10.1007/s11390-023-3964-5
Franklin Yang
This paper presents the design of a Coherence-Free Processor (CFP) that enables a scalable multiprocessor by eliminating cache coherence operations in both hardware and software. The CFP uses a coherence-free cache (CFC) that can improve the cost-effectiveness and performance-effectiveness of the existing multiprocessors for commonly used workloads. The CFC is feasible because not all program data that reside in a multiprocessor cache need to be accessed by other processors, and private caches at level 1 (L1) and level 2 (L2) facilitate this method of sharing. Reentrant programs are specifically designed to protect their data from modification by other tasks. Program data that are modified but not shared with other tasks do not require a coherence protocol. Adding processors reduces the multitasking queue, reducing elapsed time. Simultaneous execution replaces concurrent execution.
Title: CFP: A Coherence-Free Processor Design (Journal of Computer Science and Technology)
Pub Date: 2024-01-30 | DOI: 10.1007/s11390-023-1535-4
Zhi-Wei Xu, Li Pan, Shi-Jun Liu
Infrastructure-as-a-Service (IaaS) cloud platforms offer resources with diverse buying options. Users can run an instance on the on-demand market, which is stable but expensive, or on the spot market at a significant discount. However, users have to weigh the low cost of spot instances carefully against their poor availability, since spot instances are revoked whenever a revocation event occurs. Thus, an important problem IaaS users now face is how to use spot instances in a cost-effective and low-risk way. Based on a replication-based fault tolerance mechanism, we propose an online termination algorithm that optimizes the cost of using spot instances while ensuring operational stability. We prove that in most cases, the cost of our proposed online algorithm will not exceed twice the minimum cost of the optimal offline algorithm that knows the exact future a priori. Through a large number of experiments, we verify that our algorithm has a competitive ratio of no more than 2 in most cases, and in the remaining cases it still reaches the guaranteed competitive ratio.
Title: An Online Algorithm Based on Replication for Using Spot Instances in IaaS Clouds (Journal of Computer Science and Technology)
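The 2-competitive guarantee claimed above is characteristic of break-even ("ski-rental") style online algorithms: pay the cheap recurring price until the accumulated spend reaches the one-off price, then commit. The sketch below shows that classic rule as an analogy for the spot/on-demand trade-off, not the paper's actual replication-based algorithm; prices are illustrative.

```python
def ski_rental_cost(days_needed, rent=1, buy=10):
    """Online rule: rent each day until cumulative rent would reach the buy
    price, then buy. Total cost is at most twice the offline optimum."""
    cost = 0
    for _ in range(days_needed):
        if cost + rent >= buy:   # break-even point reached: commit to buying
            return cost + buy
        cost += rent
    return cost

def offline_optimal(days_needed, rent=1, buy=10):
    """Clairvoyant cost: rent throughout, or buy on day one, whichever is cheaper."""
    return min(days_needed * rent, buy)

costs = {d: (ski_rental_cost(d), offline_optimal(d)) for d in (3, 10, 50)}
```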
Pub Date : 2024-01-30DOI: 10.1007/s11390-023-3204-z
Cong-Xi Song, Biao Han, Jin-Shu Su
In recent years, live streaming has become a popular application that uses TCP as its primary transport protocol. The Quick UDP Internet Connections (QUIC) protocol opens up new opportunities for live streaming. However, how to leverage QUIC to transmit live videos has not been studied yet. This paper first investigates the achievable quality of experience (QoE) of streaming live videos over TCP, QUIC, and their multipath extensions Multipath TCP (MPTCP) and Multipath QUIC (MPQUIC). We observe that MPQUIC achieves the best performance with bandwidth aggregation and transmission reliability. However, network fluctuations may cause heterogeneous paths, high path loss, and bandwidth degradation, resulting in significant QoE deterioration. Motivated by the above observations, we investigate the multipath packet scheduling problem in live streaming and design 4D-MAP, a multipath adaptive packet scheduling scheme over QUIC. Specifically, a linear upper confidence bound (LinUCB)-based online learning algorithm, along with four novel scheduling mechanisms, i.e., Dispatch, Duplicate, Discard, and Decompensate, is proposed to address the above problems. 4D-MAP has been evaluated in both controlled emulation and real-world networks to compare it with state-of-the-art multipath transmission schemes. Experimental results reveal that 4D-MAP outperforms the alternatives in improving the QoE of live streaming.
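The LinUCB algorithm named in the abstract maintains, per arm (here, per network path), a linear model of expected reward and picks the arm whose upper confidence bound is highest. The sketch below is a generic LinUCB path selector, not 4D-MAP's implementation; the per-path feature vector (e.g., RTT, loss rate, available bandwidth) and the reward definition are illustrative assumptions.

```python
import numpy as np

class LinUCBScheduler:
    """Generic LinUCB sketch: one regularized linear model per candidate
    path; select the path with the highest upper confidence bound on the
    predicted reward, then update that path's model with the observed reward."""

    def __init__(self, n_paths: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha                                # exploration strength
        self.A = [np.eye(dim) for _ in range(n_paths)]    # I + sum of x x^T per path
        self.b = [np.zeros(dim) for _ in range(n_paths)]  # reward-weighted feature sums

    def select(self, features) -> int:
        """features: list of per-path context vectors; returns chosen path index."""
        scores = []
        for A, b, x in zip(self.A, self.b, features):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                             # ridge-regression estimate
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, path: int, x, reward: float) -> None:
        """Fold the observed reward for the chosen path into its model."""
        self.A[path] += np.outer(x, x)
        self.b[path] += reward * x
```

In a scheduler loop, `select` would run once per packet (or per scheduling window) and `update` after delivery feedback arrives; with stationary rewards the UCB term shrinks as a path accumulates observations, so the selector converges on the better path while still probing under-sampled ones.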
{"title":"4D-MAP: Multipath Adaptive Packet Scheduling for Live Streaming over QUIC","authors":"Cong-Xi Song, Biao Han, Jin-Shu Su","doi":"10.1007/s11390-023-3204-z","DOIUrl":"https://doi.org/10.1007/s11390-023-3204-z","url":null,"abstract":"<p>In recent years, live streaming has become a popular application, which uses TCP as its primary transport protocol. Quick UDP Internet Connections (QUIC) protocol opens up new opportunities for live streaming. However, how to leverage QUIC to transmit live videos has not been studied yet. This paper first investigates the achievable quality of experience (QoE) of streaming live videos over TCP, QUIC, and their multipath extensions Multipath TCP (MPTCP) and Multipath QUIC (MPQUIC). We observe that MPQUIC achieves the best performance with bandwidth aggregation and transmission reliability. However, network fluctuations may cause heterogeneous paths, high path loss, and bandwidth degradation, resulting in significant QoE deterioration. Motivated by the above observations, we investigate the multipath packet scheduling problem in live streaming and design 4D-MAP, a multipath adaptive packet scheduling scheme over QUIC. Specifically, a linear upper confidence bound (LinUCB)-based online learning algorithm, along with four novel scheduling mechanisms, i.e., Dispatch, Duplicate, Discard, and Decompensate, is proposed to conquer the above problems. 4D-MAP has been evaluated in both controlled emulation and real-world networks to make comparison with the state-of-the-art multipath transmission schemes. 
Experimental results reveal that 4D-MAP outperforms others in terms of improving the QoE of live streaming.</p>","PeriodicalId":50222,"journal":{"name":"Journal of Computer Science and Technology","volume":"51 1","pages":""},"PeriodicalIF":1.9,"publicationDate":"2024-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140581921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}