
Latest publications: 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)

Incenter-based nearest feature space method for hyperspectral image classification using GPU
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097911
Yang-Lang Chang, Hsien-Tang Chao, Min-Yu Huang, Lena Chang, Jyh-Perng Fang, Tung-Ju Hsieh
In this paper, a novel technique based on the nearest feature space (NFS), known as incenter-based nearest feature space (INFS), is proposed for supervised hyperspectral image classification. Owing to its use of class separability and neighborhood structure, the traditional NFS performs well for classification of remote sensing images. However, in some instances, overlapping training samples can cause classification errors despite the high classification accuracy of NFS in normal cases. The INFS is proposed to overcome this problem. The INFS method makes use of the incircle of a triangle, which is tangent to its three sides, to form the feature space; the incenter can be calculated efficiently from three training samples of the same class. Furthermore, to speed up computation, this paper proposes a parallel version of INFS, namely parallel INFS (PINFS), which uses a modern graphics processing unit (GPU) architecture with NVIDIA's compute unified device architecture (CUDA) technology to improve the computational speed of INFS. Experimental results demonstrate that the proposed INFS approach is suitable for land cover classification in earth remote sensing: it achieves better performance than the NFS classifier when class sample distributions overlap, and computing on the GPU with CUDA yields a further speedup.
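The geometric step the abstract refers to is the standard incenter construction: for a triangle with vertices p1, p2, p3 and opposite side lengths a, b, c, the incenter is the side-length-weighted mean of the vertices. Below is a minimal NumPy sketch of that step, with the per-class decision simplified to a nearest-incenter rule for illustration; it is not the authors' implementation, and the function and variable names are our assumptions.

```python
import numpy as np

def incenter(p1, p2, p3):
    """Incenter of the triangle (p1, p2, p3): the side-length-weighted
    mean of the three vertices, where each weight is the length of the
    side opposite that vertex."""
    a = np.linalg.norm(p2 - p3)   # side opposite p1
    b = np.linalg.norm(p1 - p3)   # side opposite p2
    c = np.linalg.norm(p1 - p2)   # side opposite p3
    return (a * p1 + b * p2 + c * p3) / (a + b + c)

def classify(x, class_triples):
    """Assign sample x to the class whose triple of training samples
    yields the closest incenter (simplified nearest-incenter rule)."""
    dists = {label: np.linalg.norm(x - incenter(*triple))
             for label, triple in class_triples.items()}
    return min(dists, key=dists.get)
```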
Citations: 2
A hybrid on-chip network with a low buffer requirement
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097818
Jen-Yu Wang, Yarsun Hsu
As CMOS technology develops, the number of buffers required in a network-on-chip increases with flit width. This increase in buffers adds power and area overhead to the network routers. This paper proposes a hybrid packet-switched and circuit-switched network in which the total buffer requirement depends only on the width of short messages and the buffer depth, and does not increase with the network width. Performance is maintained through a low-latency circuit switch that uses a simple reverse-path reservation method. Simulation results indicate that the buffer reduction saves a considerable amount of power and area while performance is maintained.
Citations: 0
Orchestrating safe streaming computations with precise control
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097925
Peng Li, Kunal Agrawal, J. Buhler, R. Chamberlain
Streaming computing is a paradigm of distributed computing that features networked nodes connected by first-in-first-out data channels. Communication between nodes may include not only high-volume data tokens but also infrequent and unpredictable control messages carrying control information, such as data set boundaries, exceptions, or reconfiguration requests. In many applications, it is necessary to order delivery of control messages precisely relative to data tokens, which can be especially challenging when nodes can filter data tokens. Existing approaches, mainly data serialization protocols, do not exploit the low-volume nature of control messages and may not guarantee that synchronization of these messages with data will be free of deadlock. In this paper, we propose an efficient messaging system for adding precisely ordered control messages to streaming applications. We use a credit-based protocol to avoid the need to tag data tokens and control messages. For potential deadlocks caused by filtering behavior and global synchronization, we propose deadlock avoidance solutions and prove their correctness.
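As an illustration of the credit-based idea mentioned above, the sketch below shows a generic credit-based FIFO channel in which data tokens and control messages share one ordered queue, so their relative order is preserved without tagging and the sender is back-pressured when the receiver's buffer is full. This is a hedged, generic sketch rather than the paper's protocol; all names are our assumptions.

```python
from collections import deque

class CreditChannel:
    """Generic credit-based flow control over a FIFO channel: the sender
    may inject a token only while it holds a credit, and the receiver
    returns one credit per token it consumes, bounding in-flight tokens
    by the buffer depth."""
    def __init__(self, depth):
        self.credits = depth   # credits currently held by the sender
        self.fifo = deque()    # in-flight tokens (data or control)

    def send(self, token):
        """Returns False (back-pressure) when no credit is available."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.fifo.append(token)
        return True

    def receive(self):
        """Dequeue the oldest token and return a credit to the sender."""
        if not self.fifo:
            return None
        self.credits += 1
        return self.fifo.popleft()
```

Because control messages travel through the same FIFO as the data tokens that precede them, their delivery order relative to those tokens is fixed by construction.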
Citations: 2
BusCast: Flexible and privacy preserving message delivery using urban buses
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097847
Shan Chang, Hongzi Zhu, M. Dong, K. Ota, Xiaoqiang Liu, Guangtao Xue, Xuemin Shen
With the popularity of intelligent mobile devices, enormous amounts of urban information are generated and demanded by the public. In response, ShanghaiGrid (SG) aims to provide abundant information services to the public. With their fixed schedules and city-wide coverage, buses enable an appealing SG service: free message delivery to the public, which allows mobile device users to send messages to locations of interest via buses. The main challenge in realizing this service is to provide an efficient routing scheme with privacy preservation under highly dynamic urban traffic conditions. In this paper, we present an innovative scheme, BusCast, to tackle this problem. In BusCast, buses pick up personal messages and forward them to their destination locations in a store-carry-forward fashion. For each message, BusCast conservatively associates a routing graph, rather than a fixed routing path, with the message in order to adapt to the dynamics of urban traffic. Meanwhile, private information about the user and the message destination is concealed from both intermediate relay buses and outside adversaries. Both rigorous privacy analysis and extensive trace-driven simulations demonstrate the efficacy of the BusCast scheme.
Citations: 0
An improved realistic mobility model and mechanism for VANET based on SUMO and NS3 collaborative simulations
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097905
Yunyun Su, Haibin Cai, Jingmin Shi
The information field is undergoing a new round of technological revolution, from the Internet to the Internet of Things. The vehicular ad-hoc network (VANET), an application of the Internet of Things used in Intelligent Transportation Systems (ITS), has attracted broad attention worldwide in recent years. It mainly provides vehicle-to-vehicle and vehicle-to-infrastructure communications, which significantly improve road transport efficiency, reduce energy consumption, and ease traffic congestion. In this paper, we developed a client that makes SUMO and NS3 work in parallel through TraCI (Traffic Control Interface) in NS3. It lets NS3 obtain SUMO's information and send instructions to change the states of vehicles and traffic lights. We present a realistic road traffic model with various kinds of vehicles and intelligent traffic lights. The model is built in SUMO (Simulation of Urban Mobility), and OpenStreetMap is used to generate a realistic map near the Bund in Shanghai. The traffic flow is built from survey data, which yields meaningful and reliable statistics. A mechanism for changing the traffic lights dynamically is introduced to minimize traffic jams and give high priority to emergency vehicles. As a result, the waiting time and travel duration of vehicles in the scenario are reduced significantly when the mechanism is used, and the emergency vehicle's waiting time is lower than that of other vehicles.
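The dynamic traffic-light mechanism described above reduces, at each intersection and simulation step, to a simple decision: serve an approach carrying an emergency vehicle immediately, otherwise favor the most congested approach. The sketch below is a hedged, simulator-agnostic illustration of such a rule (in the paper's setup the decision would be pushed to SUMO through TraCI each step); the greedy policy and all names are our assumptions, not the paper's exact algorithm.

```python
def choose_green_approach(queue_lengths, emergency_approach=None):
    """Pick the approach to receive the next green phase at one intersection.

    queue_lengths: dict mapping approach name -> number of waiting vehicles
    emergency_approach: approach an emergency vehicle is arriving on, if any

    An approach with an emergency vehicle is always served first; otherwise
    the longest queue gets the green (simple greedy rule)."""
    if emergency_approach is not None:
        return emergency_approach
    return max(queue_lengths, key=queue_lengths.get)

# Example: the north approach has the longest queue, but an ambulance
# is approaching from the east, so east is served first.
queues = {"north": 7, "south": 5, "east": 2, "west": 1}
assert choose_green_approach(queues) == "north"
assert choose_green_approach(queues, emergency_approach="east") == "east"
```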
Citations: 6
EasiCAE: A runtime framework for efficient sensor sharing among concurrent IoT applications
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097825
Hailong Shi, Dong Li, H. Chen, J. Qiu, Li Cui
Traditional wireless sensor networks (WSNs) can be integrated into the Internet and regarded as its sensing infrastructure, supporting the development and running of multiple third-party applications simultaneously. Because sensor nodes have constrained resources, it is necessary to establish a runtime framework that improves sensor sharing efficiency for concurrent third-party applications. This paper presents EasiCAE, a concurrent-application runtime framework that greatly enhances sensor sharing efficiency by combining task allocation with redundancy elimination. In brief, EasiCAE decomposes applications into tasks and distributes the tasks to the sensors that will consume the least energy to run them. EasiCAE has three salient features. First, we define a task-sensor correlation to indicate how many samplings of a sensor can be shared with a new task. Second, EasiCAE reduces energy consumption by assigning tasks to sensors with higher task-sensor correlation. Finally, a lightweight merging algorithm is proposed to eliminate redundant samplings on the assigned sensors. Experimental results show that EasiCAE reduces energy consumption by 31% to 79% compared with existing methods, while introducing tolerable overheads. We also evaluate EasiCAE under various influencing parameters, showing that its performance increases steadily as the network scale and the number of concurrent applications grow.
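A minimal sketch of the allocation idea described above: each incoming task is assigned to the sensor with the highest task-sensor correlation, i.e., the one that can already share the most samplings with the task. This is an illustrative greedy rule under our own naming assumptions, not the full EasiCAE allocator or its energy model.

```python
def assign_tasks(tasks, sensors, shareable):
    """Greedy task allocation by task-sensor correlation.

    tasks: iterable of task identifiers
    sensors: iterable of sensor identifiers
    shareable(task, sensor): number of samplings the sensor's existing
        schedule can share with the task (the task-sensor correlation)

    Each task goes to the sensor with the highest correlation, so the
    fewest new samplings (and the least extra energy) are introduced."""
    sensors = list(sensors)
    return {task: max(sensors, key=lambda s: shareable(task, s))
            for task in tasks}

# Example with a hypothetical correlation table.
corr = {("t1", "s1"): 3, ("t1", "s2"): 1, ("t2", "s1"): 0, ("t2", "s2"): 4}
assignment = assign_tasks(["t1", "t2"], ["s1", "s2"],
                          lambda t, s: corr.get((t, s), 0))
assert assignment == {"t1": "s1", "t2": "s2"}
```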
Citations: 1
Traffic-aware frequency scaling for balanced on-chip networks on GPGPUs
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097795
Chiao-Yun Tu, Yuan-Ying Chang, C. King, Chien-Ting Chen, Tai-Yuan Wang
General-purpose computing on graphics processing units (GPGPU) can provide orders of magnitude more computing power than general-purpose processors (CPUs) for highly parallel applications. For such applications, the memory traffic pattern of GPGPUs differs considerably from that of CPUs, which creates opportunities for optimizing the on-chip interconnection network (NoC) of GPGPUs. In this work, we first investigate the characteristics of GPGPU memory traffic in typical benchmarks and categorize the memory traffic patterns. Different traffic patterns require different throughput in the request and reply paths of the NoC to match the network load. To meet this requirement, we examine the feasibility of scaling the network frequency dynamically to balance the throughput of the request and reply networks. The decision is guided by monitoring a subset of shader cores to identify the memory traffic pattern. Performance evaluation shows that this dynamic frequency tuning design achieves up to a 27% execution speedup over a baseline setting, and a 7.4% improvement on average.
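As a toy illustration of the balancing idea, the sketch below splits a fixed frequency budget between the request and reply networks in proportion to recently observed traffic, so that neither path becomes the bottleneck. This is a simple proportional rule under assumed names; the paper's controller instead monitors shader cores and selects among discrete frequency settings.

```python
def split_frequency(req_bytes, rep_bytes, f_total):
    """Divide the total frequency budget f_total between the request and
    reply networks in proportion to their recent traffic volumes."""
    total = req_bytes + rep_bytes
    if total == 0:
        return f_total / 2.0, f_total / 2.0   # idle: split evenly
    f_req = f_total * req_bytes / total
    return f_req, f_total - f_req

# Example: reply traffic (e.g. cache-line fills) dominates request traffic,
# so the reply network receives the larger share of the budget.
f_req, f_rep = split_frequency(req_bytes=1_000, rep_bytes=4_000, f_total=2.0e9)
assert f_rep > f_req
```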
Citations: 4
esDMT: Efficient and scalable deterministic multithreading through memory isolation
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097794
Jie Sun, Xiaofei Liao, Long Zheng, Hai Jin, Yu Zhang
Deterministic multithreading (DMT) systems are well known for eliminating the harmful program behaviors caused by nondeterminism; that is, for the same input they always drive program execution into the same thread schedule. To achieve this goal, existing DMT systems enforce one of two kinds of schedules: 1) a mem-based schedule ensures determinism by totally ordering shared-memory accesses, and 2) a sync-based schedule enforces a total order only on synchronization operations. Mem-schedules achieve full determinism but suffer prohibitive overhead, while sync-schedules mitigate this overhead but cannot guarantee determinism for racy schedules, i.e., they provide only partial determinism. Much recent research is devoted to hybrid schedules that combine the determinism of mem-schedules with the efficiency of sync-schedules. However, these approaches suffer from practicality and scalability problems due to their technical characteristics, such as advance trace collection and huge schedule memoization. To address these problems, this paper proposes esDMT, an efficient and scalable DMT system built on a new memory-isolation technique. It improves efficiency by executing each thread in parallel within its private virtual memory, and defers the determinism guarantee by updating private memory into shared memory in a deterministic order according to a deterministic lock algorithm, further reducing the overhead of inter-thread waiting. In contrast to previous hybrid work, which avoids the nondeterminism of racy schedules offline based on enormous historical records, our key insight is to eliminate this nondeterminism online at runtime. Our experimental results on the PARSEC benchmarks show that esDMT successfully eliminates nondeterminism, gains almost the same performance as a sync-schedule (at most an 18% slowdown compared with the pthread library), and exhibits good scalability on an 8-core machine.
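The isolate-then-commit idea behind such systems can be illustrated with a deterministic committer: threads run against private copies of memory and then publish their write sets to shared state in a fixed round-robin order, so the final shared state does not depend on OS scheduling. This is a hedged conceptual sketch with names of our own choosing, not the esDMT algorithm or its deterministic lock protocol.

```python
import threading

class DeterministicCommitter:
    """Threads commit their private write sets to shared memory in a fixed
    round-robin order of thread ids, making the commit order (and hence
    the final shared state) independent of OS scheduling."""
    def __init__(self, num_threads, shared):
        self.num_threads = num_threads
        self.shared = shared          # e.g. a dict acting as shared memory
        self.turn = 0                 # id of the thread allowed to commit
        self.cond = threading.Condition()

    def commit(self, tid, write_set):
        with self.cond:
            while self.turn != tid:   # wait deterministically for our turn
                self.cond.wait()
            self.shared.update(write_set)       # publish private writes
            self.turn = (self.turn + 1) % self.num_threads
            self.cond.notify_all()

# Example: two threads always commit in the order 0 then 1, no matter
# which one reaches commit() first.
shared = {}
committer = DeterministicCommitter(2, shared)
t1 = threading.Thread(target=committer.commit, args=(1, {"x": "from-1"}))
t1.start()
committer.commit(0, {"x": "from-0"})
t1.join()
assert shared["x"] == "from-1"        # thread 1's commit is always applied last
```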
Citations: 0
An access point to device association technique for optimized data transfer in mobile grids
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097881
A. Banerjee, H. Paul, A. Mukherjee, P. Datta, Sajal K. Das
In a mobile grid computing framework where mobile devices are used as computing resources, minimizing the task offloading time remains an important issue. A task is an independent unit of execution consisting of an input data volume and, optionally, a target-specific executable. We consider a mobile grid infrastructure in which mobile devices are connected via a Wi-Fi network and the grid has a set of tasks (i.e., a set of data volumes) to be transferred to a subset of the mobile devices. In a Wi-Fi network, mobile devices usually associate themselves with the access point (AP) having the strongest radio signal. In this paper, we address the problem of AP activation (by frequency assignment) and AP-to-device association with the goal of minimizing the overall data-transfer completion time. We present a constraint-based formulation and a heuristic as solutions, and report simulation results that contrast our proposed methods with earlier works.
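For illustration, a simple makespan-oriented heuristic for the association half of the problem is sketched below: devices are processed in decreasing order of data volume, and each is attached to the AP that would finish all of its assigned transfers earliest. The greedy rule, the rate function, and all names are assumptions made for this sketch; the paper's constraint-based formulation and heuristic are not reproduced here.

```python
def associate(devices, aps, rate):
    """Greedy AP-to-device association aimed at a small overall completion time.

    devices: dict device -> data volume to transfer (e.g. in MB)
    aps: list of access point identifiers
    rate(device, ap): achievable transfer rate for that pair (e.g. MB/s)

    Returns (association, per-AP finish times)."""
    finish = {ap: 0.0 for ap in aps}
    association = {}
    # Largest transfers first, a common makespan heuristic.
    for dev, volume in sorted(devices.items(), key=lambda kv: -kv[1]):
        def completion(ap):
            return finish[ap] + volume / rate(dev, ap)
        best = min(aps, key=completion)
        finish[best] = completion(best)
        association[dev] = best
    return association, finish
```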
Citations: 0
Intactness verification in anonymous RFID systems
Pub Date : 2014-12-01 DOI: 10.1109/PADSW.2014.7097801
Kai Bu, Jia Liu, Bin Xiao, Xuan Liu, Shigeng Zhang
Radio-frequency identification (RFID) technology has fostered many object monitoring systems. Along with this trend, the value and privacy of tagged objects become a primary concern. A correspondingly important problem is to verify the intactness of a set of tagged objects without leaking tag identifiers (IDs). Existing solutions, however, require knowledge of the tag IDs. Without tag IDs as a priori knowledge, this paper studies intactness verification in anonymous RFID systems. We identify three critical solution requirements: deterministic verification, anonymity preservation, and scalability. We propose Cardiff and Divar, two crypto-free, lightweight protocols that isolate tag IDs from intactness verification and satisfy these requirements. Cardiff exploits tag cardinality as the intactness proof, while Divar leverages Direct-Sequence Spread Spectrum (DSSS)-enabled RFID. Both analytical and simulation results demonstrate that Cardiff and Divar satisfy the requirements of accuracy, privacy, and scalability.
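To make the cardinality-as-proof idea concrete, the sketch below simulates one slotted-frame reading in which the reader only observes which slots are occupied (never tag IDs) and compares the occupied-slot count with what the expected number of tags would produce on average. The frame model, threshold, and names are assumptions made for illustration; Cardiff's estimator and decision rule are more involved.

```python
import random

def read_frame(num_present_tags, frame_size, seed=0):
    """Simulate one slotted frame: each present tag picks a slot
    pseudo-randomly; the reader sees only which slots are non-empty."""
    rng = random.Random(seed)
    return {rng.randrange(frame_size) for _ in range(num_present_tags)}

def looks_intact(occupied_slots, expected_tags, frame_size, tolerance=0.9):
    """Crude cardinality check: for k tags and n slots, the expected number
    of occupied slots is n * (1 - (1 - 1/n)**k); flag the set as possibly
    broken when far fewer slots are occupied than that."""
    expected_occupied = frame_size * (1 - (1 - 1 / frame_size) ** expected_tags)
    return len(occupied_slots) >= tolerance * expected_occupied
```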
Citations: 3